I had found pile of MySQL and PostgreSQL benchmarks on various platforms which I have not seen before. Very interesting reading.
It does not share too much information about how MySQL or PostgreSQL was configured or about queries. Furthermore MySQL and PostgreSQL has a bit different implementations (ie SubQueries avoided for MySQL) so do not just compare it directly.

It also does not mention if Innodb or MyISAM tables are used – it turns out Both are used in the benchmark. This is CPU bound benchmark with working set fitting in memory.

MySQL and PostrgreSQL Scalability on Xeon Woodcrest, Opteron and Niagra
Pretty interesting to see how PostgreSQL scales just as systems should scale in theory – gradually goes up with number of threads about matches number of Cores/Threads and stays at this level at higher concurrency. MySQL with Innodb shows its ugly face and drops pretty quickly as concurrency growths with peak at about number of CPUs. I guess this is lucky case as Innodb may well start to slow down before concurrency reaches number of CPUs.

Yes, Innodb Team has provided the fix for this scalability problem and it is merged into MySQL 5.0.30 “Enterprise” but according to the tests I’ve done so far it is far from full solution yet.

It is also interesting to see CPU comparison in this test. Woodcrest has best performance in this test (and in many other MySQL tests), Opteron comes second and older Intel Xeons as well as Niagra being outsiders.

Niagra scalability is one more interesting story. As you can see MySQL 4.1 actually scaled pretty well with Niagra, suffering slow regression with increased concurrency rather than quick drop. In MySQL 5.0 it is changed dramatically – it climbs to higher peak but it drops down very quickly as well as concurrency growths. It is seen much better on this picture

Linux vs Solaris comparison is also pretty interesting. With MySQL Linux has higher peak but Solaris suffers less with increased concurrency.

Note: I have not validated these benchmarks and as I already mentioned they do not have full disclosure. They however do match my own experience with MySQL so I tend to trust PostgreSQL data points as well.

15 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Odin

The InnoDB concurrency scaling issue was unknown to me before reading your blog.

It seems amazing to me that a feature/engine in MySQL that is touted as a perfect replacement for enterprise solutions in regards to Oracle, MSSQL Server, DB2 etc is so poor at scaling on multiple processors.

Surely enterprise software houses have used multi-cpu based systems for years, why hasn’t InnoDB’s scaling been fixed? Is the scaling issue with InnoDB new?

If the issue has been with InnoDB for a long time, how can MySQL expect enterprise level clients switch to MySQL? The money such clients would save on transaction costs would justify using Oracle/MSSQL/DB2 etc?

Jeff

Peter,

“First in practice you rarely would have 50 users hammering database as fast as they can”

Isn’t that basically saying that you don’t need to scale? You can’t scale with serial queries.

Jeff

Peter,

I understand what you’re saying, and it’s a valid point. However, often there are many factors aside from just a single path over the CPU. We tend to think of computers as serial, but in fact they are highly parallel even with a single CPU.

While query1 is waiting for disk block X, and query2 is waiting for disk block X+1, query3 may be on the CPU running an aggregate function on some information that’s in main memory. The computer may then be able to optimize the disk read into a single sequential read of both blocks. With high concurrency, random disk I/O on a smart controller turns into nice large sequential reads, which are much faster per-block. Also, think about the following block fetch order (abstract numbers, not to scale): 31, 38, 1, 3, 38, 31. In serial execution, that may require dropping 31 & 38 out of cache to get 1 & 3, leading to double-reads of blocks. However, in parallel execution, it knows more information about what blocks are needed before the disk read is completed. With this extra information, it may be able to read each block only once and return them to the queries that need them.

This is just one example of how parallelism allows for smarter caches. More queries being executed simultaneously means you know more information about the work needed to be done. This extra information makes all the pieces smarter: the db cache, the os cache, the controller cache, etc. If MySQL is able to acheive great performance with a parallelism of 4, that’s good. However, if it declines sharply after that, it makes me think that there is a lot left to be gained. We should see benchmarks like this as an indication of what can be improved, we might be pleasantly surprised and find that it can get much better overall throughput.

This benchmark might be actually under-representing the problem because it’s a CPU-centric benchmark.

Oh, and just to be complete, if the performance is better at concurrency 4, why not just serialize any work beyond concurrency 4? That is, why not say that connect 10 has to wait for connection 2 to be idle before it gets to run?

Apachez

http://tweakers.net/reviews/649/5

Exactly which setting/parameter are they speaking about in the text?


Settings can also cater for huge differences: for example, a 5.0.18 version that was tweaked by Sun was a good deal faster than our original configuration, although only one parameter had been altered.

kees

Peter,

I can answer that question. First of all we had this t2000 from the try ‘n buy that sun offers, through this we came in contact with them, and they offered help in fine-tuning mysql and solaris. We have years experience with mysql on linux and we had to learn solaris for these benches, with that those sun’s were a great help.
They noticed that on solaris MySQL used a lot of time on sorts, more than on linux, so the configuration setting that was changed was the sort_buffer_size, we tuned it down from 2M to 1M, which somehow gave a big increase in performance. Further we tried with different schedulers, some settings in /etc/System – which are already set to the right settings in the newest solaris releases, we just had an older solaris to start with, which got replaced by sun fast so we were testing on the latest release. To bad i cant recall all changes anymore, we changed a lot and than changed them back (we did several hunderd of hours testing settings alone). The result in the graphs are of this testing, not only the result of the ‘one parameter’ (note to self: should explain that more clearly in our upcoming reviews, got some quad-cores, and getting a x4600 from sun soon for new reviews)

In the end Hans (the sun employee) commited some changes to the Studio10 compiler that should increase innodb and mysql performance on solaris even more, but that was after the release of the initial review.

We test on a lot of different (hardware and software) platforms, so we decided to use a database that could be loaded in 4-8g memory, so we didnt have to keep the io system the same for all platforms, and if you do io-bound benches you are testing your io-system, and not the CPU, and we wanted to test the CPU.

Alexey

I think this explanation is wrong. mmap() call takes only a few microseconds, it’s insignificant comparing to time spent in sorting.
IMO the real reason is cache miss ratio. If a sort window doesn’t fit your cache, a lot of key swapping operations result in cache misses. If it does, you sort a chunk of data really quick, then drop it into main memory (tmp table which isn’t neccessarily written out to disk), and after all chunks are sorted, do a single pass scan.

James Day

Kees, re sort_buffer_size, see http://bugs.mysql.com/bug.php?id=21727 for a particularly unpleasant case that explains what may have been happening. You may find that the fix for that improves things if you are sorting in subqueries. You’ll need to get it from BK since it’s not yet in any release. Since you can adjust the sort_buffer_size for each query if you want to tune things you could consider that, though it’s going a bit further than most people would go.

It would be good if you said more about your software configuration. For example, Peter wrote that you’re using InnoDB but I normally assume MyISAM and not knowing which engine(s) are in use can make it hard to know what the results mean, if anything. Your comment does seem to say that you’re using InnoDB.

Aria Kokoschka

Solaris has a well-designed and matured threading library as opposed to Linux. Maybe that’s one of the reasons for its performance benchmark.

Senthil

Do you have any numbers on Mysql(MYISAM) vs Postgresql?