I had found pile of MySQL and PostgreSQL benchmarks on various platforms which I have not seen before. Very interesting reading.
It does not share too much information about how MySQL or PostgreSQL was configured or about queries. Furthermore MySQL and PostgreSQL has a bit different implementations (ie SubQueries avoided for MySQL) so do not just compare it directly.
It also does not mention if Innodb or MyISAM tables are used – it turns out Both are used in the benchmark. This is CPU bound benchmark with working set fitting in memory.
MySQL and PostrgreSQL Scalability on Xeon Woodcrest, Opteron and Niagra
Pretty interesting to see how PostgreSQL scales just as systems should scale in theory – gradually goes up with number of threads about matches number of Cores/Threads and stays at this level at higher concurrency. MySQL with Innodb shows its ugly face and drops pretty quickly as concurrency growths with peak at about number of CPUs. I guess this is lucky case as Innodb may well start to slow down before concurrency reaches number of CPUs.
Yes, Innodb Team has provided the fix for this scalability problem and it is merged into MySQL 5.0.30 “Enterprise” but according to the tests I’ve done so far it is far from full solution yet.
It is also interesting to see CPU comparison in this test. Woodcrest has best performance in this test (and in many other MySQL tests), Opteron comes second and older Intel Xeons as well as Niagra being outsiders.
Niagra scalability is one more interesting story. As you can see MySQL 4.1 actually scaled pretty well with Niagra, suffering slow regression with increased concurrency rather than quick drop. In MySQL 5.0 it is changed dramatically – it climbs to higher peak but it drops down very quickly as well as concurrency growths. It is seen much better on this picture
Linux vs Solaris comparison is also pretty interesting. With MySQL Linux has higher peak but Solaris suffers less with increased concurrency.
Note: I have not validated these benchmarks and as I already mentioned they do not have full disclosure. They however do match my own experience with MySQL so I tend to trust PostgreSQL data points as well.
The InnoDB concurrency scaling issue was unknown to me before reading your blog.
It seems amazing to me that a feature/engine in MySQL that is touted as a perfect replacement for enterprise solutions in regards to Oracle, MSSQL Server, DB2 etc is so poor at scaling on multiple processors.
Surely enterprise software houses have used multi-cpu based systems for years, why hasn’t InnoDB’s scaling been fixed? Is the scaling issue with InnoDB new?
If the issue has been with InnoDB for a long time, how can MySQL expect enterprise level clients switch to MySQL? The money such clients would save on transaction costs would justify using Oracle/MSSQL/DB2 etc?
Odin,
I’m wrong person to ask the question 🙂 I can confirm Innodb scaling issue was known for years. In fact you can even get it with as few as 2 CPUs and significant number connections – I have been running into problems with it in as far back as 2002.
The interesting thing with this problem is – it is worse with many CPUs/Cores so it started to get a lot of attention recently as these came to commodity market.
Now even if Benchmark results look very bad it does not mean MySQL does not scale in practice. First in practice you rarely would have 50 users hammering database as fast as they can, there is usually significant think delay which limits true concurrency placed on database. Even for very high loaded sites you frequently can see only few queries in the process list which are being executed. Second – this benchmark is CPU bound, high concurrency often comes together with significant IO simply as wait increases in this case. In such cases problem does not show up itself as the problem is in mutex competition which is low when IO is happening. Third – MySQL is all about “Scale Out” and it is true Innodb simply was not good about scaling UP.
The bottom line is – despite these problems you can build scalable architecture with MySQL and we know all many examples when is done. However you should ever keep these problems into account or simply be lucky and have this problem avoiding you. Hopefully it will be fixed soon.
Peter,
“First in practice you rarely would have 50 users hammering database as fast as they can”
Isn’t that basically saying that you don’t need to scale? You can’t scale with serial queries.
Jeff,
No it does not mean you do not need to scale. What I mean users, for example web users are not just issuing requests one after another in the loop but have large delay between requests. For example there can be 1000 users on the site which request page generation every few seconds, each requiring few queries which take 5ms in total. So for each user we’ll have 5ms query execution vs 5sec between the queries.
Now regarding scaling with serial queries – you should increase concurrency as it helps you to increase total throughtput but increasing it much further is bad idea anyway. Let me illustrate it.
Lets say you have CPU bound workload with 4cores and bunch of queries which require one second of CPU time to execute. So you can execute 4 queries/sec on such system.
Now if you execute it as 4 queries in parallel you will have throughput of 4 queries/sec with response time of 1 query per second.
Now if you instead allow this system to run 100 queries in the parallel at the same time and it will execute them in parallel in fair way you would have each of the queries taking 25 seconds, while throughput remains same 4 queries per second.
What do you think is better for the application performance ?
Peter,
I understand what you’re saying, and it’s a valid point. However, often there are many factors aside from just a single path over the CPU. We tend to think of computers as serial, but in fact they are highly parallel even with a single CPU.
While query1 is waiting for disk block X, and query2 is waiting for disk block X+1, query3 may be on the CPU running an aggregate function on some information that’s in main memory. The computer may then be able to optimize the disk read into a single sequential read of both blocks. With high concurrency, random disk I/O on a smart controller turns into nice large sequential reads, which are much faster per-block. Also, think about the following block fetch order (abstract numbers, not to scale): 31, 38, 1, 3, 38, 31. In serial execution, that may require dropping 31 & 38 out of cache to get 1 & 3, leading to double-reads of blocks. However, in parallel execution, it knows more information about what blocks are needed before the disk read is completed. With this extra information, it may be able to read each block only once and return them to the queries that need them.
This is just one example of how parallelism allows for smarter caches. More queries being executed simultaneously means you know more information about the work needed to be done. This extra information makes all the pieces smarter: the db cache, the os cache, the controller cache, etc. If MySQL is able to acheive great performance with a parallelism of 4, that’s good. However, if it declines sharply after that, it makes me think that there is a lot left to be gained. We should see benchmarks like this as an indication of what can be improved, we might be pleasantly surprised and find that it can get much better overall throughput.
This benchmark might be actually under-representing the problem because it’s a CPU-centric benchmark.
Oh, and just to be complete, if the performance is better at concurrency 4, why not just serialize any work beyond concurrency 4? That is, why not say that connect 10 has to wait for connection 2 to be idle before it gets to run?
Jeff,
First this benchmark is about CPU bound case. I guess I specially mentioned results will be different for IO bound scenario. Second even for IO bound benchmark there will be concurrency where performance will be optimal.
You make right point about possibility of merging requests because there are close together – this is so in theory, in practice it is less of the issue because there are read-aheads on various levels and simply because probability is small for large databases (and small fit in memory anyway) – count for example with 16GB database and uniform access distribution how many outstanding requests do you need for probability of them bring close to each other to become significant ?
There is also flip side of increased concurrency for the disks – multiple sequential scans going in parallel convert sequential reads to random reads and all can become much slower.
Regarding drop with high concurrency level – I agree with you it is extremely bad, I’m just explaining why people still build scalable applications with MySQL.
Speaking about work serialization – you’re right. There is even innodb_thread_concurrency and innodb_commit_concurrency variables in MySQL which allow you to do just that. In the benchmarks these were not set to best values for this load. In practice however it will be different for different workloads so it may be hard to tune. And it does not solve problem completely ether.
http://tweakers.net/reviews/649/5
Exactly which setting/parameter are they speaking about in the text?
”
Settings can also cater for huge differences: for example, a 5.0.18 version that was tweaked by Sun was a good deal faster than our original configuration, although only one parameter had been altered.
“
Apachez,
I do not know you should ask them 🙂 From the text it is not clear even if it is compilation parameter or run time parameter.
There is a patch which changes how Innodb mutexes work which makes it faster on Solaris but it is patch not just configuration.
Peter,
I can answer that question. First of all we had this t2000 from the try ‘n buy that sun offers, through this we came in contact with them, and they offered help in fine-tuning mysql and solaris. We have years experience with mysql on linux and we had to learn solaris for these benches, with that those sun’s were a great help.
They noticed that on solaris MySQL used a lot of time on sorts, more than on linux, so the configuration setting that was changed was the sort_buffer_size, we tuned it down from 2M to 1M, which somehow gave a big increase in performance. Further we tried with different schedulers, some settings in /etc/System – which are already set to the right settings in the newest solaris releases, we just had an older solaris to start with, which got replaced by sun fast so we were testing on the latest release. To bad i cant recall all changes anymore, we changed a lot and than changed them back (we did several hunderd of hours testing settings alone). The result in the graphs are of this testing, not only the result of the ‘one parameter’ (note to self: should explain that more clearly in our upcoming reviews, got some quad-cores, and getting a x4600 from sun soon for new reviews)
In the end Hans (the sun employee) commited some changes to the Studio10 compiler that should increase innodb and mysql performance on solaris even more, but that was after the release of the initial review.
We test on a lot of different (hardware and software) platforms, so we decided to use a database that could be loaded in 4-8g memory, so we didnt have to keep the io system the same for all platforms, and if you do io-bound benches you are testing your io-system, and not the CPU, and we wanted to test the CPU.
Thanks,
Yes I can see why changing sort_buffer can give such effect. Larger allocation can be allocated via mmap() rather than by standard memory allocator which may be relatively expensive as it requires OS syscall plus page table modifications.
I think this explanation is wrong. mmap() call takes only a few microseconds, it’s insignificant comparing to time spent in sorting.
IMO the real reason is cache miss ratio. If a sort window doesn’t fit your cache, a lot of key swapping operations result in cache misses. If it does, you sort a chunk of data really quick, then drop it into main memory (tmp table which isn’t neccessarily written out to disk), and after all chunks are sorted, do a single pass scan.
Alexey,
It actually depends. Sort buffer can be allocated while only couple of rows can be sorted. The slowdown can be very well seen with subqueries which use order by where millions of sorts would happen.
Regarding sort and cache fit this would only apply to rather large data sets – if sort set fits in 1MB and 2MB of sort buffer is allocated still only small portion of it would be touched. If data is in 1MB-2MB range you would have disk based sort if your sort buffer was just 1MB which is likely worse than few key misses.
Two more reasons why I think this is not just cache is how QuickSort works – most of the operations are done local to the small blocks, and only last pass would traverse all data set and the fact it is mention to only apply to Solaris, not other operation systems.
But I can be wrong about guess about mmap() in this particular case – it could be something else. Profiling could show.
Kees, re sort_buffer_size, see http://bugs.mysql.com/bug.php?id=21727 for a particularly unpleasant case that explains what may have been happening. You may find that the fix for that improves things if you are sorting in subqueries. You’ll need to get it from BK since it’s not yet in any release. Since you can adjust the sort_buffer_size for each query if you want to tune things you could consider that, though it’s going a bit further than most people would go.
It would be good if you said more about your software configuration. For example, Peter wrote that you’re using InnoDB but I normally assume MyISAM and not knowing which engine(s) are in use can make it hard to know what the results mean, if anything. Your comment does seem to say that you’re using InnoDB.
Solaris has a well-designed and matured threading library as opposed to Linux. Maybe that’s one of the reasons for its performance benchmark.
Do you have any numbers on Mysql(MYISAM) vs Postgresql?