As we can see, MySQL 5.5.8 comes with great improvements and scalability fixes. Adding up all the new features, you have a great release. However, there is one area I want to touch on in this post. At Percona, we consider it important not only to have the best peak performance, but also stable and predictable performance. I refer you to Peter’s post, Performance Optimization and Six Sigma.
In Percona Server (and actually even before that, in percona-patches builds for 5.0), we added adaptive checkpoint algorithms, and later the InnoDB plugin included an implementation of “adaptive flushing”. This post shows the differences between them and MySQL.
The post also answers the question of whether we are going to have releases of Percona Server/XtraDB based on the MySQL 5.5 code line. The answer: Yes, we are. My benchmarks here are based on Percona Server 5.5.7. (You can get the source code from lp:~percona-dev/percona-server/5.5.7 , but it is very beta quality at the moment.)
For this post, I ran tpcc-mysql benchmarks on our Dell PowerEdge R900 box, using RAID10 over 8 disks and a FusionIO 320GB MLC card.
First, the results for tpcc-mysql, 500w (around 50GB of data) on RAID10. I used innodb_buffer_pool_size=24G, innodb_log_file_size=2000M (innodb_log_files_in_group=2), and innodb_flush_log_at_trx_commit=2. Also, innodb_adaptive_flushing (ON) / innodb_adaptive_checkpoint (estimate) were the default values.
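Collected into a my.cnf fragment, the key settings for this RAID10 run look like this (a sketch reconstructed from the values above, with everything else at defaults; note that innodb_adaptive_checkpoint is a Percona Server-specific option, while innodb_adaptive_flushing applies to MySQL 5.5.8):

```ini
# RAID10 run: key InnoDB settings (fragment reconstructed from the text)
innodb_buffer_pool_size=24G
innodb_log_file_size=2000M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit=2
innodb_adaptive_flushing=ON            # MySQL 5.5.8 (default value)
innodb_adaptive_checkpoint=estimate    # Percona Server (default value)
```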
The raw results, full config files, and scripts are in our Benchmarks Wiki.
The graphical result below shows the throughput on the server over 8 hours. (Yes, 8 hours, to show MySQL performance over a long time period. It is not a short, 5-minute exercise.)
Although it takes a decent time for the Percona Server results to stabilize, for MySQL 5.5.8 we have regular dips (3 times per hour) from 24900 NOTPM to 17700 NOTPM (dips of around 30%).
Next, the second run, on the FusionIO card. Here I should say that we were not able to get stable results with the existing adaptive_checkpoint or adaptive_flushing algorithms. So Yasufumi invested a lot of research time and came up with the new innodb_adaptive_checkpoint=”keep_average” method. This method requires setting innodb_flush_neighbor_pages=0 to disable flushing of neighbor pages (an option not available in MySQL 5.5.8). The problem with flushing neighbor pages is that it makes an exact calculation of how many pages were handled impossible. The neighbor-flushing feature was created as an optimization for hard drives: InnoDB tries to combine writing as many pages as possible into a single sequential write, which means a single I/O may have a size of 32K, 64K, 96K, etc. That, again, makes it impossible to predict how many I/O operations will be performed. Furthermore, this optimization is not needed for flash devices, like FusionIO or Virident cards.
An additional optimization we have for SSDs is big log files. For this run, I used innodb_log_file_size=4G (innodb_log_files_in_group=2) for Percona Server. That gave 8GB total log file size (MySQL 5.5.8 has a 4GB limit). In addition to increasing the log size, we added the option innodb_log_block_size, which allows changing the I/O block size for log files. The default is 512 bytes; in the FusionIO test I used 4096 bytes, to align I/O with FusionIO’s internal block size.
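Taken together, the FusionIO-specific settings discussed in the last two paragraphs look like this as a my.cnf fragment (innodb_adaptive_checkpoint, innodb_flush_neighbor_pages, and innodb_log_block_size are Percona Server-specific options; values are the ones quoted above):

```ini
# FusionIO run: settings from the two paragraphs above
innodb_adaptive_checkpoint=keep_average
innodb_flush_neighbor_pages=0      # no neighbor-page flushing on flash
innodb_log_file_size=4G            # 2 x 4G = 8GB total redo log
innodb_log_files_in_group=2
innodb_log_block_size=4096         # align log I/O with FusionIO internals
```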
You can see that MySQL 5.5.8 has periodic drops here, too. The margin between Percona Server and MySQL is about 2500-2800 NOTPM (~15% difference).
MySQL 5.5.8 now has features related to having several buffer pool instances that are supposed to fix the buffer pool scalability issue. Let’s see how MySQL performance changes for the last workload if we set innodb_buffer_pool_instances=8 or 16.
As you see, having several buffer pools makes the dips deeper and longer. It seems that for Percona Server the best choice is innodb_buffer_pool_instances=1, as we implemented buffer pool scalability in a different way.
UPDATE
By request from a commenter, I have also added results with different innodb_io_capacity values for MySQL 5.5.8: 500 (which I used in the benchmarks above), 4000, and 20000.
As you can see, there is no improvement from a bigger innodb_io_capacity. This matches my previous experience: with a bigger io_capacity you tend to get worse results.
For reference, here is the config file used for benchmarks on FusionIO:
[client]
socket=/var/lib/mysql/mysql.sock

[mysqld]
core
basedir=/usr/local/mysql
user=root
socket=/var/lib/mysql/mysql.sock
skip-grant-tables
server_id=1
local_infile=1
datadir=/mnt/fio320
innodb_buffer_pool_size=24G
innodb_data_file_path=ibdata1:10M:autoextend
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=2
innodb_log_buffer_size=8M
innodb_log_files_in_group=2
innodb_log_file_size=4G
innodb_log_block_size=4096
innodb_thread_concurrency=0
innodb_flush_method = O_DIRECT
innodb_read_ahead = none
innodb_flush_neighbor_pages = 0
innodb_write_io_threads=8
innodb_read_io_threads=8
innodb_io_capacity=500
max_connections=3000
query_cache_size=0
skip-name-resolve
table_cache=10000

[mysql]
socket=/tmp/mysql.sock
(post edited by Fred Linhoss)
A few questions:
1) Comparing the 1st graph with the 2nd graph, you got around 25000 NOTPM on hard drives but less than 20000 NOTPM on FusionIO. I’d expect the results for FusionIO to be much higher than those of the HDD, but that’s not the case here. Why is that?
2) Would the settings you used for FusionIO, such as innodb_flush_neighbor_pages=0, also work for a SATA SSD such as the Intel X25?
3) You used innodb_io_capacity=500 for FusionIO. That implies an IO capacity of 500 IOPS, right? But FusionIO can do much higher than that. Why didn’t you use a higher value of innodb_io_capacity?
Hi Vadim,
few comments so far:
* Any reason why innodb_io_capacity=500?.. If I remember well, within XtraDB you’re not limiting flushing by I/O capacity, while 5.5.8 will respect the I/O capacity limit and not flush more pages/sec than this limit (we discussed this during your previous tests on 5.5.4).
So, may you replay your tests with, say, innodb_io_capacity=20000? (I have no idea how intensive your workload is on flushing ;-))
* Regarding the tests with buffer pool instances: did you monitor which mutexes became hotter?.. I’m very curious, because in my tests I obtained completely opposite results: 16 instances give more than 50% better performance..
* The REDO log block size is a good point: 4K is the most optimal I/O size for SSD! However, I doubt it makes a huge difference while innodb_flush_log_at_trx_commit=2 is used.. So, to prove it, may you present XtraDB results comparing innodb_log_block_size=4096 and 512 here?..
Very interesting results anyway, as they raise many questions of all kinds 😉
Rgds,
-Dimitri
Andy,
1) For FusionIO I used 1000w, and for the hard drive 500w; that is, the FusionIO case has 2x more data.
2) innodb_flush_neighbor_pages=0 should benefit Intel SSDs also.
3) innodb_io_capacity is a tricky parameter; it does not directly correspond to IOPS. In my benchmarks there was not much benefit to going over 500, and as a drawback you can easily saturate your I/O system. But Dimitri has a good point; I may try a big value for MySQL 5.5.8.
Dimitri,
Ok, I will try MySQL 5.5 with a big innodb_io_capacity. I just remember it did not do any good for Percona Server, but it may be different here.
I do not have stats for mutexes, sorry.
Regarding innodb_log_block_size and innodb_flush_log_at_trx_commit=2: I think log_sys->mutex is still in the hot list, and innodb_log_block_size=4096 allows decreasing the time spent under that mutex.
Vadim,
Thanks for the detailed post. Couple of comments/observations:
* I agree that adaptive_flushing needs more work. One of the things we did in 5.5 is make it more accurate. As you know, the master thread does the adaptive_flushing check every second in a loop. In the InnoDB plugin for 5.1, if the master thread ends up doing some flushing work in one iteration, it won’t sleep and will do the next iteration right away. That means it can do multiple batches in one second. In 5.5 we tidied it up a little and forced the master thread to sleep for ‘1 – time spent in flushing’ seconds. What that means is that adaptive_flushing will do a maximum of one innodb_io_capacity batch per second. Seeing the pattern above and the capabilities of the system you are benchmarking on, I’d suggest setting innodb_io_capacity to something more realistic than 500. I’d say something like 4000 won’t be a bad idea. adaptive_flushing is heavily dependent on innodb_io_capacity, i.e., the batch size is capped by this value.
* Assuming your workload is much like dbt2, I don’t think purge should be a concern. But to keep the master thread from getting stuck in purge, maybe it is a good idea to use innodb_purge_threads=1.
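The master-thread pacing described in the first point can be sketched roughly as follows (simplified illustrative Python, not the actual InnoDB code; the function and parameter names are made up):

```python
import time

def master_thread_flush_pacing(io_capacity, needs_flushing, flush_batch,
                               iterations=3, period=1.0):
    """Sketch of the 5.5 master-thread loop described above: each pass does
    at most one adaptive-flushing batch, capped at innodb_io_capacity pages,
    then sleeps for 'period - time spent in flushing' before the next pass."""
    batches = []
    for _ in range(iterations):
        start = time.monotonic()
        if needs_flushing():
            # the adaptive_flushing batch size is capped by innodb_io_capacity
            batches.append(flush_batch(io_capacity))
        elapsed = time.monotonic() - start
        # in the 5.1 plugin this sleep was skipped after a flush, allowing
        # multiple batches per second; 5.5 always sleeps out the remainder
        time.sleep(max(0.0, period - elapsed))
    return batches
```

The consequence is direct: with io_capacity=500, adaptive flushing writes at most roughly 500 pages per second, regardless of what the storage could actually sustain.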
Inaam,
Thanks, I accept your suggestions.
I did try innodb_purge_threads = 1, but I did not see much effect from it under this workload.
I am re-running tests with innodb_io_capacity=4000 and 20000 for the FusionIO case and will update the post when I have the results.
You set the log block size to 4K to align with the FusionIO block size. What about non-log I/O, such as I/O for table and index data? Is it necessary to set the block size for them too?
Andy,
Table and index data I/O is 16K, and that is not a problem, as it is just 4 internal I/Os for FusionIO.
What is more problematic is the system tablespace and the doublewrite area, which can have sequential writes (I observed single I/Os of around 200-250KB). It may be a good idea to put ibdata1 (without tables and indexes) in a different location, i.e. on RAID10. That’s why we have a feature in Percona Server, innodb_doublewrite_file=, to place the doublewrite area in a different location from ibdata1.
Vadim,
I thought all the doublewrite data is stored in ibdata1. Are you saying you can set it to be stored somewhere else? What is the name of the doublewrite file?
So tables and indexes should be stored on SSD, ibdata1 on RAID10, and the doublewrite file somewhere other than ibdata1’s RAID10? Can you clarify?
How should I use innodb_doublewrite_file=? Can you give an example?
Thanks.
Andy,
The doublewrite area is stored in ibdata1, so if you decide to put ibdata1 on RAID10, you do not need any additional actions.
innodb_doublewrite_file= is available only in recent Percona Server releases.
You use it like innodb_doublewrite_file=/data/raid10/ib_dbwl.ibd; in that case the doublewrite area will be located outside of ibdata1.
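As a concrete my.cnf fragment (Percona Server only; the path is the illustrative one from the comment above):

```ini
# Percona Server only: place the doublewrite area outside ibdata1,
# e.g. on a RAID10 volume (path is an example)
innodb_doublewrite_file=/data/raid10/ib_dbwl.ibd
```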
Vadim — is this a problem in Percona? That is, is it possible for threads handling user connections to request page writes when the async checkpoint limit has been reached? And if it does submit write requests, does it block for them to finish? That is the case for official InnoDB which means that foreground threads block on IO to be done by background threads and can add lots of pauses.
http://bugs.mysql.com/bug.php?id=55004
Dimitri, Inaam,
I updated post with result with different innodb_io_capacity.
I do not see much improvement with 4000, and it gets rather bad with 20000.
Vadim,
Thanks for update!
The TPS drops look smaller with innodb_io_capacity=20000, but peak performance is lower, due to higher writes I suppose.. And what will the picture be over 8 hours? (I was mainly curious about your 500w test, where during the first 4 hours you have TPS drops on XtraDB too..)
BTW, how many concurrent user sessions were used during your test?
and how many CPU/cores do you have on your server?
Where may I get the percona-5.5.7 “tarball” to test it on my server?.. (I saw you mentioned an LP trunk and that the code is “very beta”, so I suppose it changes most of the time; if you can provide the source or binary tarball used in your tests, it will simplify many things.)
Thank you!
Rgds,
-Dimitri
I would be very interested to see it compared with 5.1 version of Percona
Dimitri,
It is a 16-core system and I used 16 users.
I may run 16-hour tests, but it is all time-consuming 🙂
I suggest you wait until our patches are available for 5.5.8; we will provide a .tar.gz for that version.
Vadim,
Thanks for publishing the results. Do you think it is possible to do something so Percona Server stabilizes quickly? Does our buffer pool preload help here? I mean, if I used this option, would I end up in an almost stabilized state straight away, or is it a different self-balancing process that is going on?
Re innodb_io_capacity: I think this option is “evil” to begin with. It would be much better to have the database balance itself against the capacity of the storage it has available (probably self-tuning based on the I/O performance it is getting), as there are many ways I/O storage can vary a lot during runtime. Think shared SAN storage, LVM snapshots, or cloud, for example.
Hi Vadim,
Thanks for sharing this amazing testing. Just one minor question: where can I get the above “tpcc-mysql” tool?
Best Regards,
-Liang
Liang,
tpcc-mysql is available from launchpad
https://launchpad.net/perconatools
Mark,
It should not be a problem in Percona; we should have it resolved in our adaptive_checkpoint methods.
Any particular reason you used different data for the HDD and FusionIO tests? It’s frustrating because you robbed us of an opportunity to compare the performance of the two types of storage.
Vadim,
It’s a pity you’re presenting results for code I’m not able to test :-))
I had a look on LP – there are only patches, without any info on what to do or how to apply them..
so, yes, I may only wait for your 5.5.8 tar.gz..
Regarding tpcc-mysql: there is no README on how to run it either :-))
I suppose that I have to compile tpcc_load and tpcc_start, then load the data, and then run a test…
May you please just give me 3 lines:
– how I create the db schema
– how I load the data (exact params)
– how I run the test (exact params)
(when I think that you told me that dbSTRESS is hard to setup… ;-))
And the last: may you finally present the results comparing 512-byte and 4K redo log block sizes?.. You just said it improves performance on SSD, but without giving any numbers.
Thank you!
Rgds,
-Dimitri
Seun,
I’ve done about 100 different combinations; I basically can’t publish them all. The reader would be lost, and it also requires a lot of time to prepare.
I published what I think are the most interesting ones.
If you want to see a direct comparison of FusionIO vs. HDD, you can see some data here
http://www.mysqlperformanceblog.com/2010/04/08/fast-ssd-or-more-memory/
and there
http://www.mysqlperformanceblog.com/2009/05/01/raid-vs-ssd-vs-fusionio/
Dimitri,
We have a Makefile in our directory with the patches, and make does all the dirty work of applying them.
tpcc-mysql is just intuitively easy to use 😉
But point taken; I added a README, please check.
I may publish innodb_log_block_size results later.
Hi Vadim,
I have a few questions regarding MySQL 5.5.8:
1) Previous versions of MySQL had a size restriction of 4GB per table. Does MySQL 5.5.8 have any such restriction?
2) How many schemas can we create in MySQL 5.5.8?
3) Is there any limit on the number of columns a table can contain in MySQL 5.5.8?
Regards,
Dhiraj
When will Percona Server 5.1 be stable and available?