As you may know, InnoDB has two limits on unflushed modified blocks in the buffer pool. One comes from the physical size of the buffer pool. The other is the age of the oldest modification, which is bounded by the capacity of the transaction log files.
Under a heavy update workload, the modified ages of many blocks cluster together. To reduce the maximum modified age, InnoDB then needs to flush many of these blocks in a short time if they have not been flushed earlier, and the resulting flushing storm seriously affects performance.
We proposed the “adaptive_checkpoint” option of constant flushing to avoid such a flushing storm. And finally, the newest InnoDB Plugin 1.0.4 has a similar native option, “adaptive_flushing”.
Let’s examine the adaptive flushing options in this post.
HOW THEY WORK
< adaptive_checkpoint=reflex (older method)>
oldest modified age (% of max) | behavior
0 ~ 50%                        | nothing
50% ~ 75%                      | constant flushing (weak)
75% ~ 87.5%                    | constant flushing (strong)
87.5% ~                        | (flushing storm)
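The reflex thresholds above amount to a simple decision on the ratio of the oldest modified age to the maximum checkpoint age. A minimal sketch (illustrative Python, not the actual InnoDB code; the function name and return strings are made up for clarity):

```python
def reflex_flush_level(oldest_modified_age, max_checkpoint_age):
    """Return the flushing behavior for adaptive_checkpoint=reflex,
    given the oldest modified age as a fraction of the maximum."""
    ratio = oldest_modified_age / max_checkpoint_age
    if ratio < 0.50:
        return "nothing"
    elif ratio < 0.75:
        return "constant flushing (weak)"
    elif ratio < 0.875:
        return "constant flushing (strong)"
    else:
        return "flushing storm"
```

The point of the two constant-flushing bands is to start writing early, so the 87.5% storm region is never reached in the first place.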
< adaptive_checkpoint=estimate (newer method)>
oldest modified age (% of max) | behavior
0 ~ 50%                        | nothing
50% ~ 87.5%                    | estimated flushing as below *
87.5% ~                        | (flushing storm)
* the number of blocks to flush is estimated based on…
- how many modified and unflushed blocks there are
- the progress speed of the transaction log
- the average modified age of all blocks to flush
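The idea behind the estimate can be sketched roughly as follows. This is a hypothetical simplification, not the real Percona code: flush the dirty blocks fast enough that, at the current log progress speed, they are written out before their age hits the checkpoint limit.

```python
def estimate_blocks_to_flush(n_dirty_blocks, log_speed_bytes_per_sec,
                             avg_modified_age_bytes, max_checkpoint_age_bytes):
    """Illustrative estimate (not the actual implementation): pace the
    flushing so dirty blocks are written before the log progress pushes
    their age past the checkpoint limit."""
    # Log-space distance until the average dirty block hits the age limit.
    remaining = max_checkpoint_age_bytes - avg_modified_age_bytes
    if remaining <= 0:
        return n_dirty_blocks          # already at the limit: flush everything
    # Time (seconds) left at the current log progress speed.
    seconds_left = remaining / log_speed_bytes_per_sec
    # Spread the flushing of all dirty blocks over that window.
    return max(1, int(n_dirty_blocks / seconds_left))
```

Because the estimate scales with the log speed and the dirty-block count, it needs no fixed I/O rate: a fast-moving log simply produces a higher per-second flush target.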
< adaptive_flushing (default “true” at 1.0.4)>
Its behavior is not based on the oldest modified age %.
* the number of blocks to flush is based on…
- how many modified and unflushed blocks there are
- the progress speed of the transaction log
- the transaction log capacity
(and it is adjusted along with the other flushing activity)
And it does not exceed the equivalent of the “constant flushing (strong)”
of “adaptive_checkpoint=reflex”.
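A rough sketch of that idea (again a hypothetical simplification, not the actual Plugin 1.0.4 source; the cap mirrors the “constant flushing (strong)” limit mentioned above):

```python
def adaptive_flushing_rate(n_dirty_blocks, log_speed_bytes_per_sec,
                           log_capacity_bytes, strong_flush_cap):
    """Rough sketch: pace flushing to the rate at which the redo log is
    being consumed, so dirty pages are written out before the log cycles
    through its full capacity."""
    # Fraction of the log capacity consumed per second.
    log_fraction_per_sec = log_speed_bytes_per_sec / log_capacity_bytes
    # Flush dirty blocks at the same proportional pace.
    rate = n_dirty_blocks * log_fraction_per_sec
    # Never exceed the equivalent of "constant flushing (strong)".
    return min(int(rate), strong_flush_cap)
```

Unlike the reflex method, nothing here depends on the oldest modified age percentage; the pacing follows log throughput alone, with the cap as a safety valve.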
RESULTS
TPC-C like workload (100WH: 16 session full)
innodb_buffer_pool_size = 16G
innodb_max_dirty_pages_pct = 90
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_io_capacity = 4000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_flush_method = O_DIRECT
innodb_thread_concurrency = 0
innodb_ibuf_active_contract = 1
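With this configuration, the total redo log capacity, which bounds the oldest modified age, can be sanity-checked quickly (an illustrative calculation; the 7/8 point corresponds to the 87.5% threshold in the tables above):

```python
# Total redo capacity from the settings above.
log_file_size = 512 * 1024 * 1024        # innodb_log_file_size = 512M
log_files_in_group = 2                   # innodb_log_files_in_group = 2
total_log_capacity = log_file_size * log_files_in_group
print(total_log_capacity // (1024 * 1024), "MB")   # 1024 MB of redo capacity

# The "flushing storm" region starts around 7/8 of the capacity
# (approximate; matches the 87.5% threshold in the tables above).
async_flush_point = total_log_capacity * 7 // 8
print(async_flush_point // (1024 * 1024), "MB")    # 896 MB
```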
<none>
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = none
<flushing>
innodb_adaptive_flushing = true
<reflex>
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = reflex
<estimate>
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = estimate
case1: “innodb_doublewrite = false”
[0~1800 sec.]
[1200~1800 sec.]
averages (1200~1800 sec.)
none:     6868.92
flushing: 6655.92
reflex:   6481
estimate: 6575.88
case2: “innodb_doublewrite = true”
[0~1800 sec.]
[1200~1800 sec.]
averages (1200~1800 sec.)
none:     6569.48
flushing: 5090.12
reflex:   6871.9
estimate: 6609.9
CONSIDERATIONS
The new “adaptive_flushing” seems to be tuned for “innodb_doublewrite = false” only.
(it causes too much flushing for “innodb_doublewrite = true”)
“innodb_adaptive_checkpoint = reflex” and “adaptive_flushing” need innodb_io_capacity to be tuned properly.
(These results are based on the proper value “innodb_io_capacity = 4000”.)
“innodb_adaptive_checkpoint = estimate” does not depend on innodb_io_capacity, and it seems to give a “softer” result than the other methods.
So,
The “adaptive_flushing” seems good when we use “innodb_doublewrite = false”.
“innodb_adaptive_checkpoint = estimate” may be safe for “innodb_doublewrite = true” for now.
I should adjust “adaptive_flushing” for “innodb_doublewrite = true”. 🙂
IMPROVEMENTS (16 Sep.)
I have tried to improve the performance. But I couldn’t tune the performance of adaptive_flushing with doublewrite… So, instead of that, estimate has been adjusted further.
case1: “innodb_doublewrite = false”
[0~1800 sec.]
case2: “innodb_doublewrite = true”
[0~1800 sec.]
The new “innodb_adaptive_checkpoint = estimate” seems to be more stable than before. The next question is, “Why is estimate + doublewrite better than the other settings without doublewrite?”… 🙂
ADDITIONAL TEST IN ANOTHER BALANCE
I also tested another configuration to check the behavior under a different balance of I/O-bound load.
<changes>
innodb_buffer_pool_size = 2G   (1/8)
innodb_log_file_size = 128M    (1/4)
case1: “innodb_doublewrite = false”
[0~1200 sec.]
case2: “innodb_doublewrite = true”
[0~1200 sec.]
“innodb_adaptive_checkpoint = estimate” seems stable, but it seems to flush heavily in its initial phase. adaptive_flushing seems to be softer in these cases.
Hmmm….
We should choose a suitable adaptive_xxx for each workload for now… 🙂
Hi Yasufumi,
Thanks for sharing it! A very interesting analysis of the problem!
Could you share more details about:
– was your workload data set kept in memory or not?
– are the write queries only UPDATEs, or also INSERTs and DELETEs?
Also, did you try running more concurrent users? What would change with 32 sessions?..
Thank you!
Rgds,
-Dimitri
Hi Dimitri,
“TPC-C like” means almost the same SQL set as DBT-2 (probably). And it also contains INSERTs and DELETEs.
And the 100WH database is around 10GB in files, so all of the data will be in the buffer pool in the end.
But the test starts just after InnoDB startup, so each page must actually be read from disk once
(because O_DIRECT is used).
Then, about more sessions, I will try that later.
Thank you,
Yasufumi
Hi Yasufumi,
So, if your workload also uses INSERT & DELETE, I’m curious whether you’ve monitored your history list length and whether it grows or stays stable..
At the same time, since your “adaptive” code is still reached from within the Master thread, I may suppose that the load has not yet reached a critical level (that’s why I’m asking about a test with more sessions)..
Also, if I’m not wrong – the O_DIRECT option is only applied to the redo files, no?.. All the others should still be buffered.. – or do you perhaps mean a filesystem mount option?..
Thank you!
Rgds,
-Dimitri
Dimitri,
I know a 16GB buffer pool has enough margin to test 10WH TPC-C for at least 30 min in these cases…
“Critical”? Do you mean also with a smaller buffer pool, like 5GB or 1GB?
I will also do that.
And you are wrong about InnoDB’s O_DIRECT….
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_flush_method
What you are describing is a feature of PostgreSQL only….
Do you prefer PostgreSQL to MySQL??? 🙂
(PostgreSQL is one of the RDBMSs with the strangest performance characteristics, in my view. I think you should not tune MySQL/InnoDB based on PostgreSQL tuning knowledge…)
Thank you
Hi Yasufumi,
By “critical” I mean a higher load 🙂 – rather more user sessions or higher transaction rates – to see at which moment you’ll start to loop in the purge loop 🙂 (all the issues I’ve observed before with dbSTRESS were recently also reproduced with DBT-2, so I’m curious about your case ;-))..
And what about the History Length?..
O_DIRECT – I missed this point 🙂 (I kept in mind that MySQL still calls fsync() – and you don’t need it when you’re doing true Direct I/O)..
BTW, it may not be bad to split it into 2 options: one for the redo log and another for the data files – the I/O nature is different for them, so it makes sense..
(And I’m not here to discuss PG :-)) but I respect it, it’s just different.. And for a long time before it out-performed MySQL, so if we have some things to learn – we should not miss them :-))
Thank you!
Rgds,
-Dimitri
Hi Dimitri,
OK. I understand. 🙂
But this topic is about flushing. I think purging is the next problem (though both of them are currently done by the master thread…). I will test it soon (also the effect of your purge_thread). Please wait to discuss it. 🙂
Regards,
Yasufumi
Hi Yasufumi,
From my point of view both topics come together 🙂 but well..
Another interesting point on flushing: I’ve observed that when I have wider data-set randomization (and data are being read from disk all the time (mainly random-read I/O-bound)) – the more optimal flushing strategy in such a situation becomes flushing by setting a dirty-pages limit rather than “adaptive” (especially when the doublewrite buffer is used (http://dimitrik.free.fr/blog/archives/09-01-2009_09-30-2009.html#93))
Did you observe the same thing?..
Rgds,
-Dimitri
In all of your tests, is it a good assumption that you are not using innodb_file_per_table and that you are using xfs for the file system? Is it a multi-spindle local RAID or a SAN? I’d like to compare your hardware used for testing to hardware we have available so we can estimate what kind of performance we should see.
Dimitri,
Sorry, what is the point of that post?
The effect of the storage’s write buffer? (twice the write IO/s: writes to the doublewrite buffer are buffered)
Todd,
I use innodb_file_per_table=true and ext3.
And I think it is a local RAID, but I don’t know the exact details about the server,
because I can only log in remotely…
Thank you.
Dimitri,
I have checked the history list length and your purge_thread.
The history list length never exceeded 20000 during any TPC-C test on that server.
And purge_thread gives a 3~4% improvement in throughput, and it seems to purge aggressively (the history length does not exceed 1000).
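(For anyone wanting to reproduce this monitoring: the history list length appears on a standard line of SHOW ENGINE INNODB STATUS output. A minimal sketch of extracting it, assuming that line format:)

```python
import re

def history_list_length(innodb_status_text):
    """Extract the purge history list length from
    SHOW ENGINE INNODB STATUS output (standard line format assumed)."""
    m = re.search(r"History list length (\d+)", innodb_status_text)
    return int(m.group(1)) if m else None

# Example status fragment (shortened):
sample = """
------------
TRANSACTIONS
------------
Trx id counter 0 80157601
Purge done for trx's n:o < 0 80154573 undo n:o < 0 0
History list length 6
"""
```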
And about 32 threads: the server has 16 cores and each thread of the benchmark is full-powered,
so there is almost no difference from 16 threads in the equilibrium throughput.
Thank you.
Hi Yasufumi,
20000 in history length is really small.. Is your test kit available somewhere so I can download and test it?..
purge_thread is not for improving performance but rather for performance stability (but it’s nice to see the improvement too)
Any chances you may try dbSTRESS workload too? – I’ll be very curious about your observations 😉
Thank you!
Rgds,
-Dimitri
Hi Dimitri,
Our test kit is
https://code.launchpad.net/~percona-dev/perconatools/tpcc-mysql
OK, I will try dbSTRESS if I get the chance.
Thank you!
Yasufumi,
I thought innodb_flush_method=O_DIRECT would override the doublewrite buffer? (I am confusing doublewrite with double buffering.)
If that is true, what is the implication of the test case with adaptive_flushing and innodb_doublewrite=true?
—
Ratheesh
I’m trying to find more info on innodb_ibuf_active_contract, because it’s not in the current 5.6 RC. I’ve found the 5.1 documentation about innodb_ibuf_active_contract, but according to 5.5.8’s changelog it was renamed to innodb_ibuf_active_merge, which also doesn’t have any documentation.
Is there anywhere you could point me to for a full list of options for each server version? Something like MySQL’s Server System Variables page would be nice, because it outlines all variables for the current version, in which version they were introduced if they were new, and in which version they were deprecated if they’ve been obsoleted.
Hi Alfie,
Lists of variables introduced in Percona Server 5.5 and 5.6 can be seen here:
https://www.percona.com/doc/percona-server/5.5/ps-variables.html
https://www.percona.com/doc/percona-server/5.6/ps-variables.html