I recently had the chance to witness the effects of innodb_adaptive_flushing on the performance of InnoDB Plugin 1.0.5 in the wild; Yasufumi wrote about this feature previously here and here.

The server in question was Solaris 10 with an 8-disk RAID 10 array, two 32GB SSDs used for the ZIL and L2ARC, 72GB of RAM and a 40GB buffer pool. We started it up with innodb_adaptive_flushing=OFF and innodb_doublewrite=OFF, then ramped up traffic, and everything looked stable… but I noticed one troubling thing: ~2GB of uncheckpointed data.
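For reference, the relevant settings and the amount of uncheckpointed data can be checked from the mysql client. A minimal sketch (variable names assume InnoDB Plugin 1.0.4 or later, where innodb_adaptive_flushing exists):

    -- Confirm the flushing-related settings on the running server
    SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_flushing';
    SHOW GLOBAL VARIABLES LIKE 'innodb_doublewrite';
    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';

    -- The uncheckpointed data (checkpoint age) is the gap between
    -- "Log sequence number" and "Last checkpoint at" in the LOG section of:
    SHOW ENGINE INNODB STATUS\G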

We enabled innodb_adaptive_flushing just before 10am, which resulted in a few changes that were recorded by the Cacti MySQL Templates. The most important change for the client here is the recovery time: by enabling adaptive flushing, InnoDB flushed modified pages much more aggressively, leaving very little unflushed data, which translates to much faster crash recovery.

[Graph: InnoDB checkpoint age, 1-day view]
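Since innodb_adaptive_flushing is a dynamic variable in the InnoDB Plugin, it can be toggled without a restart; whether it was changed at runtime or via a restart does not matter for the graphs, but roughly it would look like this:

    -- Enable adaptive flushing at runtime (dynamic in the InnoDB Plugin);
    -- add innodb_adaptive_flushing = 1 to my.cnf to keep it across restarts
    SET GLOBAL innodb_adaptive_flushing = ON;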

There was also a drop in the history list length, which may be a result of the more aggressive flushing. In general this is good for performance, because InnoDB does not need to consider as many old versions of rows.

[Graph: InnoDB transactions / history list length, 1-day view]

Here we can see the “spiky flushing” before 10am (when adaptive flushing was off), which can be very bad for performance: during those short periods of very high page write activity, the system and/or InnoDB can become very slow (Yasufumi’s benchmarks show this very well). After enabling adaptive flushing we see a consistent, higher-than-previous-average rate of pages written to disk, which I find rather interesting.

[Graph: InnoDB buffer pool activity, 1-day view]
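The page write rate in this graph comes from the server’s cumulative status counters; the Cacti templates derive per-second rates from successive samples of similar counters. A rough way to watch it by hand:

    -- Cumulative counters; sample twice, subtract, and divide by the
    -- interval in seconds to get pages written per second
    SHOW GLOBAL STATUS LIKE 'Innodb_pages_written';
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_flushed';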

Of course, there is also the question of the impact on system performance… More aggressive flushing means more disk IO and possibly other effects on the system, which could affect query response time, and so on. And what about CPU, load average, and the like? Those did not change enough to show a difference in the graphs, so I am not including them; after all, we are not pushing this system near its limits, so I am not really surprised. We can see a change in disk IO which mirrors the InnoDB pages written graph (shown above). I suspect ZFS is heavily buffering writes during the “spiky flushing” period, because the delta between peak-write and low-write is much smaller in terms of bytes written than in pages written.

It is also worth noting here that this system is using two SSDs for the ZIL and L2ARC, and as everyone should know by now, an SSD’s lifetime is determined by the number of writes it can perform before failure. This means that more writes per second == shorter lifetime, so a shorter SSD lifespan is a negative side effect of enabling adaptive flushing on this server.

[Graph: ZFS iostat, 1-day view]

One more point worth mentioning: the normal approach to controlling InnoDB’s checkpoint age is to adjust the log file size. This server is configured with two 1.5GB log files and innodb_io_capacity=1000. While adaptive flushing was off, the checkpoint age was comparable to the log file size; however, the behavior with flushing enabled (keeping the checkpoint age very small) seems too aggressive. It’s possible we would see different results by adjusting innodb_io_capacity here.
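Such an experiment might look roughly like the sketch below; innodb_io_capacity is dynamic in the plugin, so it can be adjusted at runtime while watching the checkpoint age and page write rate in the same graphs. The value 500 is purely illustrative, not a recommendation:

    -- Current settings relevant to checkpoint age (on this server:
    -- 2 x 1.5GB log files, innodb_io_capacity = 1000)
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size';
    SHOW GLOBAL VARIABLES LIKE 'innodb_log_files_in_group';
    SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity';

    -- Illustrative only: try a lower IO capacity and observe the effect on
    -- checkpoint age and pages written; the right value depends on the workload
    SET GLOBAL innodb_io_capacity = 500;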

Is there a lesson in this? Sometimes tuning InnoDB settings can have quite unexpected results… Enabling innodb_adaptive_flushing seems better in this case, because there is IO bandwidth to spare and we are concerned about crash recovery time, but as Yasufumi’s previous posts show, it can sometimes have bad results, and one may need to try and try again before finding the “sweet spot” for your server/workload combination. As always, you should benchmark your particular systems rather than blindly follow advice you read on the internet, and have good monitoring and trending tools in place from the start.

Comments
MLBR

Great post!

Patrick Casey

That’s really interesting stuff; am I reading it correctly, though, that adaptive flushing actually increased the IOPS? Looking at the IO graphs, it looks like both the peak IO rate and the average IO rate were higher with adaptive flushing enabled (assuming I’m reading them properly).

Seems to imply that if recovery time isn’t a major factor, you’re better off (for this workload at least) running w/o that option enabled?

Yasufumi Kinoshita

How about the result of XtraDB for the workload with these settings:

innodb_adaptive_flushing=OFF
innodb_adaptive_checkpoint=estimate
innodb_use_purge_thread=1

?

“estimate” is based on a different strategy, and it is not so eager to flush…
