July 24, 2014

Why delayed flushing can result in less work

I can think of at least two major reasons why systems delay flushing changes to durable storage:

1. So they can do the work when it’s more convenient.
2. So they can do less work in total.

Let’s look at how the second reason can hold.

A commenter on Deva’s recent post on InnoDB adaptive flushing asked,

That’s really interesting stuff; am I reading it correctly though that adaptive flush actually increased the IOOPS? Looking at the IO graphs, it looks like both the peak IO rate and average IO rate were higher with adaptive flush enabled (assuming I’m reading properly).

Yes. Adaptive flushing actually increased the overall number of I/O operations performed. Smoothing out the workload can cause more work to be done. To see why, remember that InnoDB works in 16KB pages. Suppose someone makes a change to a row. If InnoDB is flushing constantly, it flushes that entire 16KB page. Just afterwards, another row on the same page gets changed, and another flush of the same page results. If the first flush had been delayed, the two flushes could have been done as one. This is called write combining. In some workloads, the same rows could be updated many, many times, so delaying flushes and permitting write combining could yield an enormous reduction in the number of I/O operations.
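To make the arithmetic concrete, here is a toy model in Python. It is only a sketch and not how InnoDB decides what to flush: the function name count_flushes, the flush_every parameter, and the sample list of page numbers are all made up for illustration. It counts how many page writes happen when every dirty page is flushed after each batch of updates.

```python
def count_flushes(page_updates, flush_every):
    """Count page writes for a sequence of row updates.

    page_updates: page number touched by each update, in order (hypothetical data).
    flush_every:  flush all dirty pages after this many updates (hypothetical knob).
    """
    dirty = set()   # pages modified since the last flush
    flushes = 0
    for i, page in enumerate(page_updates, start=1):
        dirty.add(page)              # repeated updates to a page keep it dirty once
        if i % flush_every == 0:
            flushes += len(dirty)    # one write per dirty page, not per update
            dirty.clear()
    flushes += len(dirty)            # flush whatever is still dirty at the end
    return flushes


# Eight row updates that land on only three distinct pages.
updates = [1, 1, 2, 1, 2, 2, 3, 1]

eager = count_flushes(updates, flush_every=1)    # flush after every update
delayed = count_flushes(updates, flush_every=4)  # flush after every fourth update
lazy = count_flushes(updates, flush_every=8)     # flush once at the end

print(eager, delayed, lazy)  # 8 5 3
```

The longer the flush is delayed, the more updates to the same page collapse into a single write. In this toy example the eager policy writes eight pages, while waiting until the end writes only three.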

Now on to the next question from that comment:

Seems to imply that if recovery time isn’t a major factor, you’re better off (for this workload at least) running w/o that option enabled?

Maybe. It depends on whether the flushes are consuming a resource that something else needs. If nothing else needs the disk, what the heck — go crazy, write as much as you want. If the data fits in the buffer pool, no reads are happening, so the disk is only used for writes. If everything is lined up at the operating system and RAID controller layers, then a read isn’t required for a write to take place, either. Remember too that these writes are background operations, so they aren’t blocking anything in the foreground, unless there is a side effect such as mutex contention inside MySQL or InnoDB. So whether this matters for you is workload-dependent.

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. Michael Peters says:

    It might also matter in the case where you pay for your IOOPs. Which would be true with something like an attached Amazon EBS storage.

  2. peter says:

    Baron,

    I think there is actually a “bug” in the adaptive flushing in Innodb – it just does it too eagerly.

    I think if I specify log files of 2GB (with no other way to specify how long I’d like to delay my writes) Innodb should try to minimize amount of flushes it does using the provided 2GB of the log space. It is not happening here – instead it flushes a lot – as we can see it does probably 3x more flushes than it could (disabled adaptive flushes does not cause any blips in this case)

    There is indeed a good point about Pay Per IO storage as well as SSDs (which also “cost” you per IO as more writes you do the more quickly it will wear out) may benefit from modified flush policy which does not try to utilize idle bandwidth as much but rather use “write combining” to the maximum.

  3. Very possible. However, my Big Point here is that delayed flushing can be less work. If you take any old version of MySQL, say 5.0.45 with normal InnoDB, and set innodb_max_dirty_pages_pct down lower, you see the same thing: a lot more flushing happens. (Maybe that’s buggy too… but even if it were perfect, you’d still see that happening.)

  4. peter says:

    Baron,

    Right. A longer delay should mean less work to do. In fact, you can come up with cases where delaying writes 10 times longer results in 10 times fewer writes.

    With innodb_max_dirty_pages_pct the story is different – it is working as expected. If you allow system to keep fewer pages dirty this means you can’t delay writes for as much (some of dirty pages will be flushed because of LRU instead of delayed more). With results for adaptive_flushing however I see very unexpected behavior.

  5. My understanding of the old behavior before adaptive_flushing was that it wouldn’t flush fast enough, then would flush a ton to catch up. Which is basically delaying writes, and depending on which pages were getting delayed, would reduce overall writing. I am not sure I understand things correctly now.
