July 28, 2014

XtraDB feature: save / restore buffer pool

We recently released XtraDB-9, and while we did not highlight it in the announcement, the defining feature of the release is the ability to save and restore the InnoDB buffer pool.
The idea is not new and was originally developed by Jeremy Cole (sorry, I do not have the link at hand) some time ago, and now we have implemented it in XtraDB.

Why would we need to save and restore the contents of the buffer pool?
There are several reasons.
First, it’s not rare on modern servers to have 32GB+ of RAM, with an allocated InnoDB buffer_pool of 26GB or more. When you restart such a server, it may take a long time to populate the cache with useful data before you can bring it back to serve production load. It’s not rare to see a maintenance cycle take two or more hours, mainly because the slave needs to catch up with the master and warm its cache.
In the case of a server crash it is even worse: you may have to wait a possibly long time for InnoDB recovery (we have a patch for that too; in that post you can see InnoDB recovery take 1h to complete) and only after that warm the caches.

Second, it is useful for some HA schemes, like DRBD, where, in case of failover, you need to start the passive instance cold.

So let’s see what results we have.
Details about the patch are available at http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_lru_dump_restore (Yasufumi names it LRU dump/restore, because he thinks of the buffer pool as an LRU list, which is how it is represented internally).

To save the buffer pool you execute

SELECT * FROM information_schema.XTRA_LRU_DUMP;

and to restore it

SELECT * FROM information_schema.XTRA_LRU_RESTORE;

Either statement will create/read the file ib_lru_dump in your database directory.
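As the comments below clarify, the dump stores only page addresses (space_id, page_id), not page contents, so a restore always re-reads the current pages from disk. A toy sketch (pure Python, illustrative names only, not the actual XtraDB code) of why restored data can therefore never be stale:

```python
def dump_pool(buffer_pool):
    """Save only the addresses (space_id, page_id) of cached pages, not their contents."""
    return list(buffer_pool.keys())

def restore_pool(dump, tablespace):
    """Re-read each dumped page from the current on-disk tablespace."""
    return {page: tablespace[page] for page in dump if page in tablespace}

# The 'disk' maps (space_id, page_id) -> page contents.
disk = {(0, 1): "row v1", (0, 2): "other row"}
pool = {(0, 1): "row v1"}      # one page cached in the buffer pool
dump = dump_pool(pool)         # save the dump; a restart/crash happens here...
disk[(0, 1)] = "row v2"        # ...and the page changes on disk in the meantime
restored = restore_pool(dump, disk)
# restored[(0, 1)] is "row v2": the current on-disk data, never the old cached copy
```

Because only pointers are saved, a dump taken long ago simply warms the cache with whatever those pages contain now.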

You may want to sort ib_lru_dump in the order of the pages in the tablespaces, so that the RESTORE is performed in the most sequential way possible. A small Python script to sort ib_lru_dump is available in our Launchpad branch.
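The script itself is not reproduced here, but a minimal sketch of such a sort might look like this (assuming each record in ib_lru_dump is 8 bytes — space_id then page_id as little-endian 32-bit integers; the actual on-disk layout is defined by the patch and may differ):

```python
import struct

# Assumed record layout: space_id, page_id as little-endian uint32 each.
RECORD = struct.Struct("<II")

def sort_lru_dump(src_path, dst_path):
    """Rewrite an ib_lru_dump file with records ordered by (space_id, page_id),
    so that XTRA_LRU_RESTORE can read each tablespace as sequentially as possible."""
    with open(src_path, "rb") as f:
        data = f.read()
    usable = len(data) - len(data) % RECORD.size  # ignore a trailing partial record
    records = [RECORD.unpack_from(data, off) for off in range(0, usable, RECORD.size)]
    records.sort()  # tuple sort: by space_id first, then page_id within a tablespace
    with open(dst_path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))
```

Sorting by (space_id, page_id) turns the restore's random reads into mostly sequential I/O per tablespace, which is where the speedup on spinning disks comes from.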

I made a small tpcc benchmark to show the effect of a restored buffer_pool (the conditions of the benchmark are the same as in my runs on fast storage, and I used RAID10 to store the InnoDB files).
The first run (xtradb cold) was made just after a restart and lasted 1h.
After that I saved the buffer_pool, restarted mysqld, restored the buffer_pool (it took about 4 min to load 26GB worth of data), and ran tpcc again (xtradb warm).

Here are the graphical results (in New Transactions Per 10 sec; more is better):

[graph: tpcc_1000w]

As you can see, in the cold run it took 1500-1800 sec to enter stable mode, while in the warm run that happened almost from the start. There was a period of unstable results, but it did not affect the ability to serve load.

You are welcome to test it; it is available in the XtraDB-9 release and also in MariaDB 5.1.41-RC.

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, Percona Server, Percona XtraDB Cluster, and Percona XtraBackup. He is an expert in solid-state storage and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. Harrison says:

    How long does it take to warm the cache using conventional means, such as full table scans/full index scans on this hardware? And how does the TPC compare after doing that for 4 minutes?

    Any idea what caused the big dip in results at 1800 for the cold mode?

    Are there any locks or other side effects when taking the snapshot (besides 26G of disk writing in non-synchronous manner)?

  2. Vadim says:

    Harrison,

    I do not know how long it takes to warm cache using full table / index scan.
    The problem with that is there is 90GB of data, and you need to figure out which tables/indexes you want to preload into the 26GB buffer_pool. I could try to preload the biggest table, but I have doubts about how useful it would be. I did not do that exercise, but I suspect it could be almost impossible in the general case.

    The dip at 1800 sec in the cold case should be caused by the start of intensive flushing, caused either by an approaching checkpoint and/or by the need to free some buffer pool pages to read new pages.

    There are no locks; we just scan the LRU list.

  3. peter says:

    Vadim,

    This is very cool. Some questions and suggestions though

    1) I think the Python script to sort the file is ugly. It would be much better if the sorting were done internally
    2) It would be great to have the option to store the LRU dump on shutdown and load it on startup (probably in some background thread)
    3) Looking at your graph, a couple of questions – why does the warm run peak, drop by 50%, and then pick up again? Shouldn’t warmup provide a uniform speedup over time? Also, why is performance so uneven across the 10-second samples – did you use adaptive checkpointing in this case? Could it be something else that makes things so uneven?

  4. Vadim says:

    Peter,

    Agree on 1) and 2), there is room for improvements.

    on 3)
    I guess warmup drops for the same reason – flushing activity starts – though it is worth checking.
    We use adaptive checkpointing. I have no exact answer right now for why performance is so uneven across the 10-sec samples. Yasufumi suspects the new mid-point insertion algorithm, but we need to research it.

  5. Mark R says:

    I’d like to see this made fully automatic – so it would automatically dump the LRU list periodically after a certain amount of uptime – say 24h and also at server shutdown if it had been running for long enough, and automatically load it on restart.

    These could be tunables of course.

    This would mean that most users could just forget about it and have cache warmup goodness happen by itself.

  6. Kim says:

    When you dump the buffer pool, what data is dumped? Is it just pointers to which rows etc. need to be loaded, or is it the actual data in the pool that is dumped?

    I’m thinking that if you restore an old pool, do you risk getting invalid cached data, or is the buffer pool re-validated against what is stored in the database when loaded?

  7. Vadim says:

    Mark,

    Agree, your suggestions are taken.

  8. Vadim says:

    Kim,

    We store only pointers to data pages (space_id, page_id).
    There is no such problem as stale data in this case.

  9. Baron says:

    Vadim, how does this interact with recovery? Does it work OK if you save the buffer pool contents, then crash the server, restart it, and restore the buffer pool? If not, then that might be a problem for the DRBD use case.

    I think it should work fine, but maybe I’m wrong.

  10. Vadim says:

    Baron,

    There is no reason why it would not work.

    As I said, we store just pointers to pages (space_id, page_id), and at the restore stage we simply read the pages by those pointers.
    It does not matter if InnoDB crashed before, ran its recovery procedure, etc.

  11. Tobias Petry says:

    Vadim, this means we can make a buffer pool dump, run the server 2 hours, crash it, and after restarting (and InnoDB’s recovery) and loading the buffer pool dump back into RAM we have a warm InnoDB instance with no stale data? And no other problems?

  12. Vadim says:

    Tobias,

    Basically yes.
    You need, though, to sort the dump of the buffer_pool so that it will be loaded sequentially.
    Also, loading it back into RAM may take some time (4 min in my experiment), but it is faster than working with a cold cache.

  13. Patrick Mulvany says:

    Vadim, how is this patch affected by increases or decreases in the InnoDB buffer pool size parameter?
    I would assume that changing it would not be a good idea.

  14. Vojtech says:

    Vadim, I just noticed that I cannot kill the XTRA_LRU_RESTORE query. I think it should check for ‘killed’ status at least every few seconds. Am I right?

  15. Vadim says:
  16. serbaut says:

    Patch against 5.1.47 for sorting before restore: http://gist.github.com/570107

  17. Vadim says:

    serbaut,

    Great, thanks!
    Can we use it under BSD license ?

  18. serbaut says:

    @Vadim: yes you can.

  19. Will says:

    Typo:

    First, it’s not rate on modern servers to have 32GB+ of RAM
    ———————^ <should be rare, not rate.
