I have a 5G mysqldump which takes 30 minutes to restore from backup.  That means that when the database reaches 50G, it should take 30×10=5 hours to restore.  Right?  Wrong.

Mysqldump recovery time is not linear: bigger tables, and tables with more indexes, take disproportionately longer to restore.

If I restore from a raw backup (LVM snapshot, xtrabackup, InnoDB hot backup), it is very easy to model how much longer recovery will take: the work is mostly copying the data files back into place, so the time scales roughly linearly with the size of the data.

I can tell progress with mysqldump by monitoring the rate at which SHOW GLOBAL STATUS LIKE 'Handler_write'; increases, and comparing it to my knowledge of roughly how many rows are in each table.  But progress != a magic number like “17 minutes”.  Not unless I do a lot of complex modeling.
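A minimal sketch of that kind of monitoring, assuming a local mysql client with credentials already configured (the 10-second interval is arbitrary):

# Sample Handler_write while the restore runs; the difference between two
# samples is the approximate number of rows inserted per interval.
while true; do
  mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Handler_write'"
  sleep 10
done

Dividing the rows still to be loaded by that rate gives a rough per-table estimate, which is exactly the kind of modeling described above.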

I am not saying a 5 hour recovery is good or bad.  What I am saying is that knowing the remaining time is very important during disaster recovery.  Being able to say “we’ll be back at 2PM” is much better than saying “we’ll be back between 1PM and 4PM… maybe”.

Comments
Gwen Shapira

Responsible DBAs test restores from their backups on a regular basis.
Say, every 2 months. During such a check, it is useful to record how long the restore took, for exactly this purpose. When management asks “how much longer will this take?” I can look at my charts and say “20 more minutes, captain”.

It is still good to be able to calculate the remaining time, but maybe not as critical.

Nikolay

I have a huge MySQL server with lots of MyISAM/Maria databases. Can someone elaborate on how we can do a non-blocking backup, even if we end up backing up some dirty (mid-transaction) data?

Gerry

I wouldn’t favor depending solely on mysqldump for backups unless you have some other HA considerations in place (e.g. master-master). That way you can work on the restore while the service is still online.

My $.02
G

Rolando Edwards

There are two alternatives, one using mysqldump and one not.

Option 1 (using mysqldump)

Write a customized backup script:
1) Extract the names of all databases (except mysql and information_schema)
2) FLUSH TABLES WITH READ LOCK;
3) Record the log file and position from SHOW MASTER STATUS;
4) Run a mysqldump of each database in parallel as a background process

Apply this in reverse when restoring: load all the mysqldumps back in parallel.
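A minimal sketch of the backup half of those steps, assuming client credentials are already configured and a /backups directory exists (both illustrative); error handling is omitted:

# 1) User databases, excluding the system schemas.
DBS=$(mysql -N -B -e "SHOW DATABASES" | grep -vE '^(mysql|information_schema)$')

# 2) Take the global read lock in a session that stays open, and record the
#    binary log coordinates while the lock is held.
mkfifo /tmp/ftwrl.fifo
( echo "FLUSH TABLES WITH READ LOCK;"
  echo "SHOW MASTER STATUS;"
  cat /tmp/ftwrl.fifo ) | mysql -N -B -n > /backups/master-status.txt &

# Wait until the coordinates appear, which means the lock is in place.
until [ -s /backups/master-status.txt ]; do sleep 1; done

# 3) Dump each database in parallel as a background process.
for db in $DBS; do
  mysqldump --opt "$db" | gzip > "/backups/$db.sql.gz" &
done
wait

# 4) Release the lock by ending the locking session.
echo "UNLOCK TABLES;" > /tmp/ftwrl.fifo
rm /tmp/ftwrl.fifo

In practice you would cap the number of dumps running at once (he uses 7 or 32 at a time) rather than launching all of them together.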

Example: I have a client at my DB/Web hosting company with 650+ databases (450+ GB) in one MySQL instance.
mysqldump databases one by one: 7 hours
mysqldump in parallel, 7 databases at a time: 3.5 hours

I have performed 3 full restorations in this client’s lifetime with LogicWorks

All 650+ DBs (450+ GB), 10.5 hours, loading 32 mysqldumps in parallel at a time

I am sure 5 GB should present no challenge if you dump all databases in parallel. If it is all one database, then dump all the individual tables in parallel (which I have also scripted for other DB hosting clients).

Option 2 (not using mysqldump)

Maatkit’s very own toolset has Parallel Dump and Parallel Restore, using SELECT … INTO OUTFILE and LOAD DATA INFILE under the hood.

http://www.maatkit.org/doc/mk-parallel-dump.html
http://www.maatkit.org/doc/mk-parallel-restore.html

Patrick Casey

If you’re just running InnoDB, you can run mysqldump with the --single-transaction flag, which will get you a consistent logical backup. It’ll take a long time on a larger database, but it will give you a snapshot in a non-blocking manner.
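For example (the --master-data option for recording replication coordinates and the output file name are illustrative choices):

# Non-blocking, consistent logical backup of an all-InnoDB server.
# --single-transaction takes one consistent snapshot instead of locking tables;
# --master-data=2 records the binlog coordinates as a comment in the dump.
mysqldump --single-transaction --master-data=2 --all-databases > full-backup.sql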

If you have a non-transactional engine in your mix (MyISAM, I’m looking at you), then there really is no way to get a “hot” consistent dump. You can issue FLUSH TABLES WITH READ LOCK, but to do this right you have to hold the lock until your dump completes. A read lock is just that, a lock, and that means no new writes as long as the backup is in flight.

If you are on some form of snapshot-aware storage you can execute:

Flush tables with read lock
SNAP
Release lock

This will get you a consistent snapshot even on MyISAM. The problem is that the FLUSH can take an unpredictable amount of time, because it must wait for all in-flight queries to complete. That means it’s impossible to guarantee a “no downtime backup” via this method: you’re locked up for flush time plus snapshot time.

If you’re pure InnoDB, I generally trust its recovery engine, so you can snapshot it without the read lock. This is a crash-consistent copy, and you’ll have to go through crash recovery when you restore from that backup. There are horror stories on the internet of this kind of snapshot coming up corrupt, but I will say I’ve personally never had a problem with it.

Baron Schwartz

Very often there is no way at all to make any kind of dump & load work, period. It is not uncommon for me to see clients whose dumps would take days or weeks to load a single table. At such sizes, dumps are completely unusable. One of the main problems is building indexes.

Lucas

I really like mysqldump, but it has some compatibility problems between different versions of MySQL. Maybe it’s better now.

Mark Wilden

Don’t update indexes for each record added. Drop them before the load, then recreate them after.
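A hedged sketch of that approach, assuming the table already exists and the dump contains data only (table, index, and file names are made up for illustration):

# Drop the secondary index before the bulk load...
mysql mydb -e "ALTER TABLE orders DROP INDEX idx_customer"
# ...load the data-only dump (e.g. one taken with --no-create-info)...
mysql mydb < orders-data.sql
# ...then rebuild the index in a single pass afterwards.
mysql mydb -e "ALTER TABLE orders ADD INDEX idx_customer (customer_id)"

This only helps with secondary indexes; the clustered primary key is still maintained row by row during the load.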

Mark Wilden

Actually, I was just making a general comment – not necessarily referring specifically to Percona plugins.

However, I don’t see how it could be slower to create the index all at once than to do it one record at a time.

Morgan Tocker

@Mark: It’s in the manual:

“In MySQL versions up to 5.0, adding or dropping an index on a table with existing data can be very slow if the table has many rows. The CREATE INDEX and DROP INDEX commands work by creating a new, empty table defined with the requested set of indexes. It then copies the existing rows to the new table one-by-one, updating the indexes as it goes.”

http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-create-index-overview.html

The text says up to 5.0, but it also applies to 5.1 if you don’t use the InnoDB plugin.

Mark Wilden

Wow, I had no idea – that’s pretty incredible. I guess it’s for the benefit of the clustered index, but still.

Thanks for the information.

Blagomir Ivanov

In my opinion, there are rare cases where you have to restore the entire database. I have made an entire restore only a few times in my life. In most cases there is just one broken table, or a few. The best solution for me was to write a script that takes all the databases and all the table names in those databases and archives them (with mysqldump) in separate files.
At the end I have:
db.table1
db.table2
…
db.tableX

Don’t know if it’s the best, but it’s working for me 🙂
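A minimal sketch of such a script, assuming client credentials are already configured and a /backups directory exists (both illustrative):

# Dump every table of every user database into its own db.table file.
for db in $(mysql -N -B -e "SHOW DATABASES" | grep -vE '^(mysql|information_schema)$'); do
  for tbl in $(mysql -N -B -e "SHOW TABLES FROM \`$db\`"); do
    mysqldump "$db" "$tbl" | gzip > "/backups/$db.$tbl.sql.gz"
  done
done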

Patrick Casey

Blagomir,

We use a similar technique for a subset of our logical backups and it works well.
One challenge folks should be aware of, though, is that if you do not explicitly dump all tables within one transaction, each table will dump out at a different point in time, so they won’t necessarily be consistent with each other.

Example, if you have:
Food
Meals (contains a foreign key to food)

And you dump Food at 12:51 PM and Meals at 1:22 PM, then it’s possible there will be a reference in Meals to a piece of food which didn’t exist at 12:51 PM.

Depending on the data model this may or may not be an issue for any given app, but it’s an important constraint to be aware of if you’re considering going down this road.

Andy

Use LVM for your filesystems and place MySQL on an LVM partition.

You can then lock all tables and snapshot the partition – only a few seconds of lock time.
Then copy the snapshot to another partition or device.

You now have a full copy of the raw data, so a restore takes only the time needed to copy it back into place; no reloading of dumps.
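Done by hand, the snapshot-and-copy part looks roughly like this (volume group, volume names, snapshot size, and destination are all illustrative; the FLUSH TABLES WITH READ LOCK must be held in a separate, still-open mysql session while lvcreate runs):

# With the read lock held elsewhere, take the snapshot (a few seconds)...
lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql
# ...release the lock (UNLOCK TABLES in the locking session), then copy the
# snapshot off at leisure without blocking the running server.
mkdir -p /mnt/mysql_snap
mount /dev/vg0/mysql_snap /mnt/mysql_snap
rsync -a /mnt/mysql_snap/ backuphost:/backups/mysql/
umount /mnt/mysql_snap
lvremove -f /dev/vg0/mysql_snap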

Here is a really nice tool to do it all – http://www.lenzg.net/mylvmbackup/

Baron Schwartz

Agree, except for “only a few seconds.” http://www.mysqlperformanceblog.com/2010/04/24/how-fast-is-flush-tables-with-read-lock/

There are lots of articles on this blog about LVM backups.

Patrick Casey

Andy, another challenge I’ve run into with LVM snapshots is that they really load up the disk subsystem.
While I have a snapshot mounted I’m doubling my write IOPS.
To use the snapshot as a backup, I have to copy it off the main system and onto a backup device, which means I have to read hundreds of GB of data, which again loads up my I/O subsystem at the same time I’m paying the double-write penalty.

We’ve found in our environment that SAN snapshots work great (very low cost), but LVM snapshots are too intrusive on the production servers for regular use, so we typically snapshot the slaves nightly and the masters weekly; doing an LVM snapshot on a busy database will grind it down too much to tolerate.

Andy

Yes, LVM does have the issue of extra writes, so doing the snapshots off a slave helps.

Marc

What I’m really missing here is a “this is the way to do it” answer with a full example. I’m no SQL management expert but find myself having to set one up regardless.
I’m looking for the “works in 99% of cases” solution, but it almost seems like the answer is “there’s no good solution”, which is kind of weird for such a popular database system.

Ilan Hazan

Another argument for not using mysqldump for recovery is that restoring a dumped table on the MySQL master can lead to serious replication delay.
To overcome the replication delay caused by restoring the dump on the master, you need to spread out the massive inserts. This can be done with the MySQL SLEEP() function.
See “As restoring a dump table into the MySQL master – you better get some sleep”.
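A hedged example of spreading out the load while replaying a dump on the master (dump.sql and mydb are placeholders, the every-1000-statements threshold and one-second pause are arbitrary, and it assumes one row per INSERT, i.e. a dump taken with --skip-extended-insert):

# Inject a SELECT SLEEP(1) after every 1000 INSERT statements so the
# slaves get a chance to catch up while the dump is replayed.
awk '{ print } /^INSERT/ { if (++n % 1000 == 0) print "SELECT SLEEP(1);" }' dump.sql | mysql mydb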

Mark Wilden

I do see in http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-create-index-overview.html that in 5.1, creating a *secondary* index is faster with ADD INDEX than inserting one-by-one.

Simon J Mudd

@Andy, I have been doing LVM snapshots for cloning and backups. Indeed it works well and is pretty easy to setup. However, I have seen considerable performance problems on CentOS 6 with xfs which were not apparent in CentOS 5, especially with filesystems of a few hundred GB or more. This can make using this method rather more cumbersome that you would expect.