While dealing with dumping and recovery of a large and partially corrupted database, I've got a couple of feature ideas for mysqldump (or a similar tool) and the corresponding import tool.

Dump in parallel A single-threaded dump is of course not efficient, especially on systems with multiple CPUs and disks. This is less of an issue in the recovery case, because the import takes most of the time anyway, but it would be handy for some backup needs.

Dump each table in its own file This is much more convenient than a single, say, 100GB .sql file: if the load fails for any reason you can easily restart it, and it also lets you prioritize the data load when you have many tables with different priorities for your application.
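
Something along these lines could be wrapped around the stock mysqldump to cover the first two ideas. A minimal Python sketch, assuming placeholder host/credentials, a made-up database name and a fixed level of parallelism:

#!/usr/bin/env python3
# Sketch only: dump every table of one database into its own .sql file,
# running at most PARALLEL mysqldump processes at a time.
import subprocess, time

DB, PARALLEL = "mydb", 4                               # placeholders
AUTH = ["-h", "localhost", "-u", "backup", "-psecret"]

def list_tables():
    out = subprocess.check_output(
        ["mysql"] + AUTH + ["-N", "-B", "-e", "SHOW TABLES", DB],
        universal_newlines=True)
    return out.split()

def start_dump(table):
    # One file per table makes it easy to restart or reprioritize the load later.
    outfile = open("%s.%s.sql" % (DB, table), "w")
    proc = subprocess.Popen(
        ["mysqldump"] + AUTH + ["--single-transaction", DB, table],
        stdout=outfile)
    return proc, outfile

tables = list_tables()
running = {}                                           # table -> (process, file)
while tables or running:
    while tables and len(running) < PARALLEL:
        t = tables.pop(0)
        running[t] = start_dump(t)
    for t, (proc, out) in list(running.items()):
        if proc.poll() is not None:                    # this table's dump finished
            out.close()
            print("%s: mysqldump exited with %d" % (t, proc.returncode))
            del running[t]
    time.sleep(0.5)

Note that with one mysqldump process per table you no longer get a single consistent snapshot across tables; a truly consistent parallel dump would need the database locked for the duration, as Brian points out in the comments.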

Safe Dump When dumping corrupted InnoDB tables, some of them will crash InnoDB, which kills the mysqldump process. It would be great for the dump script to record in its log that such a table was not dumped successfully, wait for MySQL to complete its restart, and continue with the other tables.
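
The safe-dump behaviour could be bolted onto such a wrapper: if mysqldump exits with an error for a table (for instance because the server crashed on a corrupted page), log the table, wait until mysqld answers again, and move on. A rough sketch, again with placeholder credentials and made-up table names, using mysqladmin ping to detect when the restarted server is reachable:

import subprocess, time

AUTH = ["-h", "localhost", "-u", "backup", "-psecret"]    # placeholder credentials

def wait_for_server():
    # Poll until the (auto-restarted) mysqld answers again.
    while subprocess.call(["mysqladmin"] + AUTH + ["ping"],
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL) != 0:
        time.sleep(5)

def safe_dump(db, table):
    with open("%s.%s.sql" % (db, table), "w") as out:
        rc = subprocess.call(["mysqldump"] + AUTH + [db, table], stdout=out)
    if rc != 0:
        # The table probably crashed the server: record it, wait for the restart.
        with open("failed_tables.log", "a") as log:
            log.write("%s.%s\n" % (db, table))
        wait_for_server()
    return rc == 0

for table in ["t1", "t2", "t3"]:        # in practice, the list from SHOW TABLES
    safe_dump("mydb", table)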

Parallel restore This is absolutely required if time is an issue, as serious systems may perform much better in that case.
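
Parallel restore of per-table files is essentially the same loop in reverse: feed each dump file to its own mysql client, a few at a time. A sketch with placeholder credentials (on a single disk the extra seeks can eat most of the gain, as one commenter notes below):

import glob, subprocess, time

AUTH = ["-h", "localhost", "-u", "restore", "-psecret"]    # placeholders
DB, PARALLEL = "mydb", 4

files = sorted(glob.glob("%s.*.sql" % DB))      # one dump file per table
running = []
while files or running:
    while files and len(running) < PARALLEL:
        f = files.pop(0)
        # Each file is piped into its own mysql client process.
        running.append(subprocess.Popen(["mysql"] + AUTH + [DB],
                                        stdin=open(f)))
    running = [p for p in running if p.poll() is None]
    time.sleep(0.5)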

As usual, when the need arises there is no time to implement a solid solution that will work for a lot of people, so we get by with a few quick and dirty shell scripts to do this.

17 Comments
Alexey Kovyrin

I think all of these ideas could be implemented as a wrapper for standard mysqldump:
1) The script reads the table list and spawns N mysqldump processes; when one process finishes its work, the script starts dumping the next table… (I think it could be a really nice parallel dump and file-per-table implementation)

2) Safe dump… harder to implement, but I think it is possible.

3) Parallel restore: The same engine, but now it would feed dump files to mysql processes.

Shlomo Priymak

Does anyone have such a script? It could be useful as a starting point.
In my opinion MySQL should have those features added, though.

Brian Aker

Hi!

Dump in parallel – Good idea, but hard to implement currently unless you can handle the fact that you would have to lock the database.

Dump each table in its own file – We have that in tab format.

Parallel restore – 5.1’s import has this ability (and can be used with any version of MySQL).

Cheers,
-Brian
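
For reference, the tab-format dump and the threaded 5.1 import Brian mentions would look roughly like this (database name, directory and thread count are just examples; --tab makes the server itself write the data files, so the directory must be writable by mysqld):

import glob, subprocess

DB, DIR = "mydb", "/tmp/dump"       # DIR must be writable by the mysqld server

# --tab writes <table>.sql (schema) plus <table>.txt (tab-separated data)
# into DIR, one pair per table; the server itself writes the .txt files.
subprocess.check_call(["mysqldump", "--tab=" + DIR, DB])

# mysqlimport from 5.1 can load those .txt files with several parallel threads.
subprocess.check_call(["mysqlimport", "--use-threads=4", DB]
                      + sorted(glob.glob(DIR + "/*.txt")))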

Kevin Burton

You can already dump in parallel… just use a WHERE clause and divide the PK space in 1/2….
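
Something like the following, splitting a hypothetical integer primary key at an example midpoint and running one mysqldump per half (table, column and split value are made up for illustration; connection options omitted):

import subprocess

DB, TABLE, SPLIT = "mydb", "bigtable", 5000000      # made-up table and midpoint

procs = []
for i, cond in enumerate(["id < %d" % SPLIT, "id >= %d" % SPLIT]):
    out = open("%s.%s.part%d.sql" % (DB, TABLE, i), "w")
    # --where restricts each dump to one half of the primary key range.
    procs.append(subprocess.Popen(
        ["mysqldump", "--where=" + cond, DB, TABLE], stdout=out))
for p in procs:
    p.wait()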

Parallel restore is a bit harder….. You’d DEF see a boost if you did it on both the master AND the slave at the same time.

The #1 feature *I* wanted is the ability to write slave replication values to a temp file…. We hacked our version of mysqldump to do this and I need to release a patch.

Speaking of which…. we’ve implemented a mysqlslavesync command which takes a source MySQL server on the command line, syncs the local slave machine with its snapshot, and puts the machine into production as a replicating slave.

All with one command line 🙂

It uses SSH agent to auth on the remote end…. I need to release that too 🙂

Kevin Burton

Hey …..

WRT mysqlslavesync…

I agree. Binary copy is nice.

We have two internal mechanisms now, ‘dump’ and ‘hotcopy’… one uses mysqldump and the other uses mysqlhotcopy….

Adding LVM snapshot and/or ZFS snapshots would be nice too. I think the script will be easy enough to extend.

I’ll put it up on google code or code.tailrank.com tomorrow.

Kevin

Larry Damman

While we’re requesting features, I have one that I’d love to see implemented. It would be similar to --master-data=1, except it would get the Exec_Master_Log_Pos from the “SHOW SLAVE STATUS” command. This way you could (easily) create a sibling of the dumped db instead of a child. This would come in handy when you take your backups/snapshots off a slave in order to keep the master running smoothly.
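
Until mysqldump can do that itself, a backup script can pull the executed position out of SHOW SLAVE STATUS and write it next to the dump as a CHANGE MASTER TO statement. A sketch with placeholder credentials, assuming the server it connects to really is a slave:

import subprocess

AUTH = ["-h", "slavehost", "-u", "backup", "-psecret"]    # placeholder credentials

# \G output is one "Field: value" pair per line, which is easy to pick apart.
out = subprocess.check_output(
    ["mysql"] + AUTH + ["-e", "SHOW SLAVE STATUS\\G"], universal_newlines=True)
status = {}
for line in out.splitlines():
    if ":" in line:
        key, _, value = line.strip().partition(":")
        status[key.strip()] = value.strip()

# Relay_Master_Log_File / Exec_Master_Log_Pos are the point on the *master's*
# binlog this slave has executed up to -- where a sibling should start.
# (Both keys are missing if the server is not a slave.)
with open("sibling_change_master.sql", "w") as f:
    f.write("CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n"
            % (status["Relay_Master_Log_File"], status["Exec_Master_Log_Pos"]))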

fr3nd

I wrote a quick and dirty wrapper in Python that launches many threads running mysqldump. Each thread stores its result in memory and waits for the preceding one, which is why it can get very memory-hungry on databases with large tables.

You can find it here: http://www.fr3nd.net/projects/mysqlpdump

It’s pretty raw at the moment, but it can be used as skeleton to build something more complex.

Any comments will be appreciated 🙂

Brian Aker

Hi!

I’ve been thinking about a --clone option for a bit now… it would allow you to get the slave options plus take the name of a machine to feed the data to, bypassing the need to pump data via:
mysqldump [args] | mysql --host --port

Which is something I do all of the time to make new slaves. If I get a spare minute I’ll look at your idea on parallel dump with locks. If I do it though, I would be interested in it via a tab dump.

Cheers,
-Brian

flupps

Brian, Larry:

There’s been a patch around for this since 2005; unfortunately it never got pushed into mysqldump…
http://bugs.mysql.com/file.php?id=959

fr3nd

Hey,

I’ve released a new version of mysqlpdump with some important improvements:

* Stores dumps directly in files instead of writing to stdout
* Can compress files
* Dump each table in its own file
* Can pass parameters directly to mysqldump

http://www.fr3nd.net/projects/mysqlpdump

The next thing I want to implement is storing all the replication data in a separate .sql file.

Any suggestion?

Mark Callaghan

Chip released compact_innodb.py many months ago. Despite the _compact_innodb_ name, it should work with other storage engines too. It does parallel dump and restore by forking mysqldump and mysql processes. The project page for the code is http://code.google.com/p/google-mysql-tools.

Thomas

3. I wrote a simple Perl script that does just that… I’ve only tested with 16 threads so far (on a 16-processor server) and I/O seeks seemed to be a big problem (the restore was probably between 50 and 200 times slower! – I just aborted it). It’s not a production RAID (only a local C: drive, “value” kind of hardware…) and I may do more tests on a real RAID.

My script is a bit crappy; it buffers tables in memory, then sends them in one block to the child processes, and blocks all threads on database changes. I know how to make it better by using queues and non-blocking I/O, but I’m unlikely to do this anytime soon due to time constraints. It’s fairly simple to do with Event::Lib, though.

Also, I’m not sure whether the specs for the dump format are published… I rely solely on the comments in the dump and on the fact that global parameters are only present at its start. I also based my work on one specific database dump; if there’s a spec somewhere, it would help.

almontinavelt

Does mysqlimport using --use-threads speed up the import of a single large table?

Or

Is it only beneficial while importing multiple tables [files]?

Thanks
Nav