A week or so ago at the MySQL conference, I visited one of the backup vendors in the Expo Hall. I started to chat with them about their MySQL backup product. One of the representatives told me that their backup product uses FLUSH TABLES WITH READ LOCK, which he admitted takes a global lock on the whole database server. However, he proudly told me that it only takes a lock for “a couple of milliseconds.” This is a harmful misconception that many backup vendors seem to hold dear.
The truth is, this command can take a lock for an indeterminate amount of time. It might complete in milliseconds on a test system in a laboratory, but I have seen it take an extremely long time on production systems, measured in many minutes, or potentially even hours. And during this time, the server will get completely blocked (not just read-only!) To understand why, let’s look at what this command actually does. There are several important parts of processing involved in the command.
Requesting the lock
The FLUSH TABLES WITH READ LOCK command immediately requests the global read lock. As soon as this happens, even before the lock is granted to it, all other processes attempting to modify anything in the system are locked out. In theory, this might not seem so bad because after all, the command acquires only a read lock. Other commands that need only a read lock can coexist with this. However, in practice, most tables are both read and written. The first write query to each table will immediately block against the requested global read lock, and subsequent read queries will block against the write query’s requested table lock, so the real effect is that the table is exclusively locked, and all new requests into the system are blocked. Even read queries!
Waiting for the lock
Before the FLUSH TABLES WITH READ LOCK command can successfully acquire the lock, anything else that currently holds the lock must finish what it’s doing. That means that every currently running query, including SELECT queries, must finish. So if there is a long-running query on the system, or an open transaction or another process that holds a table lock, the FLUSH TABLES WITH READ LOCK command itself will block until the other queries finish and all locks are released. This can take a very long time. It is not uncommon for me to log on to a customer’s system and see a query that has been running for minutes or hours. If such a query were to begin running just before the FLUSH TABLES WITH READ LOCK command is issued, the results could be very bad.
Here’s one example of what the system can look like while this process is ongoing:
1 2 3 4 5 6 7 8 9 10 | mysql> show processlist; +----+------+-----------+------+------------+------+-------------------+----------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+------+-----------+------+------------+------+-------------------+----------------------------------------------------------------------+ | 4 | root | localhost | test | Query | 80 | Sending data | select count(*) from t t1 join t t2 join t t3 join t t4 where t1.b=0 | | 5 | root | localhost | test | Query | 62 | Flushing tables | flush tables with read lock | | 6 | root | localhost | test | Field List | 35 | Waiting for table | | | 7 | root | localhost | test | Query | 0 | NULL | show processlist | +----+------+-----------+------+------------+------+-------------------+----------------------------------------------------------------------+ 4 rows in set (0.00 sec) |
Notice that connection 6 can’t even log in because it was a MySQL command-line client that wasn’t started with -A, and it’s trying to get a list of tables and columns in the current database for tab-completion. Note also that “Flushing tables” is a misnomer — connection 5 is not flushing tables yet. It’s waiting to get the lock.
Flushing tables
After the FLUSH TABLES WITH READ LOCK command finally acquires the lock, it must begin flushing data. This does not apply to all storage engines. However, MyISAM does not attempt to flush its own data to the disk during normal processing. It relies on the operating system to flush the data blocks to disk when it decides to. As a result, a system that has a lot of MyISAM data might have a lot of dirty blocks in the operating system buffer cache. This can take a long time to flush. During that time, the entire system is still locked. After all the data is finished, the FLUSH TABLES WITH READ LOCK command completes and sends its response to the client that issued it.
Holding the lock
The final part of this command is the duration during which the lock is held. The lock is released with UNLOCK TABLES or a number of other commands. Most backup systems that use FLUSH TABLES WITH READ LOCK are performing a relatively short operation inside of the lock, such as initiating a filesystem snapshot. So in practice, this often ends up being the shortest portion of the operation.
Conclusion
A backup system that is designed for real production usage must not assume that FLUSH TABLES WITH READ LOCK will complete quickly. In some cases, it is unavoidable. This includes backing up a mixture of MyISAM and InnoDB data. But many installations do not mix their data this way, and should be able to configure a backup system to avoid this global lock. There is no reason to take a lock at all for backing up only InnoDB data. Completely lock-free backups are easy to take. Backup vendors should build this capability into their products.
I have a story about this, actually. One morning I woke up to find my website (twcenter.net) was down. I figured out that everything was waiting on MySQL, so I restarted it to see if that would fix it without any further downtime. It did, so I went on with my life — thankfully just a game fan site, no real money lost.
Then I received an e-mail, saying “Backup failed: MySQL has gone away” or something to that effect. Eventually I realized what had happened. The previous day I had run some statistical query or other on a little-used log table. The query took too long to complete, so I aborted with Ctrl-C. In fact, in that version of MySQL, this only caused the client to exit, and left the query running in the background for hours.
That night, the backup script had run FLUSH TABLES WITH READ LOCK. It immediately acquired locks on all tables — effectively write locks, as you say — except this one log table. It held those locks waiting for the lock on the log table, shutting the site down until I came around to fix it manually, because the statistics-gathering query I ran was taking many hours to run. Some convoluted thing with a subquery, if I remember correctly.
I’ve since moved to InnoDB for everything, and use mysqldump –single-transaction for backups. Hopefully more people will start using InnoDB now that Oracle is making it the default engine in 5.5!
It should be worthwhile to note that even for InnoDB-only systems, using FLUSH TABLES WITH READ LOCK may be a requirement: in the case you want to do incremental backups, hence must store the master position.
That is, if your’e doing “mysqldump –single-transaction –master-data” – you’re good. But a mylvmbackup would have to use the FLUSH TABLES… in order to store the master’s log and pos.
Shlomi,
If you’re using LVM for backups for Innodb only tables there are 2 other methods to find your binary log position, if you’re performing backup on master.
The last binary log file on the snapshot and its size will show the position. Also when Innodb does crash recovery it prints the binary log position.
This unfortunately does not work if you are backing up from the slave. You may hack around with master.info but I have not really looked into it.
Peter,
Thanks!
With regard to method #1: this is assuming sync_binlog=1, right? Otherwise innodb transaction log is not in sync with binary log?
And, actually, the same for method #2?
Shlomi
Shlomi,
Yes you’re right. Regarding position which Innodb prints on recovery – it is logical position so it will match backup, however it is possible at the time backup is taken the binary log is not yet flushed up to that point, though it is not really that important – it will be in a few seconds.
In general there is some of black magic here no question. If you can afford flush tables with readlock you may not want to bother.
How exactly do I get the master binlog position consistent with a transaction snapshot, in a way which does NOT involve FLUSH TABLES WITH READ LOCK?
My understanding is that mysqldump –single-transaction –master-data does a FLUSH TABLES WITH READ LOCK, only so that it can do a BEGIN TRANSACTION; SHOW MASTER STATUS then immediately UNLOCK TABLES.
This means we have to “stop the world” even if only for a short time, but the world may take quite a long time to stop, as noted.
Does MySQL have any plans to fix this, or is there a workaround I’m not aware of?
Okay. But xtrabackup takes a global read lock too, as I’ve been told. Why does it do it ?
Baron,
when you say that backup systems should not need any locks, you’re not taking into account DDL operations.
Backup needs to stop DDL at least, which is not transactional on most systems, so some sort of lock is still necessary, even if the data can be backed up fully online.
I agree, however, that using FLUSH TABLES WITH READ LOCK for backups is a misnomer. This entire command is a MySQL and MyISAM specific hack.
To help the matters a little bit in 5.5 we implement FLUSH TABLES WITH READ LOCK, so that you at least don’t have to wait for all tables for your backups, which in many cases only touch a part of database instance.
WordPress ate part of my formatting. In 5.5 we implement FLUSH TABLES table1, table2, table3 WITH READ LOCK.
Mark,
InnoDB does store master binlog position in its system tables. I believe Marc Callaghan at some point worked on a patch that implemented START TRANSACTION WITH CONSISTENT SNAPSHOT statement, that would print the binlog position as part of it. This should be sufficient for hot backups of InnoDB. But one still needs some sort of metadata lock to protect the system against DDL.
Sergei,
XtraBackup doesn’t do that. The wrapper script innobackupex, which is a patched copy of innobackup (which comes from InnoDB) does that. innobackupex has an option to disable this behavior, but innobackup doesn’t have such an option.
Baron,
this only means that your backup tool is not DDL-safe. A table may be dropped in MySQL half-way through the external tool backing it up.
Baron,
Vadim said that it does. Of course, it could be that he meant innobackupex, but didn’t say that.
Are you saying that XtraBackup takes the binlog position from the xtradb logs ?
How do you get the binlog offset when using Xtrabackup?
START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT gets the binlog offset without blocking for mysqldump backup. See http://bugs.mysql.com/bug.php?id=48124
Actually instead of issuing FlUSH TABLES WITH READ LOCK on the first call; instead one should issue FLUSH TABLES and may be another FLUSH TABLES followed by the READ LOCK as the 3rd call; so that if there is any long running queries; then rest of them are not blocked…and everything can run as usual.. the only tradeoff is if there is new query in the middle of flushes.
FLUSH TABLES;
FLUSH TABLES;
FLUSH TABLES WITH READ LOCK;
Sergei,
I meant innobackupex, and this is what usually used to make backups.
xtrabackup binary by itself copies only .ibd and ibdata1, and does not execute FLUSH TABLES WITH READ LOCK, but I
do not consider it as regular way to perform backups.
Vadim,
I think there are couple of good and easy ideas in this thread 🙂
We can add a feature to Percona Server to return binary log position while starting transaction. It should be easy enough. I know Heikki was not very eager to do anything to improve backup in MySQL because this would canibalize innodb hot backup sales. Now it is not the problem any more.
Indeed this however would NOT be very safe regarding DDL but I believe most systems do not need DDL during backups so it is mainly what if it happens issue ? I belive same as MySQL has read_only option we can have something like backup mode which does not allow to create/alter/drop any non temporary tables.
Kostya,
There are 3 ways of backup (at least) – in case you’re using LVM DDLs are completely fine.
Peter,
In this post
http://www.mysqlperformanceblog.com/2008/06/09/estimating-undo-space-needed-for-lvm-snapshot/
you say “do not place [binary] logs on LVM”
which means you can’t use get binary logs without FLASH TABLES
There has been some talk of ways to start a transaction and get binary log position:
http://bugs.mysql.com/bug.php?id=48124
Peter,
We can port START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT to Percona Server,
and I also wait for Facebook’s fix of broken group commit. That will be really great combination.
Vadim,
You can still get binary log position with LVM backup as Innodb will print it during crash recovery 🙂
Hey all, didn’t see any mention here of flush_time, which as I understand it has a major effect on the “…might have a lot of dirty blocks in the operating system buffer cache…” note. Speeding up the automatic flushes in the hours before the backup, then setting back to zero or whatever you normally use, can optimize this part of the FTWRL process. In fact on a Windows install the flush_time is, of necessity, so low that dirty blocks during backup are unlikely to be an issue (of course the periodic flushing on Win can make the server perform suboptimally in general, but that isn’t under discussion right now!).
I thought that the idea of innodb hotbackup was to copy the date and logs without communicating with the server at all. So, I don’t quite understand why you guys discuss different ways of getting the binlog position from the server instead of doing it the hotbackup way – just copy the files. After all, InnoDB records binlog position in the log, you can get it from there after you copied the files.
Setting flush_time on a server that is heavily used will destroy performance server-wide every flush_time seconds. It would be better to have a tool that periodically looks at SHOW OPEN TABLES and flushes tables one at a time. It would be even better to just tell the operating system to lower the fraction of dirty blocks in its cache. There are a lot of bugs with various operating systems and filesystems that make approaches such as running “sync” really bad. See Richard Hipp’s slideshows about “the great fsync() bug” for example.
Sergei,
Innodb Hot Backup and XtraBackup indeed does a hot copy without any communication with server – this works for Innodb part of the story. There is however also .frm, “mysql” database and binary log position. You can get binary log position from the innodb log file indeed however you still need to have a consistent copy of .frm files and mysql database to have solution which works in all cases. In case you know there is no DDL are being run and no changes to mysql database (such as new stored procedures, no new users etc) you do not have to take the lock. This looks dirty but it can work in practice for a lot of installations.
I wasn’t trying to debate the merits of flush_time, but like it or not, there are thousands of installs flushing every 30m. Of course, comparatively few of them are in prod and need to worry at all about FTWRL delays, but I thought this variance should be kept in mind.
FLUSH TABLES WITH READ LOCK; also tries to close open files that obviously takes time too. I did not double check it using lsof etc but the post http://www.geeksww.com/tutorials/database_management_systems/mysql/installation/mysql_tablecache_informationschema_and_open_files_limit.php says so.
However, the things I do not understand: Is there really a way to take online backups of MyISAM tables without issuing FLUSH TABLES WITH READ LOCK command and/or setting flush_time (performance killer)? As far as I know, xtrabackup does the same thing, please correct me if I am wrong.
Also, I wasn’t clear about data being flushed to disk 100% ie. leaving the OS buffers and getting flushed to disk after the FLUSH TABLES WITH READ LOCK command returns? If it still exists in OS buffer then backups might be different from the actual data, so what is the proper solution in that case?
Hello,
i know this topic is quite old but maybe someone can help me. I would like to know why flush
tables with read lock
decreases performance on queries started before the lock command. In my case queries take up 5-10 times as long with a waitingflush tables with read lock
.Thanks in advance
MySql sucks. There I said it.
I’m not an expert on MySQL, but I would like to know the problematic of skiping FLUSH TABLES WITH READ LOCK in the mylvmbackup configuration. One of our customers has a 48GB MySQL database and we were trying mylvmbackup with default configuration, but the backup takes 6 hours to finish and we found mysqld process down after 4 hours. The database has a lot of read/write queries at the same time as we are getting data from 8000 network devices (Zabbix monitoring enviroment) by SNMP. As I could see in the man page of mylvmbackup, it exists the –skip-flush-tables option that avoid locking tables and flushing the buffers. Does it exist the risk of obtaining an inconsistent backup skipping FLUSH TABLES WITH READ LOCK?