Many backup tools, including Percona XtraBackup, mylvmbackup, and others, use FLUSH TABLES WITH READ LOCK to temporarily make MySQL read-only. In many cases the server only needs to be read-only for a few seconds, yet the impact of FLUSH TABLES WITH READ LOCK can be quite large because of the time the statement itself may take to complete. Let's look at what the problem is.
As of MySQL 5.5, FLUSH TABLES WITH READ LOCK does not work as optimally as you might think. Even though, by general lock-compatibility rules, a read lock should not conflict with another read lock, it does for this statement: it has to wait for any running SELECT to complete before it can finish locking the tables. This means that if your workload includes some very long SELECT queries, you can potentially wait hours for this statement to complete. Here is an example of how it can look:
mysql> show processlist;
+-------+------+-----------+----------+---------+------+-------------------------+-----------------------------+-----------+---------------+-----------+
| Id    | User | Host      | db       | Command | Time | State                   | Info                        | Rows_sent | Rows_examined | Rows_read |
+-------+------+-----------+----------+---------+------+-------------------------+-----------------------------+-----------+---------------+-----------+
| 10219 | root | localhost | dumptest | Query   |  324 | Sending data            | select count(*) from A,B    |         0 |             0 |   2359297 |
| 10290 | root | localhost | NULL     | Query   |  317 | Waiting for table flush | flush tables with read lock |         0 |             0 |         1 |
| 10291 | root | localhost | dumptest | Query   |    0 | NULL                    | show processlist            |         0 |             0 |         2 |
+-------+------+-----------+----------+---------+------+-------------------------+-----------------------------+-----------+---------------+-----------+
3 rows in set (0.00 sec)
As you can see, FLUSH TABLES WITH READ LOCK is waiting for that very nasty "full join" SELECT to complete. What is worse, once the statement starts executing, all writes to the server are blocked, which in turn blocks the "field list" operation used by the mysql client (among other tools), so it may feel as if you cannot connect to the database at all. (Use mysql -A in this case to prevent the client from fetching the field list for all tables.)
Reads, though, are not blocked; that is, there is no MyISAM-style lock-priority problem where a pending WRITE lock blocks READ queries from executing on the table.
mysql> show processlist;
+-------+------+-----------+----------+------------+------+------------------------------+------------------------------+-----------+---------------+-----------+
| Id    | User | Host      | db       | Command    | Time | State                        | Info                         | Rows_sent | Rows_examined | Rows_read |
+-------+------+-----------+----------+------------+------+------------------------------+------------------------------+-----------+---------------+-----------+
| 10219 | root | localhost | dumptest | Query      |  688 | Sending data                 | select count(*) from A,B     |         0 |             0 |   2359297 |
| 10290 | root | localhost | NULL     | Query      |  681 | Waiting for table flush      | flush tables with read lock  |         0 |             0 |         1 |
| 10291 | root | localhost | dumptest | Query      |   74 | Waiting for global read lock | insert into C values ("a",1) |         0 |             0 |         1 |
| 10304 | root | localhost | dumptest | Field List |   15 | Waiting for table flush      |                              |         0 |             0 |         4 |
| 10305 | root | localhost | dumptest | Query      |    0 | NULL                         | show processlist             |         0 |             0 |         1 |
+-------+------+-----------+----------+------------+------+------------------------------+------------------------------+-----------+---------------+-----------+
5 rows in set (0.00 sec)
As a result, a single runaway SELECT can effectively cause downtime if you use a backup solution that relies on FLUSH TABLES WITH READ LOCK, which is very unfortunate.
I'd love to see this problem fixed. There is no reason (other than code design) for a read lock to have to wait for another read lock before it can be set.
As a workaround, you can consider having scripts that check for a situation like the one above and either kill the FLUSH TABLES WITH READ LOCK and fail the backup, or kill the long-running SELECT queries to let the backup proceed, resolving the server gridlock one way or another.
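The decision logic of such a watchdog script can be sketched as follows. This is only an illustration: the row dictionaries mimic SHOW PROCESSLIST output, and the 300-second threshold is a made-up value; a real script would fetch the rows through a MySQL connector and issue KILL for the returned thread Ids.

```python
# Sketch of the watchdog logic described above: given SHOW PROCESSLIST rows,
# decide which threads to KILL so the server does not stay gridlocked.
# Row shape and threshold are assumptions for illustration only.

LONG_QUERY_SECS = 300  # hypothetical threshold for a "runaway" SELECT


def pick_threads_to_kill(processlist, prefer_backup=True):
    """Return a list of thread Ids to KILL.

    If prefer_backup is True, kill the long-running SELECTs that block
    FLUSH TABLES WITH READ LOCK so the backup can proceed; otherwise
    kill the FTWRL itself and fail the backup.
    """
    ftwrl = [row for row in processlist
             if (row.get("Info") or "").lower().startswith("flush tables with read lock")]
    if not ftwrl:
        return []  # no FTWRL running, so no gridlock to resolve
    if prefer_backup:
        return [row["Id"] for row in processlist
                if (row.get("Info") or "").lower().startswith("select")
                and row.get("Time", 0) >= LONG_QUERY_SECS]
    return [row["Id"] for row in ftwrl]


# Example with rows shaped like the processlist output above:
rows = [
    {"Id": 10219, "Info": "select count(*) from A,B", "Time": 324},
    {"Id": 10290, "Info": "flush tables with read lock", "Time": 317},
    {"Id": 10291, "Info": "show processlist", "Time": 0},
]
print(pick_threads_to_kill(rows))                       # the long SELECT: [10219]
print(pick_threads_to_kill(rows, prefer_backup=False))  # the FTWRL: [10290]
```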
If you're using only InnoDB tables and you're not actively changing users, stored procedures, etc. (which are stored in MyISAM tables anyway), you can consider using the --no-lock option with Percona XtraBackup. Many other tools have a similar option. This option can be especially helpful when backing up data from a slave with the --safe-slave-backup option, as writes to the slave can be prevented by simply pausing replication.
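Assembling such an invocation can be sketched like this. The --no-lock and --safe-slave-backup flags are the ones discussed above; the target directory and the use of innobackyx-style argument order are illustrative assumptions, so check the flags against your tool's version before using them.

```python
# Minimal sketch of building a slave-backup command line that avoids
# FLUSH TABLES WITH READ LOCK, assuming innobackupex-style flags
# (--no-lock and --safe-slave-backup, both mentioned in the text above).

def build_backup_cmd(target_dir, no_lock=True, safe_slave=True):
    cmd = ["innobackupex"]
    if no_lock:
        cmd.append("--no-lock")            # skip FTWRL; InnoDB-only data
    if safe_slave:
        cmd.append("--safe-slave-backup")  # pause replication instead of locking
    cmd.append(target_dir)
    return cmd


print(" ".join(build_backup_cmd("/backups/2012-03-27")))
# prints "innobackupex --no-lock --safe-slave-backup /backups/2012-03-27"
```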
> Even though with general lock compatibility guidelines Read Lock should not conflict with another Read Lock, it does for this statement, and as such it has to wait for any SELECT statement to complete in order to complete locking tables.
The query state says "Waiting for table flush", which indicates that FTWRL hasn't flushed an open table yet; hence it must wait until all other threads have closed the table in question. So this is not really a question of lock conflicts.
I think there are plans to make this better for some use cases in MariaDB. The Facebook patch has an InnoDB-only solution that avoids blocking. I think the problems for FTWRL come from having to support MyISAM and I hope that the distributions (Percona, MariaDB, MySQL) also provide a solution.
See “START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT” in
https://www.facebook.com/note.php?note_id=390420710932
Davi,
Yes. However, for InnoDB tables the FLUSH part is a no-op, so what we care about is just locking the tables to prevent writes to them.
I think you're correct in theory, as FLUSH TABLES WITH READ LOCK is supposed to flush tables, not just make them read-only. Most tools, however, do not need the FLUSH part (for InnoDB tables), and maybe we just need a different command that locks tables read-only without flushing.
Even killing FLUSH TABLES WITH READ LOCK won't resolve the deadlock as of MySQL 5.5: other queries will still be stuck in "Waiting for table flush" even after the FLUSH TABLES WITH READ LOCK goes away. 🙁
http://bugs.mysql.com/bug.php?id=44884
Have you considered filing a bug report?
Peter,
Why not use a similar patch as in Google Patches II, where InnoDB file writes can be disabled altogether using a global variable setting?
For InnoDB based databases, this allows for immediate locking of writes, safe BINLOG position snapshot, no waiting on read-queries, and no actual flushing of data (I intentionally disregard MyISAM tables here).
Any caveats I’m not aware of? Would such a feature introduce a global mutex on which all writes would need to serialize or anything?
Shlomi,
Yes, setting the server read-only would be a solution, though it is still unfortunate to have to make the whole server read-only. When we speak about InnoDB tables and backup with XtraBackup or LVM, we do not really need it. What we need is MyISAM tables flushed and read-only so they can be copied, while InnoDB tables can still be written to, but without any DDL taking place. But indeed, this might go beyond fixing FLUSH TABLES WITH READ LOCK.
SoboLAN,
This is known and documented behavior which I wanted to point out, so I would not quite call it a bug.
Andrew,
Yeah… there are many deadlock bugs related to FLUSH TABLES WITH READ LOCK in various circumstances. While testing it I got some weird cases of some connections dying off while others needed to be killed separately. The code does not seem as robust as it could be.
The basic behavior might be briefly documented in the manual, but I don't think the potential problems from FTWRL have been documented there.
There is a second problem, which is worse, I believe. That is: even with FLUSH TABLES WITH READ LOCK you sometimes get a snapshot that is not recoverable. Apparently FTWRL locks frontend threads, but some InnoDB background threads can still go on and play with the tables?
That worries me at several levels. For one, how is the LVM snapshot implemented in the case of mylvmbackup? Is it not atomic in itself, so that InnoDB file writes can happen while the snapshot is being made?
If LVM snapshots are atomic, yet a FTWRL InnoDB snapshot can be inconsistent and unrecoverable sometimes, how is InnoDB ACID?
Kristian,
What do you mean by "a snapshot that is not recoverable"? Even without FLUSH TABLES WITH READ LOCK, an LVM snapshot should be recoverable 100% of the time. You might have found a recovery bug, but there have been very few of them. Typically, problems with LVM snapshots originate either from not having the log files and data on the same partition, or from synchronization of the binary log with the InnoDB snapshot. In certain versions FLUSH TABLES WITH READ LOCK would not block "commit" operations, which caused wrong binary log positions, but that is fixed now, at least in 5.5.
Hi, Peter
These days I have been discussing hot backup problems with my friends. For Percona XtraBackup, I think there are three problems which force us to use FLUSH TABLES WITH READ LOCK:
1. copying metadata files, like the frm files.
2. copying data files of engines other than InnoDB.
3. getting the binlog file name and binlog position information.
And if we ensure the following requirements, we can do a hot backup without using FLUSH TABLES WITH READ LOCK:
1. no DDL operations are performed during the hot backup, so the frm files stay consistent.
2. we use only InnoDB as our storage engine, no other engines.
3. we get the binlog file name and position from the xtrabackup_binlog_pos_innodb file, since InnoDB writes this information in its commit log.
Am I clear and right? If I'm wrong, please tell me why. Thanks!
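Reading the coordinates from the xtrabackup_binlog_pos_innodb file mentioned in point 3 can be sketched as below. The two-field layout (binlog file name and position separated by whitespace) is an assumption for illustration; check the file produced by your xtrabackup version before relying on it.

```python
# Hedged sketch: parse an xtrabackup_binlog_pos_innodb file, assumed to
# contain the binlog file name and position separated by whitespace,
# e.g. "./mysql-bin.000290\t15906301".

def parse_binlog_pos(text):
    name, pos = text.split()
    return name, int(pos)


print(parse_binlog_pos("./mysql-bin.000290\t15906301"))
# prints ('./mysql-bin.000290', 15906301)
```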
Peter, by "Even with FLUSH TABLES WITH READ LOCK you are getting in some cases a snapshot that is not recoverable" I mean the following. I work in an environment with a large three-digit number of servers, in which we clone new slaves from existing slaves using mylvmbackup or a method derived from it.
Essentially, we use FLUSH TABLES WITH READ LOCK, then run 'lvcreate -s' to create a snapshot, mount it, and rsync it across the network to a target location, a virgin box. On the target we bring up the new mysqld, which finds a crashed InnoDB, recovers it, and is then configured to become a new slave of the same master as the donor machine. Every once in a while the recovery of InnoDB on the target fails, though, even though the snapshot was made under FLUSH TABLES WITH READ LOCK (I believe it should work even without that).
We are using CentOS 5.x, LVM2, XFS, and Oracle MySQL 5.5.x (x=16 on many boxes atm), but we have been experiencing this with many versions of MySQL. We are now using 'service mysql stop' instead of FLUSH TABLES WITH READ LOCK to get reliable snapshots from a donor box.
Kristian – this is worth fixing if that is a bug in InnoDB. Can you file one at bugs.mysql.com with details from the mysqld error log? AFAIK, InnoDB is snapshot safe as long as database files and log files are on the same filesystem. But MySQL/InnoDB is not snapshot safe if DDL operations are in flight because DDL between MySQL (frm files) and InnoDB is not atomic and some failures are not recoverable (which can cause problems for recovery after a snapshot).
Alas, I don’t use LVM2 any more. I do use xtrabackup. It isn’t perfect but it continues to get better.
Kristian,
I agree with Mark. It would be great to find out whether it is a bug or user error. I think the InnoDB data from a backup where InnoDB fails to recover should be passed to your support provider for investigation. It might be related to InnoDB itself, or to how XFS plays with LVM, or something else.
As Mark mentions, there are indeed known problems which are not handled if you have DDL in flight; however, FLUSH TABLES WITH READ LOCK should prevent these.
Yes. The problem is that it is not easily reproduced.
I grabbed two slaves right now and am constantly recloning one from the other in a loop. I will leave the script running inside a screen for a few weeks, until it produces an InnoDB clone that is unrecoverable, and see what comes out of that.
I have a followup question, though. The target InnoDB, when it recovers, says:
…
120327 11:25:22 InnoDB: Starting an apply batch of log records to the database…
…
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 15906301, file name ../log/binlog.000290
…
This is a slave; it has no binlog of its own. SHOW SLAVE STATUS says:
Master_Log_File: binlog.000062
Relay_Log_File: relaylog.000181
Those do not match. So what is this information, and how is it useful to print it?
Kristian,
This is a different issue. In many versions InnoDB would print a wrong position for the current binary log (especially if it is disabled) and not print the position of the slave thread in the master's logs at all. Here is an old post of mine about it:
http://www.mysqlperformanceblog.com/2008/01/29/no-more-mysql-crash-safe-replication-in-50/
We have fixed it in Percona Server. Also, if you're doing FLUSH TABLES WITH READ LOCK anyway, you can consider running SHOW SLAVE STATUS / SHOW MASTER STATUS to get the proper position instead of relying on that info.
hi, Peter
I don't know what mysql -A really means. What is "getting the field list from all tables"? Can you explain it in more detail?
Hi Peter,
has this, to your knowledge, been fixed at all yet?