Many backup tools, including Percona XtraBackup, mylvmbackup, and others, use FLUSH TABLES WITH READ LOCK to temporarily make MySQL read-only. In many cases, the period for which the server has to be read-only is very short, just a few seconds, yet the impact of FLUSH TABLES WITH READ LOCK can be quite large because of the time the statement itself may take to complete. Let’s look at what the problem is.

As of MySQL 5.5, FLUSH TABLES WITH READ LOCK does not work as smoothly as you might think. Even though under general lock-compatibility rules a read lock should not conflict with another read lock, it effectively does for this statement: it has to wait for any running SELECT to complete before it can finish locking the tables. This means that if your workload includes some very long SELECT queries, you can potentially be waiting for hours for this statement to complete. Here is an example of how it can look:
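An illustrative processlist for this situation (reconstructed for illustration: the ids, user names, and timings are made up, but the thread states are what MySQL 5.5 reports) looks roughly like this:

```
mysql> SHOW PROCESSLIST;
+----+------+-----------+------+---------+------+------------------------------+---------------------------------+
| Id | User | Host      | db   | Command | Time | State                        | Info                            |
+----+------+-----------+------+---------+------+------------------------------+---------------------------------+
|  7 | app  | localhost | test | Query   | 4523 | Sending data                 | select count(*) from t1 join t2 |
|  9 | root | localhost | test | Query   |  312 | Waiting for table flush      | FLUSH TABLES WITH READ LOCK     |
| 11 | app  | localhost | test | Query   |  310 | Waiting for global read lock | insert into t1 values (1)       |
+----+------+-----------+------+---------+------+------------------------------+---------------------------------+
```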

As you can see, FLUSH TABLES WITH READ LOCK is waiting for that very nasty “full join” SELECT to complete. What is worse, as soon as the statement starts executing, all writes to the server are blocked, which in turn blocks the “field list” operation used by the mysql command-line client (among other tools), so it may feel as if you cannot connect to the database at all. (Use mysql -A in this case to prevent the client from fetching the field list for all tables.)

Reads, though, are not blocked; that is, there is no MyISAM-style lock-priority problem where a pending WRITE lock blocks any READ queries from executing on the table.

As a result, a single runaway SELECT can effectively cause downtime if you use a backup solution that relies on FLUSH TABLES WITH READ LOCK, which is very unfortunate.

I’d love to see this problem fixed. There is no reason (other than code design) for a read lock to have to wait for another read lock in order to be set.

As a workaround, you can consider having scripts that check for a situation like the one above and either kill FLUSH TABLES WITH READ LOCK and fail the backup, or kill the long-running SELECT queries to let the backup proceed, resolving the server gridlock one way or another.
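For illustration, the check-and-kill logic might be sketched like this in Python (a minimal sketch: the threshold, the dict-shaped rows, and the function name are assumptions; a real script would fetch SHOW FULL PROCESSLIST through a MySQL client library and issue KILL for the returned ids):

```python
# Sketch of a watchdog that inspects processlist rows and decides what to
# kill when FLUSH TABLES WITH READ LOCK is stuck behind long SELECTs.
# The rows are plain dicts standing in for SHOW FULL PROCESSLIST output.

LONG_QUERY_SECS = 300  # assumed threshold for a "runaway" SELECT

def threads_to_kill(processlist, kill_selects=True):
    """Return the thread ids to KILL to break the FTWRL gridlock.

    If kill_selects is True, kill the long-running SELECTs so the backup
    can proceed; otherwise kill the FTWRL itself and fail the backup.
    """
    ftwrl = [row for row in processlist
             if (row.get("Info") or "").upper().startswith("FLUSH TABLES")]
    if not ftwrl:
        return []  # no FTWRL pending, nothing to do
    if not kill_selects:
        return [row["Id"] for row in ftwrl]
    return [row["Id"] for row in processlist
            if (row.get("Info") or "").upper().startswith("SELECT")
            and row["Time"] > LONG_QUERY_SECS]

# Example processlist: one runaway SELECT, the waiting FTWRL, a blocked INSERT.
rows = [
    {"Id": 7, "Time": 4523, "State": "Sending data",
     "Info": "SELECT count(*) FROM t1 JOIN t2"},
    {"Id": 9, "Time": 312, "State": "Waiting for table flush",
     "Info": "FLUSH TABLES WITH READ LOCK"},
    {"Id": 11, "Time": 310, "State": "Waiting for global read lock",
     "Info": "INSERT INTO t1 VALUES (1)"},
]
print(threads_to_kill(rows))                      # kill the runaway SELECT
print(threads_to_kill(rows, kill_selects=False))  # or kill the FTWRL instead
```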

If you’re using only InnoDB tables and you’re not actively changing users, stored procedures, etc. (which are stored in MyISAM tables anyway), you can consider using the --no-lock option of Percona XtraBackup. Many other tools have a similar option. This option can be especially helpful when backing up data from a slave with the --safe-slave-backup option, as writes to the slave can be prevented by simply pausing replication.
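As a concrete sketch (the backup directory and the exact invocation are assumptions; check your XtraBackup version’s documentation for the precise option names), such a slave backup could be launched like this:

```shell
# Sketch: lock-free backup from an InnoDB-only slave (paths are assumptions).
# --no-lock skips FLUSH TABLES WITH READ LOCK; --safe-slave-backup pauses the
# slave SQL thread so no writes happen while the backup is taken.
BACKUP_DIR=/data/backups
CMD="innobackupex --no-lock --safe-slave-backup $BACKUP_DIR"
echo "$CMD"    # dry run shown here; drop the echo to actually execute
```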

21 Comments
Davi Arnaut

> Even though with general lock compatibility guidelines Read Lock should not conflict with another Read Lock, it does for this statement, and as such it has to wait for any SELECT statement to complete in order to complete locking tables.

The query state says “Waiting for table flush”, which indicates that FTWRL hasn’t closed an open table yet, hence it must wait until all other threads have closed the table in question. So this is not really a question of lock conflicts.

Mark Callaghan

I think there are plans to make this better for some use cases in MariaDB. The Facebook patch has an InnoDB-only solution that avoids blocking. I think the problems for FTWRL come from having to support MyISAM and I hope that the distributions (Percona, MariaDB, MySQL) also provide a solution.

See “START TRANSACTION WITH CONSISTENT INNODB SNAPSHOT” in
https://www.facebook.com/note.php?note_id=390420710932

Andrew

Even killing FLUSH TABLES WITH READ LOCK won’t resolve the deadlock as of MySQL 5.5 – other queries will still be stuck in ‘Waiting for table flush’ even after the FLUSH TABLES WITH READ LOCK goes away. 🙁

http://bugs.mysql.com/bug.php?id=44884

SoboLAN

Have you considered filing a bug report ?

Shlomi Noach

Peter,
Why not use a similar patch as in Google Patches II, where InnoDB file writes can be disabled altogether using a global variable setting?
For InnoDB based databases, this allows for immediate locking of writes, safe BINLOG position snapshot, no waiting on read-queries, and no actual flushing of data (I intentionally disregard MyISAM tables here).
Any caveats I’m not aware of? Would such a feature introduce a global mutex on which all writes would need to serialize or anything?

Mark Callaghan

The behavior might be briefly documented in the manual. I don’t think the potential problems from FTWRL have been documented in the manual.

Kristian Köhntopp

There is a second problem, which is worse, I believe. That is: even with FLUSH TABLES WITH READ LOCK you are in some cases getting a snapshot that is not recoverable. Apparently, FTWRL locks frontend threads, but some InnoDB background threads can still go on and play with the tables?

That worries me at several levels: for one, how is the LVM snapshot implemented in the case of mylvmbackup? Is it not atomic in itself, so that InnoDB file writes can happen while the snapshot is being made?

If LVM snapshots are atomic, yet a FTWRL InnoDB snapshot can be inconsistent and unrecoverable sometimes, how is InnoDB ACID?

He Dengcheng

Hi, Peter

These days, I have discussed the hot-backup problems with my friends. For Percona XtraBackup, I think there are three problems which force us to use FLUSH TABLES WITH READ LOCK:
1. copying metadata files, like the .frm files;
2. copying data files of engines other than InnoDB;
3. getting the binlog file name and binlog position information.

And if we ensure the following requirements, we can do a hot backup without using FLUSH TABLES WITH READ LOCK:
1. do not run any DDL operations while the hot backup is in progress, so the .frm files stay consistent;
2. use only InnoDB as our engine, with no other engines;
3. get the binlog file name and position from the xtrabackup_binlog_pos_innodb file, since InnoDB writes this information in its commit log.

Am I clear and right? If I’m wrong, please tell me why. Thanks!
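The third point can be sketched in Python (a sketch under the assumption that xtrabackup_binlog_pos_innodb contains a binlog file name and position separated by whitespace, as XtraBackup writes it; the function name and paths are made up):

```python
# Sketch: parse the binlog coordinates that InnoDB records at commit and that
# XtraBackup dumps into xtrabackup_binlog_pos_innodb.
# Assumed format: "<binlog file name>\t<position>" on a single line.

def read_binlog_pos(path="xtrabackup_binlog_pos_innodb"):
    with open(path) as f:
        name, pos = f.read().split()
    return name, int(pos)

# Example with a stand-in file:
with open("/tmp/xtrabackup_binlog_pos_innodb", "w") as f:
    f.write("../log/binlog.000290\t15906301\n")
print(read_binlog_pos("/tmp/xtrabackup_binlog_pos_innodb"))
# ('../log/binlog.000290', 15906301)
```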

Kristian Köhntopp

Peter, by ‘Even with FLUSH TABLES WITH READ LOCK you are getting in some cases a snapshot that is not recoverable’ I mean that I work in an environment with a large three-digit number of servers, in which we clone new slaves from existing slaves using mylvmbackup or a method derived from it.

Essentially, we run FLUSH TABLES WITH READ LOCK, then ‘lvcreate -s’ to create a snapshot, mount it, and rsync it across the network to a target location, a virgin box. In the target location we bring up the new mysqld, which finds a crashed InnoDB, recovers it, and is then configured to become a new slave of the same master as the donor machine. Every once in a while, though, the recovery of InnoDB in the target location fails, even though the snapshot was made under FLUSH TABLES WITH READ LOCK (I believe it should work even without that).

We are using CentOS 5.x, LVM2, XFS and Oracle MySQL 5.5.x (x=16 on many boxes atm), but we have been experiencing this with many versions of MySQL. We are now using ‘service mysql stop’ instead of FLUSH TABLES WITH READ LOCK to get reliable snapshots from a donor box.

Mark Callaghan

Kristian – this is worth fixing if that is a bug in InnoDB. Can you file one at bugs.mysql.com with details from the mysqld error log? AFAIK, InnoDB is snapshot safe as long as database files and log files are on the same filesystem. But MySQL/InnoDB is not snapshot safe if DDL operations are in flight because DDL between MySQL (frm files) and InnoDB is not atomic and some failures are not recoverable (which can cause problems for recovery after a snapshot).

Alas, I don’t use LVM2 any more. I do use xtrabackup. It isn’t perfect but it continues to get better.

Kristian Köhntopp

Yes. The problem is that it is not easily reproduced.

I grabbed two slaves right now, and am constantly recloning one from the other. I will leave my script running inside a screen for a few weeks, and see what comes out of that.

Kristian Köhntopp

I have a script that is constantly recloning a database now, in a loop, until that produces an InnoDB clone that is unrecoverable.

I have a followup question, though. The target InnoDB, when it recovers, says:

120327 11:25:22 InnoDB: Starting an apply batch of log records to the database…
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 15906301, file name ../log/binlog.000290

This is a slave, it has no binlog of its own. SHOW SLAVE STATUS says

Master_Log_File: binlog.000062
Relay_Log_File: relaylog.000181

That does not match. So what is this information, and how is it useful to print this?

repls

Hi, Peter

I don’t understand what mysql -A really does. What is “getting the field list from all tables”? Can you explain it in more detail?

dominic

Hi Peter,

has this, to your knowledge, been fixed at all yet?