August 1, 2014

InnoDB’s gap locks

One of the most important features of InnoDB is the row level locking. This feature provides better concurrency under heavy write load but needs additional precautions to avoid phantom reads and to get a consistent Statement based replication. To accomplish that, row level locking databases also acquire gap locks.

What is a Phantom Read

A Phantom Read happens when in a running transaction, two identical statements get different values, because some other transaction has modified the table’s rows. For example:


transaction1> START TRANSACTION;
transaction1> SELECT * FROM t WHERE i > 20 FOR UPDATE;
+------+
| i |
+------+
| 21 |
| 25 |
| 30 |
+------+


transaction2> START TRANSACTION;
transaction2> INSERT INTO t VALUES(26);
transaction2> COMMIT;

transaction1> select * from t where i > 20 FOR UPDATE;
+------+
| i |
+------+
| 21 |
| 25 |
| 26 |
| 30 |
+------+

Phantom reads do not occur if you’re simply doing a SELECT. They only occur if you do UPDATE or DELETE or SELECT FOR UPDATE. InnoDB provides REPEATABLE READ for read-only SELECT, but it behaves as if you use READ COMMITTED for all write queries, in spite of your chosen transaction isolation level (considering only the two most common isolation levels, REPEATABLE READ and READ COMMITTED).

What is a gap lock?

A gap lock is a lock on the gap between index records. Thanks to this gap lock, when you run the same query twice, you get the same result, regardless other session modifications on that table. This makes reads consistent and therefore makes the replication between servers consistent. If you execute SELECT * FROM id > 1000 FOR UPDATE twice, you expect to get the same value twice. To accomplish that, InnoDB locks all index records found by the WHERE clause with an exclusive lock and the gaps between them with a shared gap lock.

This lock doesn’t only affect to SELECT … FOR UPDATE. This is an example with a DELETE statement:

transaction1 > SELECT * FROM t;
+------+
| age |
+------+
| 21 |
| 25 |
| 30 |
+------+

Start a transaction and delete the record 25:

transaction1 > START TRANSACTION;
transaction1 > DELETE FROM t WHERE age=25;

At this point we suppose that only the record 25 is locked. Then, we try to insert another value on the second session:

transaction2 > START TRANSACTION;
transaction2 > INSERT INTO t VALUES(26);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
transaction2 > INSERT INTO t VALUES(29);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
transaction2 > INSERT INTO t VALUES(23);
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
transaction2 > INSERT INTO t VALUES(31);
Query OK, 1 row affected (0.00 sec)

After running the delete statement on the first session, not only the affected index record has been locked but also the gap before and after that record with a shared gap lock preventing the insertion of data to other sessions.

How to troubleshoot gap locks?

Is possible to detect those gap locks using SHOW ENGINE INNODB STATUS:

---TRANSACTION 72C, ACTIVE 755 sec
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1
MySQL thread id 3, OS thread handle 0x7f84a78ba700, query id 163 localhost msandbox
TABLE LOCK table `test`.`t` trx id 72C lock mode IX
RECORD LOCKS space id 19 page no 4 n bits 80 index `age` of table `test`.`t` trx id 72C lock_mode X
RECORD LOCKS space id 19 page no 3 n bits 80 index `GEN_CLUST_INDEX` of table `test`.`t` trx id 72C lock_mode X locks rec but not gap
RECORD LOCKS space id 19 page no 4 n bits 80 index `age` of table `test`.`t` trx id 72C lock_mode X locks gap before rec

If you have lot of gaps locks in your transactions affecting the concurrency and the performance you can disable them in two different ways:

1- Change the ISOLATION level to READ COMMITTED. In this isolation level, it is normal and expected that query results can change during a transaction, so there is no need to create locks to prevent that from happening.
2- innodb_locks_unsafe_for_binlog = 1. Disables the gap locks except for foreign-key constraint checking or duplicate-key checking.

The most important difference between these two options is that the second one is a global variable that affects all sessions and needs a server restart to change its value. Both options cause phantom reads (non repeatable reads) so in order to prevent problems with the replication you should change the binary log format to “row”.

Depending on the statement, the behavior of these locks can be different. In the following link there is a good source of information:

http://dev.mysql.com/doc/refman/5.1/en/innodb-locks-set.html.

Conclusion

MySQL uses REPEATABLE READ as the default isolation level so it needs to lock the index records and the gaps to avoid phantom reads and to get a consistent Statement based replication. If your application can deal with phantom reads and your binary log is in row format, changing the ISOLATION to READ COMMITTED will help you to avoid all those extra locks. As a final advice, keep your transactions short :)

About Miguel Angel Nieto

Miguel joined Percona in October 2011. He has worked as a System Administrator for a Free Software consultant and in the supporting area of the biggest hosting company in Spain. His current focus is improving MySQL and helping the community of Free Software to grow. Miguel's roles inside Percona are Senior Support Engineer and Manager of EMEA Support Team.

Comments

  1. Miguel,
    What isolation mode do you use for your first example ? I just tried it with PS 5.5 and default (REPEATABLE READ) isolation mode and the insert is failing

    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)

    mysql> INSERT INTO t VALUES(26);
    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
    mysql> show variables like “%isol%”;

  2. Peter,

    The first example is READ COMMITTED. I’ve changed the isolation level to get that example of a phantom read. With REPEATABLE READ the rows and gaps gets locked and then you get the lock wait timeout.

  3. Andy says:

    Statement 1: “it behaves as if you use READ COMMITTED for all write queries, in spite of your chosen transaction isolation level”

    Statement 2: “changing the ISOLATION to READ COMMITTED will help you to avoid all those extra locks”

    But if MySQL behaves as if READ COMMITTED is used regardless of which isolation level is actually chosen, then why would changing ISOLATION make any difference?

    Whether ISOLATION is set to READ COMMITTED or REPEATABLE READ, MySQL would still behave as if READ COMMITTED is used according to statement 1 above. So what’s the point of changing ISOLATION?

  4. Andy,

    With READ COMMITTED InnoDB doesn’t take additional locks on the gap, improving the concurrence. You would avoid all “Lock wait timeout exceeded” messages in my example. So only row with value 25 would be locked, not everything between 21 and 30, including the gap. Compare this example with the following one using READ COMMITTED:

    session1> delete from t where i=25;
    Query OK, 1 row affected (0.00 sec)
    session2> insert into t values(26);
    Query OK, 1 row affected (0.00 sec)
    session1> commit;
    session2> commit;
    session1> select * from t;
    +——+
    | i |
    +——+
    | 21 |
    | 26 |
    | 30 |
    +——+

    No locks on the gap, so no wait timeout and more concurrence.

  5. I wish this post had been around a few months ago! I just went through this issue recently and there weren’t a lot of resources available on the topic. I also wrote up a post how we worked through the issue at http://insatiabledemand.ideeli.com/post/18850015294/diagnosing-mysql-auto-inc-and-gap-locks

  6. Andy says:

    Miguel,

    SO MySQL does indeed behave differently depending on whether ISOLATION is set to READ COMMITTED or REPEATABLE READ?

    I’m just confused by the statement in the article: “it behaves as if you use READ COMMITTED for all write queries, in spite of your chosen transaction isolation level”

    That statement makes it sounds like no matter which level you set ISOLATION to, MySQL is just going to behave as if READ COMMITTED is being used. Maybe you could edit that statement to clarify?

  7. repls says:

    hi Miguel,
    as you said :’After running the delete statement on the first session, not only the affected index record has been locked but also the gap before and after that record with a shared gap lock preventing the insertion of data to other sessions.”

    so how to calculate the shared gap lock ?
    and another question, what the index information about your table t?

  8. repls says:

    hi,Miguel

    what the environment of your second example? whether is a index on column age?
    if so, i think the outcome must not be the same with yours.

  9. Aaron Brown says:

    Miguel –

    Not to speak for Miguel, but I believe that what he means by “it behaves as if you use READ COMMITTED for all write queries, in spite of your chosen transaction isolation level” is that transactions will be REPEATABLE READ until the first write (or SELECT FOR UPDATE) happens, at which point the transaction is implicitly converted to READ COMMITTED. I wrote up a demo of this over on DBA StackExchange: http://dba.stackexchange.com/questions/15854/innodb-row-locking-how-to-implement/15864#15864. I haven’t tested it with INSERT/DELETEs, but it should definitely behave the same if you change the SELECT FOR UPDATE into an UPDATE.

  10. repls says:

    hi,Miguel,
    i have more questions, as you said : ‘Phantom reads do not occur if you’re simply doing a SELECT ‘ . i think you may make a mistake. in the first example the isolation level is READ-COMMITTED. but as i know, when the isolation level is READ-COMMITTED, the phantom reads are always exist, and not related to the select type,but only up to whether the another transaction commit. because the READ-COMMITTED means read the latest result that it could.

  11. I revisted my Stack Exchange post and found a fundamental flaw in how I was testing. Now, I am confused what you mean when you say that writes behave as if in READ COMMITTED. My testing shows that the SELECT … FOR UPDATE statement itself operates in RC, despite the transaction being RR, but subsequent queries continue to be RR even if you execute INSERT/UPDATE/DELETE operations. I may write up a blog post about this myself to give myself an excuse to explore the behavior.

  12. Hi all,

    I’m going to try to explain the following phrase with an example:

    “InnoDB provides REPEATABLE READ for read-only SELECT, but it behaves as if you use READ COMMITTED for all write queries, in spite of your chosen transaction isolation level”

    I have two sessions with REPEATABLE READ transaction isolation level.

    mysql> show create table t\G
    *************************** 1. row ***************************
    Table: t
    Create Table: CREATE TABLE `t` (
    `i` int(11) DEFAULT NULL,
    `c` char(20) DEFAULT NULL,
    KEY `i` (`i`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1

    mysql> select * from t;
    +——+——–+
    | i | c |
    +——+——–+
    | 1 | miguel |
    | 3 | andy |
    | 5 | aaron |
    | 8 | repls |
    +——+——–+

    Then I start two transactions, session1 and session2:

    session1> start transaction; select * from t;
    session2> start transaction; select * from t;

    I delete one row:

    session2> delete from t where i=1;
    session> commit;

    In session 1 as I said, phantom reads do not occur if you’re doing a simple SELECT.

    session1> select * from t;
    +——+——–+
    | i | c |
    +——+——–+
    | 1 | miguel |
    | 3 | andy |
    | 5 | aaron |
    | 8 | repls |
    +——+——–+

    As I said, write queries behaves as if you were using READ COMMITTED in spite of your chosen transaction isolation level. So lets try to remove the row 1 again:

    session1> delete from t where i=1;
    Query OK, 0 rows affected (0.00 sec)

    Rows affected 0 :) Because the row doesn’t exist anymore. You can read it (REPEATABLE READ) but you can modify it (READ COMMITED).

    I open a new transaction again on the session 2 without closing the first one:

    session2> select * from t;
    +——+——-+
    | i | c |
    +——+——-+
    | 3 | andy |
    | 5 | aaron |
    | 8 | repls |
    +——+——-+

    session2> update t set c=”baron” where i=3;
    session2> commit;
    session2> select * from t;
    +——+——-+
    | i | c |
    +——+——-+
    | 3 | baron |
    | 5 | aaron |
    | 8 | repls |
    +——+——-+

    Now go back to session1 and try to search the row with the value “andy”:

    session1> select * from t;
    +——+——–+
    | i | c |
    +——+——–+
    | 1 | miguel |
    | 3 | andy |
    | 5 | aaron |
    | 8 | repls |
    +——+——–+

    session1> update t set c=”peter” where c=”andy”;
    Query OK, 0 rows affected (0.00 sec)

    Again, you can read the row with the value andy, but for queries that write data, the isolation is READ COMMITED so you can’t update/delete that row, because it doesn’t exist anymore.

    I hope this example clarify your questions, if not, feel free to ask :)

    Regards,

  13. repls says:

    hi,Miguel,

    thanks very much, i can understand what you mean.

  14. GuoSai says:

    Hi Miguel ,

    I’ve tried your example “the affect gap locks to SELECT … FOR UPDATE ” on my server ,and my results are as follow:

    Transaction1:

    root@localhost:test 14:33:38>start transaction;
    Query OK, 0 rows affected (0.00 sec)

    root@localhost:test 14:35:22>delete from t where i=25;
    Query OK, 1 row affected (0.00 sec)

    root@localhost:test 14:35:29>select * from t;
    +——+
    | i |
    +——+
    | 21 |
    | 30 |
    | 26 |
    +——+
    3 rows in set (0.00 sec)

    root@localhost:test 14:51:40>

    Transaction2:

    root@localhost:test 14:35:50>start transaction;
    Query OK, 0 rows affected (0.00 sec)

    root@localhost:test 14:35:54>insert into t values(29);
    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

    root@localhost:test 14:38:03>insert into t values(23);
    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

    root@localhost:test 14:44:37>insert into t values(31);
    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

    root@localhost:test 14:50:14>insert into t values(32);
    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

    root@localhost:test 14:55:29>insert into t values(33);
    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

    Here I tried to insert a value out of [21,30],but it came out my insert still be blocked. The SHOW ENGINE INNODB STATUS information as follow:

    —TRANSACTION 0 27638, ACTIVE 1207 sec, process no 2909, OS thread id 1100101952 inserting
    mysql tables in use 1, locked 1
    LOCK WAIT 2 lock struct(s), heap size 1216, 1 row lock(s)
    MySQL thread id 31929, query id 484616 localhost root update
    insert into t values(33)
    ——- TRX HAS BEEN WAITING 34 SEC FOR THIS LOCK TO BE GRANTED:
    RECORD LOCKS space id 57 page no 3 n bits 72 index `GEN_CLUST_INDEX` of table `test`.`t` trx id 0 27638 lock_mode X insert intention waiting
    Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
    0: len 8; hex 73757072656d756d; asc supremum;;

    ——————

    The lock_mode is “X insert intention” .

  15. Coway Zhou says:

    Regarding the Phantom Read definition in your post,

    “A Phantom Read happens when in a running transaction, two identical statements get different values, because some other transaction has modified the table’s rows.”

    You used ‘modified’, which seems not precise. Another transaction can modified the existing rows that satisfies the select criteria.

    A precise definition of Phantom Read is:
    A Phantom Read happens when in a running transaction, two identical statements get different values, because some other transaction has inserted a new row that matches search criteria, or modified a row that is not initially in the first transaction’s query, and the the row contains search criteria for the first transaction query after the modification.

    My understanding is the modified rows must not initially in the result set of the first transaction, or they are just new rows.

  16. Ferry says:

    Hi Miguel,

    I experienced the same problem in my production database when my batch process which creating a EOD file reads my transaction table which caused new transaction failed with error “Lock wait timeout exceeded; try restarting transaction”. The EOD process takes more than 30 minutes to complete.

    Isolation level of my database (mysql innodb) is using REPEATABLE READ.

    My question: is it safe to change the isolation level to READ COMMITED?

    Please advise.

    Thanks,

    Ferry

  17. Amit Shah says:

    hi Ferry,

    Message “Lock wait timeout exceeded” will appear irrespective of isolation level “Repeatable Read” or “Read Committed”.
    This message is either because of long running transaction(trxA) which is holding locks on rows or transaction(trxA) which has completed operation but neither committing nor rolling back transaction. This transaction(trxA) is blocking other transaction(trxB) from acquiring lock on table (check wait_timeout variable) and this error appears.

    Now isolation level, two factors can be considered for choosing isolation level
    - If you have existing replication set up which is using “statement base replication” then don’t change isolation level to “read committed.”
    - If you have long running queries and same time data is being updated for same tables then performance may go down as for such transaction MySQL has to manage snapshot for long period of time. In this scenario also “read committed” is not advisable without testing and matching stats with both isolation level.

    Thanks,
    Amit.

  18. Federico Ciliberti says:

    Hi Miguel.

    Trx 1
    mysql> start transaction;
    Query OK, 0 rows affected (0.00 sec)

    Trx 2
    mysql> start transaction;
    Query OK, 0 rows affected (0.00 sec)

    Trx 1
    mysql> select max(nro_comp) from comp_mov where cod_emp = 1 and cod_tcomp = 11 for update;
    +—————+
    | max(nro_comp) |
    +—————+
    | 58515 |
    +—————+
    1 row in set (0.00 sec)

    Trx 2:
    mysql> select max(nro_comp) from comp_mov where cod_emp = 1 and cod_tcomp = 11 for update;
    (waiting…..)

    Trx 1:
    mysql> insert into comp_mov set cod_emp =1, cod_tcomp = 11, nro_comp = 58516;
    ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
    m

    Trx 2:
    (loses lock and show)
    +—————+
    | max(nro_comp) |
    +—————+
    | 58515 |
    +—————+
    1 row in set (10.56 sec)

    I expect trx2 to wait until trx 1 commits or rollback and then get the last nro_comp to make trx insert. But this is not the case.

    Because of lock order the insert statement throw deadlock but i think this is wrong.

    ¿Is there a way to accomplish concurrence when inserting multi column primary key rows?

    Thanks an sorry about my poor English.

Speak Your Mind

*