While doing a performance audit for a customer a few weeks ago, I tried to improve the response time of their top slow query according to pt-query-digest's report. This query was run very frequently and had very unstable performance: during the time data was collected, response time varied from 50µs to 1s.

When I ran the query myself (a two-table join with a WHERE condition; the whole dataset was in memory), I always got a consistent response time of about 160ms. Of course, I wanted to know more about how MySQL executed this query, so I used commands you're probably familiar with: EXPLAIN, SHOW PROFILE, and SHOW STATUS LIKE 'Handler%'.
EXPLAIN and Handler counters only confirmed that the execution plan seemed reasonable and that fields were correctly indexed.
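For reference, here is a minimal sketch of the kind of session involved. The table and column names (t1, t2, status, and so on) are placeholders standing in for the customer's schema, not the actual query:

SET profiling = 1;

-- Placeholder for the real query: a two-table join with a WHERE condition
SELECT SQL_NO_CACHE t1.id, t2.val
FROM t1
JOIN t2 ON t2.t1_id = t1.id
WHERE t1.status = 'active';

SHOW PROFILE;
SHOW STATUS LIKE 'Handler%';

EXPLAIN
SELECT SQL_NO_CACHE t1.id, t2.val
FROM t1
JOIN t2 ON t2.t1_id = t1.id
WHERE t1.status = 'active';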

With SHOW PROFILE, I saw that most of the time was spent sending the result set to the client, which was not surprising as the result set was around 30,000 rows:
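The original profile is not reproduced here; the following is an illustrative reconstruction of what SHOW PROFILE typically shows for such a query, with placeholder durations chosen to add up to roughly 160ms rather than the customer's real figures (Sending data dominates):

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000045 |
| checking permissions | 0.000007 |
| Opening tables       | 0.000019 |
| System lock          | 0.000008 |
| init                 | 0.000030 |
| optimizing           | 0.000012 |
| statistics           | 0.000052 |
| preparing            | 0.000015 |
| executing            | 0.000004 |
| Sending data         | 0.159210 |
| end                  | 0.000006 |
| query end            | 0.000004 |
| closing tables       | 0.000009 |
| freeing items        | 0.000020 |
| cleaning up          | 0.000005 |
+----------------------+----------+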

So the unstable response times did not come from a bad execution plan, but rather from contention or excessive waiting somewhere in the server. A good candidate for contention issues was the query cache, as it was enabled: contention on the query cache mutex when checking whether a result set can be served from the cache is quite common.
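A quick, standard way to confirm that the query cache is enabled and to get an idea of how busy it is (nothing here is specific to this customer's setup):

SHOW VARIABLES LIKE 'query_cache%';
SHOW GLOBAL STATUS LIKE 'Qcache%';

The Qcache_hits, Qcache_inserts and Qcache_lowmem_prunes counters give a rough picture of the cache activity.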

Then I remembered that I had run SHOW PROFILE and SHOW STATUS LIKE 'Handler%' with the SQL_NO_CACHE hint. Would the output be different without SQL_NO_CACHE?

Indeed it was. While the Handler counters were the same, I got around 200 lines of output from SHOW PROFILE instead of the 15 lines above. Particularly interesting, one sequence was repeated over and over:
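Again as an illustration rather than the actual output, the repeated sequence looked roughly like this (state names as in Percona Server 5.5, timings are placeholders):

| Waiting for query cache lock | 0.000017 |
| Waiting on query cache mutex | 0.000001 |
| Sending data                 | 0.000705 |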

Total response time was still around 160ms, but with so many waits for the query cache lock, it was easy to imagine that this query could take ages as soon as there was competition for the lock.

Of course, the question was: why did MySQL need so many accesses to the query cache lock?

The answer lies in the way the query cache works. Simply stated, the server locks the query cache both when checking whether a result is already in the cache and when writing a result set into the cache. When writing, locking can occur several times: the server starts sending rows to the cache before the entire result set has been computed (so its total size is not known in advance), which forces the cache to allocate memory block by block. Whenever a block fills up and the server keeps sending rows, a new block must be allocated, and that requires locking the cache again!

More specifically, the size of the blocks is set by the query_cache_min_res_unit variable (4KB by default). For the query I was working on, each row was about 6 bytes, so the result set was around 180KB. This means that around 50 accesses to the query cache lock were needed to cache the entire result set, which is exactly what SHOW PROFILE revealed.
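A back-of-the-envelope check, using only the numbers already mentioned above (the SHOW VARIABLES statement is standard and can be run on any server):

-- Block size used by the query cache (4KB = 4096 bytes by default)
SHOW VARIABLES LIKE 'query_cache_min_res_unit';

-- ~30,000 rows x ~6 bytes/row ≈ 180KB for the whole result set
-- 180KB / 4KB per block       ≈ 45-50 block allocations,
--                               each one requiring the query cache lock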

A solution to the problem could be to raise the value of query_cache_min_res_unit. But this could increase the fragmentation of the cache and degrade performance for other queries. The best solution was simply to turn off the query cache.
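A sketch of how the query cache can be disabled at runtime on MySQL/Percona Server 5.5 (to make the change permanent, the same settings go into my.cnf and take effect at the next restart):

SET GLOBAL query_cache_type = OFF;  -- stop using the cache for new queries
SET GLOBAL query_cache_size = 0;    -- free the memory allocated to the cache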

Conclusions:

  • Once again, the query cache showed poor scalability, this time in an unusual manner.
  • When you have unstable response times for a query, you should suspect either some change in data access pattern (reads on disk vs reads in memory, execution plan instability) or contention somewhere while the query is executing.
  • You should not forget that improving queries also means improving the stability of the response time, and not only having low response times sometimes. In this case, a stable 160ms response time was preferred to a response time ranging from 50µs to 1s.

And a final note for sharp-eyed readers: why do we have 2 lines showing waits on the query cache lock instead of one in the output of SHOW PROFILE? This is specific to Percona Server 5.5 and is used only for debugging purposes, to provide finer granularity.

11 Comments
Daniël van Eeden

The “Sending data” phase in the Query Profile does not mean it’s actually sending data. A better description for that phase is “executing query plan”. Some queries will spend a lot of time in the “Sending data” phase but don’t actually send any data.

Gil Ganz

great post stephane, thx

aftab

>The best solution was simply to turn off the query cache
Alternatively you could have use SQL_NO_CACHE, would that help? Because turning the query_cache globally might effect other sqls? or perhaps disable query cache in the session?

Susane

Hi Stephane,

Great post! Thank you!
I work at a web hosting company, and we have a server with lots of databases. Many queries on different databases constantly appear with the state "Waiting on query cache mutex". I tried disabling the query cache, but this state kept appearing. What else can I do to eliminate, or at least reduce, this state on my server?

Susane

Hi Stephane,

Thanks for your answer in first place.
I disabled the query cache this way:

SET GLOBAL query_cache_type=0;

Following Percona documentation:

https://www.percona.com/doc/percona-server/5.1/performance/query_cache_enhance.html

“Setting query_cache_type=0 in Percona Server ensures that both the cache is disabled and the mutex is not used.”

My Percona Server version is 5.1.54, and it runs under CentOS 5.8.

Thank you for your help!

Susane

Thank you Stephane!

I’ll try!

🙂

Michael Denney

What are your thoughts on the key cache?