While doing a performance audit for a customer a few weeks ago, I tried to improve the response time of their top slow query according to pt-query-digest's report. This query was run very frequently and had very unstable performance: during the time data was collected, response time varied from 50µs to 1s.

When I ran the query myself (a two-table join with a WHERE condition; the whole dataset was in memory), I always got a consistent response time of about 160ms. Of course, I wanted to know more about how MySQL executed this query, so I used commands you're probably familiar with: EXPLAIN, SHOW PROFILE, and SHOW STATUS LIKE 'Handler%'.
EXPLAIN and Handler counters only confirmed that the execution plan seemed reasonable and that fields were correctly indexed.
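For reference, here is a minimal sketch of the kind of session involved. The table and column names (t1, t2, status, and so on) are placeholders standing in for the customer's schema, not the actual query:

SET profiling = 1;

-- Placeholder for the real query: a two-table join with a WHERE condition
SELECT SQL_NO_CACHE t1.id, t2.val
FROM t1
JOIN t2 ON t2.t1_id = t1.id
WHERE t1.status = 'active';

SHOW PROFILE;
SHOW STATUS LIKE 'Handler%';

EXPLAIN
SELECT SQL_NO_CACHE t1.id, t2.val
FROM t1
JOIN t2 ON t2.t1_id = t1.id
WHERE t1.status = 'active';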

With SHOW PROFILE, I saw that most of the time was spent sending the result set to the client, which was not surprising as the result set was around 30,000 rows:
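The original profile is not reproduced here; the following is an illustrative reconstruction of what SHOW PROFILE typically shows for such a query, with placeholder durations chosen to add up to roughly 160ms rather than the customer's real figures (Sending data dominates):

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000045 |
| checking permissions | 0.000007 |
| Opening tables       | 0.000019 |
| System lock          | 0.000008 |
| init                 | 0.000030 |
| optimizing           | 0.000012 |
| statistics           | 0.000052 |
| preparing            | 0.000015 |
| executing            | 0.000004 |
| Sending data         | 0.159210 |
| end                  | 0.000006 |
| query end            | 0.000004 |
| closing tables       | 0.000009 |
| freeing items        | 0.000020 |
| cleaning up          | 0.000005 |
+----------------------+----------+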

So the unstable response times did not come from a bad execution plan, but rather from contention or excessive waiting somewhere in the server. A good candidate for contention issues was the query cache, as it was enabled: contention on the query cache mutex when checking whether a result set can be served from the cache is quite common.
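A quick, standard way to confirm that the query cache is enabled and to get an idea of how busy it is (nothing here is specific to this customer's setup):

SHOW VARIABLES LIKE 'query_cache%';
SHOW GLOBAL STATUS LIKE 'Qcache%';

The Qcache_hits, Qcache_inserts and Qcache_lowmem_prunes counters give a rough picture of the cache activity.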

Then I remembered that I had run SHOW PROFILE and SHOW STATUS LIKE 'Handler%' with the SQL_NO_CACHE hint. Would the output be different without SQL_NO_CACHE?

Indeed it was. While the Handler counters were the same, I got around 200 lines of output from SHOW PROFILE instead of the 15 lines above. Particularly interesting, one sequence was repeated over and over:
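Again as an illustration rather than the actual output, the repeated sequence looked roughly like this (state names as in Percona Server 5.5, timings are placeholders):

| Waiting for query cache lock | 0.000017 |
| Waiting on query cache mutex | 0.000001 |
| Sending data                 | 0.000705 |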

Total response time was still around 160ms, but with so many waits for the query cache lock, it was easy to imagine that this query could take ages as soon as there was competition for the lock.

Of course, the question was: why did MySQL need so many accesses to the query cache lock?

The answer lies in the way the query cache works. Simply stated, the server locks the query cache both when checking whether a result is already in the cache and when writing a result set into the cache. When writing, locking can occur several times: the server starts sending rows to the cache before the entire result set has been computed (so its total size is not known in advance), which forces the cache to allocate memory block by block. Whenever a block fills up and the server keeps sending rows, a new block must be allocated, and that requires locking the cache again!

More specifically, the size of the blocks is set by the query_cache_min_res_unit variable (4KB by default). For the query I was working on, each row was about 6 bytes, so the result set was around 180KB. This means that around 50 accesses to the query cache lock were needed to cache the entire result set, which is exactly what SHOW PROFILE revealed.
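A back-of-the-envelope check, using only the numbers already mentioned above (the SHOW VARIABLES statement is standard and can be run on any server):

-- Block size used by the query cache (4KB = 4096 bytes by default)
SHOW VARIABLES LIKE 'query_cache_min_res_unit';

-- ~30,000 rows x ~6 bytes/row ≈ 180KB for the whole result set
-- 180KB / 4KB per block       ≈ 45-50 block allocations,
--                               each one requiring the query cache lock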

A solution to the problem could be to raise the value of query_cache_min_res_unit. But this could increase the fragmentation of the cache and degrade performance for other queries. The best solution was simply to turn off the query cache.
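A sketch of how the query cache can be disabled at runtime on MySQL/Percona Server 5.5 (to make the change permanent, the same settings go into my.cnf and take effect at the next restart):

SET GLOBAL query_cache_type = OFF;  -- stop using the cache for new queries
SET GLOBAL query_cache_size = 0;    -- free the memory allocated to the cache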

Conclusions:

  • Once again, the query cache showed poor scalability, this time in an unusual manner.
  • When you have unstable response times for a query, you should suspect either some change in data access pattern (reads on disk vs reads in memory, execution plan instability) or contention somewhere while the query is executing.
  • You should not forget that improving queries also means improving the stability of the response time, and not only having low response times sometimes. In this case, a stable 160ms response time was preferred to a response time ranging from 50µs to 1s.

And a final note for sharp-eyed readers: why do we have 2 lines showing waits on the query cache lock instead of one in the output of SHOW PROFILE? This is specific to Percona Server 5.5 and is used only for debugging purposes, to provide finer granularity.

11 Comments
Daniël van Eeden

The “Sending data” phase in the Query Profile does not mean it’s actually sending data. A better description for that phase is “executing query plan”. Some queries will spend a lot of time in the “Sending data” phase but don’t actually send any data.

Gil Ganz

great post stephane, thx

aftab

>The best solution was simply to turn off the query cache
Alternatively you could have use SQL_NO_CACHE, would that help? Because turning the query_cache globally might effect other sqls? or perhaps disable query cache in the session?

Susane

Hi Stephane,

Great post! Thank you!
I work at a web hosting company, and we have a server with lots of databases. Many queries on different databases constantly appear with the state "Waiting on query cache mutex". I tried disabling the query cache, but this state kept appearing. What else can I do to eliminate, or at least reduce, this state on my server?

Susane

Hi Stephane,

Thanks for your answer in first place.
I disabled the query cache this way:

SET GLOBAL query_cache_type=0;

Following Percona documentation:

https://www.percona.com/doc/percona-server/5.1/performance/query_cache_enhance.html

“Setting query_cache_type=0 in Percona Server ensures that both the cache is disabled and the mutex is not used.”

My Percona Server version is 5.1.54, and it runs under CentOS 5.8.

Thank you for your help!

Susane

Thank you Stephane!

I’ll try!

🙂

Michael Denney

What are your thoughts on the key cache?