April 18, 2014

mk-query-digest, query comments and the query cache

I very much like the fact that MySQL allows you to embed comments into SQL statements. These comments are extremely convenient, because they are written into MySQL log files as part of the query. This includes the general log, the binary log and the slow query log. Maatkit includes tools which interact with these logs, including mk-query-digest. This tool, in particular, has a very nice option called –embedded-attributes which can process data embedded in query comments.

The support for embedded attributes makes some cool tricks possible. Peter and I co-presented a talk at this past MySQL Conference and Expo. In this talk I presented my Instrumentation-for-PHP class as a demonstration instrumentation application for the LAMP stack. Instrumentation-for-php collects information about the PHP process such as wall time, cpu time, mysql query times, etc, and automatically places this information into the Apache environment. It also includes support for “augmenting” SQL queries with additional information such as the PHP source line and user session information by embedding this information in a specially formatted SQL comment. This information can then be cross-referenced between MySQL logs and Apache logs.

Notice that the query ends with a comment, and that this comment includes important information, such as the PHP session ID, and the application user name. This allows mk-query digest to discover these attributes and use them for aggregation. You could also filter on the embedded information to display only queries from a particular file, particular session or particular user for example.

The –embedded-attributes option takes two parts. The first part is a Perl regular expression which identifies which comment will be treated as the one with attributes. The second part is another expression used to split the comment into attribute key/value pairs.

For example:

The first part: ‘^– [^\n]+’ matches everything starting with — and ending with newline.
The second part includes two expressions, each enclosed by parenthesis so that they will be captured. The first matching expression:(\w+): matches one or more words followed by colon and a space. The second match expression:([^ ]+)captures everything up to the first space.

A problem with this approach is that MySQL treats identical queries with different comments as different queries. This essentially renders the query cache inoperative when this approach is taken. A cache that can’t return any results is worse than not having a cache at all. The cache won’t return any result if each query has some sort of unique identifier in it. Evicting queries will become very frequent when every query is unique and the query cache will use additional resources for no effective benefit.

Luckily the problem is mostly* fixed in Percona Server 11.

I wrote a simple test script to demonstrate the difference. The test loops twice over a set of query patterns. Each pattern is executed 1000 times in each iteration of the loop. The queries differ only by comments. In the best case scenario where comments are stripped we would expect 1000 misses and then all hits from this test. You can enable query cache stripping in Percona Server at runtime:

QueryHitsMissesHitsMisses
 StrippingNo Stripping
no comment1000 1000 1000 1000
random at start2000 0 0 2000
random at middle2000 0 0 2000
static at start2000 0 1000 1000

As you can see, the first loop of the queries (that is, those with no comment at all) get 1000 misses. Everything beyond that is a hit when stripping is enabled. Contrast this with the results where stripping is not enabled. There are many more misses due to the random values in the query comments. Also notice the query with static comments. There are 1000 hits and 1000 misses because MySQL treats identical queries with identical comments as the same. When comments are stripped every one of these queries is answered by the cache.

If you are benefiting from the query cache, but would like better instrumentation in your queries, consider using Percona Server and turning on the –strip-query-cache-comments feature.

* Percona Server 11.1 still has a problem with # and — comments at the start of a query(before SELECT), and in some cases, with varying white-space. These issues will be rectified in the next release. I’ve updated instrumentation-for-php to place the comment at the end of the query to avoid these problems.

About Justin Swanhart

Comments

  1. peter says:

    Justin,

    I wish you could post some “real life” examples of use of this feature of MK Query digest. I think being able to see all queries which correspond to single PHP session (and being able to aggregate such data from multiple database servers if needed) is really neat, same as being able to check all queries from given page view or what queries contribute to response time to the
    particular application functionality.

  2. Justin Swanhart says:

    Hi Peter,

    I am going to write up another post with more real world examples.

Speak Your Mind

*