July 25, 2014

MySQL caching methods and tips

“The least expensive query is the query you never run.”

Data access is expensive for your application. It often requires CPU, network and disk access, all of which can take a lot of time. Using less computing resources, particularly in the cloud, results in decreased overall operational costs, so caches provide real value by avoiding using those resources. You need an efficient and reliable cache in order to achieve the desired result. Your end users also care about response times because this affects their work productivity or their enjoyment of your service. This post describes some of the most common cache methods for MySQL.

Popular cache methods

The MySQL query cache

When the query cache is enabled, MySQL examines each query to see if the contents have been stored in the query cache. If the results have been cached they are used instead of actually running the query.. This improves the response time by avoiding the work of the query. If you are getting the impression that I’ve just introduced you to the query performance magic bullet, unfortunately, I haven’t.

The problem with the query cache is “coarse invalidation”. That is, as soon as you change a single row in any table, the query cache entries for every query which accessed that table must be invalidated. If a frequently executed or expensive query is invalidated, response time will be significantly impacted.

The invalidation frequency is controlled by the rate of change in the database tables. This results in unpredictable and therefore, undesirable, performance. The other big problem with the query cache is that it is protected by a single mutex. On servers with many cores, a high volume of queries can cause extensive mutex contention. Percona Server even has a state in the processlist ‘waiting on query cache mutex’ so that this is easier to spot.

External cache (Memcached)

In order to eliminate the unpredictable nature of the the query cache, external caches like Memcached are usually employed. When rows in the database change, the old query results remain available in the cache, but they are now “stale”. Until the cache key expires, the contents are available immediately and you avoid performing work in the database.

Eventually the cache contents expire. When this happens, the application attempts to get the cached value and fails. At this time, the application must compute the results, then place them into the cache. Additionally, memory pressure may cause unexpected invalidation of items, once again resulting in unpredictable, and therefore undesirable performance.

If the cache is emptied (perhaps due to a restart, crash, upgrade or power loss) then all of the results are invalidated, which can cause very poor performance.

Cache invalidation is a problem

With both of these cache methods, once the cache is invalidated or expires the entire result must be recalculated. This recalculation may be very expensive. For example, suppose we need to calculate the total count of items sold. A query may have to access many millions of rows of data to compute that result and this takes time.

Another problem which can occur is the “cache stampede” aka a “miss storm”. This happens when multiple requests need data for the same key, but the key recently expired. The stampede impacts performance because multiple requests try to recompute the contents at the same time. These cache stampedes are essentially the cause of the unpredictable performance of the MySQL query cache, since the rate of invalidation can not be controlled, and multiple cache entries may be invalidated by a single table change.

Peter’s advice in the miss storm blog post suggests that for best performance one should pre-compute the data for expensive keys. That is, the keys which are frequently accessed, or those that are expensive to recompute. For the queries that are expensive to recompute, the pre-computation normally takes the form of summary tables, which are discussed next.

If request frequency is the problem (mostly likely because the response time goes up due to increased concurrency) but the time to compute the contents is low, then using a mutex around the content generation is a possible solution to the problem. This forces one query to do the computation while others wait for the result. There are also probabilistic methods to enqueue items to be rebuilt with some increasing probability as the request time approaches the expiration time. This does not, however, offer very much improvement for keys which are not accessed very frequently.

Use what you need

Both Memcached and the MySQL query cache are limited in size. If you try to cache more information than you can store, space will need to be freed in order to store the new information in the cache. In order to ensure cache efficiency, you must only place information in the cache that you intend to retrieve again. Storing data in the cache that you won’t read again also increases response time in your application because it wastes a round trip to the cache server. It wastes CPU and other resources too.

Pick an efficient cache representation

If you can cache an entire HTML block instead of the rows used to create it, then do so. This avoids the CPU usage on your web server to create the block from the rows again and again. If you are paying for compute cycles in the cloud, this can be very beneficial as may reduce the number of instances you need.

Don’t make too many round trips

Asking the cache for many different pieces of data to satisfy a single request is not very efficient. Use multi_get when possible. Once again, caching entire portions of pages is a good way to reduce the number of round trips to the cache.

Summary tables

Queries that access a lot of data usually face two bottlenecks: disk IO and CPU usage. Disk IO is expensive, and even if the disk bottleneck is eliminated, the sorting, aggregation and join operations are still CPU intensive and single threaded. In order to avoid these operations, the results can be pre-aggregated into a summary table.

Ideally, the summary tables can be updated with only the information that changed since they were last populated. MySQL includes two statements that make this easier: CREATE TABLE .. SELECT and INSERT .. SELECT. These SQL commands can be used to either replace the contents of, or insert new data into the summary table.

One advantage of summary tables is that they are persistent unlike the query cache or Memcached. There is no risk of unexpected invalidation, either. The summary table always exists, and should be fast to access, since it can be indexed appropriately for your queries.

Using INSERT .. SELECT for summary tables

The INSERT .. SELECT approach works best when there is some sort of log table, such as a log of clicks, a web access log, metric data from a monitoring system, or the like, which is to be aggregated over time. With this type of source data, one does not expect to see very many (if any) updates to the data once it has been collected.

This method does not usually work well when database tables may be updated or when rows may be deleted. Such changes may happen if a log is accidentally imported twice, for example and then the duplicate items are deleted. When this happens, decreasing the counts in summary tables may not be possible and thus they may be out of sync with the underlying data.

If there are frequent changes to the data then other options for maintaining summary tables must be used. Either they must be rebuilt from scratch each time (like a memcache miss) or they must be updated. Updating the summary tables efficiently is a hard problem to solve. My latest post on using Flexviews addresses this problem in a dedicated post.

Conclusion

If the least expensive queries are the ones you never run, then the most expensive queries very well may be the ones you have to run when the cache is empty. When talking about cache, the miss path is at least as important as the hit one. In order to make the miss path less expensive, use a layered approach to your caching. Cron jobs and summary tables can be used to make the miss path much less expensive. If you don’t pre-compute, and your website performance is dependent on your cache, then you could have a serious performance problem if your cache is unexpectedly emptied.

Effective caching is important. It is important to only cache things you know you will need again. Cache data in the form that it makes most sense to your application. Don’t cache the results of every query, simply because it is easy to do so.

About Justin Swanhart

Comments

  1. nate says:

    I’ve always thought of the mysql query cache as being pretty much worthless, I mean it’s a joke that something as simple as changing the case of a query can cause it not to use the cache -

    http://dev.mysql.com/doc/refman/5.1/en/query-cache-operation.html

    “Queries must be exactly the same (byte for byte) to be seen as identical.”

    The amount of limitations that determine whether or not the cache is used(in the url above) is staggering.

    Also kind of surprised you didn’t mention the innodb buffer cache, which to me is the most important anyways

    http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_buffer_pool_size

    There’s lots of other buffers/caching settings too of course..

  2. Justin Swanhart says:

    This post is about result caching. They key buffer and innodb buffer pool don’t cache the results of queries, they cache indexes and tablespace pages, respectively.

    Percona Server defaults to stripping comments from queries, so they do not have to actually byte-for-byte identical. Besides, I make it pretty clear (at least I thought I did) that the query cache isn’t a very good idea because it holds a mutex, has coarse invalidation,etc.

    External cache (Memcached, Redis, etc) is almost always a good idea, but you have to plan for the miss path.

  3. Using an external cache is the most powerful and flexible solution, but it also requires other types of planning and coding. Which queries deserve to be cached? How long of an expiration time do you want? How do you clear cache entries manually when your data changes before the cache expiration?

    Nice things about the query cache: it’s nearly transparent to the developer; you don’t have to write any additional code; multiple applications that access the same data benefit from it immediately without having to add their own cache-handling code (as long as they all use identical SQL queries).

    Given the limitations of the query cache that Justin described, it’s clear that there ain’t no such thing as a free lunch. What you gain in convenience, you pay for in other ways. But in agile environments, developer productivity trumps optimal runtime performance.

    Still, I’d set query_cache_type=2, so it caches only cacheable queries that explicitly use “SELECT SQL_CACHE …” That’s an easy way to give yourself more fine control over which queries deserve to be cached, but still let MySQL handle cache invalidation transparently.

  4. Justin,

    Percona Server has an option to strip comments but it has same default behavior as MySQL – not stripping the comments:

    mysql> show variables like “%query_cache_strip%”;
    +—————————-+——-+
    | Variable_name | Value |
    +—————————-+——-+
    | query_cache_strip_comments | OFF |
    +—————————-+——-+
    1 row in set (0.00 sec)

    I also would note the big problem people have with query cache is also contention. It was not designed with multiple CPUs in mind and so its scalability with multi-cores is horrible.

  5. Arick says:

    I own a server which is fast enough to run my simple blog but lately it behave strangely.
    Sometimes it load fine and anther time it wasn’t load at all.
    Checking the log file, I saw lots of error in accessing Database. It said the query X was failed and database wasn’t available for few moments.
    I’m thinking to Cache the Database as my blog is Database-intensive. Using Cache for page only help a little as it is impossible to make connection to database.
    I have read your article about using Memcached and thinking it maybe good idea to use it for my server but I believe it wouldn’t help if concurrent connection to database happen at the same times.
    Maybe my only solution is to code the PHP to use less query and limit it to only make query which is not taking too long to retrieve. But with Hundred or millions of data in table, it maybe impossible.
    I wish there is solution to cache the database by server and by CMS.

  6. Mamta says:

    Hi,

    I am using mamcache,Is it create cache tmp files in system tmp directory ?
    What I am gettting in my centos system temp directory is temp files with php prefix and they too hugh and every minutes they are upading and therefore I am getting sometimes tmp directory memory issue.

    Is it possible,it is because of memcache or do yo know what is that ?

    Thanks,
    Mamta

  7. Mamta says:
  8. Hi Mamta,

    This post is well over 2 years old – and it’s not the best place to ask for help. Our Percona forums, however, are the perfect place to ask this. http://www.percona.com/forums/

    I look forward to seeing you join our MySQL and related discussions there. :)

Speak Your Mind

*