Over the last few years I’ve been suggesting that more people disable the Query Cache than enable it. It can cause contention problems and stalls, and due to its coarse invalidation it is not as efficient as it could be. These problems, however, are mostly due to the neglect the Query Cache has received over almost 10 years, with very few changes made to the core of the original implementation that appeared in MySQL 4.0. The Query Cache was designed for single-core systems and relatively small amounts of memory, both of which are things of the past.

However, if you think about the core idea of the MySQL Query Cache, it is great: a transparent cache which does not require any extra handling on the application side and which just makes things faster without adding complexity to the application (as memcached does, for example). The Query Cache is also one of the few cache implementations I’d call an Active Cache, meaning it takes care of misses itself, without you having to do anything about them. This property is different from a merely transparent cache, and you’ll see why below.

So what do I think could be done with the MySQL Query Cache to make it once again an appealing way to improve performance for MySQL applications?

Make it Lockless Can we re-implement the MySQL Query Cache so it is lockless? I believe it may be possible due to the rather simple nature of the algorithms involved; at the very least we could use the same “cheat” MySQL 5.5 uses for the InnoDB Buffer Pool and have multiple Query Cache instances hashed by query hash, which would reduce the implementation complexity dramatically.
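
As an illustration, here is a minimal Python sketch (purely illustrative, not MySQL code) of that partitioning cheat: the cache is split into independent shards, each with its own mutex, and the shard is chosen by a hash of the query text, so unrelated queries never contend on the same lock.

import hashlib
import threading

class ShardedQueryCache:
    def __init__(self, shards=16):
        # one independent dict plus mutex per shard
        self.shards = [({}, threading.Lock()) for _ in range(shards)]

    def _shard(self, query):
        digest = hashlib.md5(query.encode()).digest()
        return self.shards[digest[0] % len(self.shards)]

    def get(self, query):
        entries, lock = self._shard(query)
        with lock:                      # contention is limited to one shard
            return entries.get(query)

    def put(self, query, result):
        entries, lock = self._shard(query)
        with lock:
            entries[query] = result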

Fast Invalidation Right now, when you update a table, all queries referencing it have to be invalidated from the query cache, which can take a lot of time if you have millions of queries cached for a given table. Instead, we could track a version for each table (incremented each time the table is updated) and, on lookup, check whether the version matches for all tables the query depends on. If not, the query result needs to be regenerated. Removal of stale entries could then be done through some sort of garbage collection.
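
A minimal sketch of that version-based scheme, in Python and with invented names: a write only bumps a per-table counter in O(1), staleness is detected lazily at lookup time, and a garbage-collection pass reclaims dead entries later instead of a synchronous purge on every UPDATE.

from collections import defaultdict

class VersionedCache:
    def __init__(self):
        self.table_version = defaultdict(int)   # table name -> version counter
        self.entries = {}                        # query -> (result, {table: version})

    def on_write(self, table):
        self.table_version[table] += 1           # O(1), no scan of cached queries

    def put(self, query, result, tables):
        snapshot = {t: self.table_version[t] for t in tables}
        self.entries[query] = (result, snapshot)

    def get(self, query):
        hit = self.entries.get(query)
        if hit is None:
            return None
        result, snapshot = hit
        if any(self.table_version[t] != v for t, v in snapshot.items()):
            return None                          # stale; GC will remove it later
        return result

    def gc(self):
        # periodic garbage collection of entries whose table versions moved on
        self.entries = {q: (r, s) for q, (r, s) in self.entries.items()
                        if all(self.table_version[t] == v for t, v in s.items())}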

Support for Delayed Invalidation Many MySQL applications do not need absolutely up-to-date data; something which is a couple of seconds or sometimes minutes old can do just as well. This especially applies to reads issued to slaves: because these are potentially lagging, we can’t expect to get absolutely up-to-date data anyway. I’d like to see support for something like SQL_CACHE_TTL=60 to specify we’re OK with getting a query result which is up to 60 seconds stale. We could also get query_cache_ttl as a session variable, which would be very helpful for connections opened to slaves and would allow reading older data, as the application is designed to deal with it anyway.
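
A minimal sketch of what the proposed staleness tolerance could look like at lookup time (SQL_CACHE_TTL and query_cache_ttl are proposed names here, not existing MySQL features): a hit is served as long as the entry is younger than the age the caller is willing to accept.

import time

class TTLCache:
    def __init__(self):
        self.entries = {}                  # query -> (result, cached_at)

    def put(self, query, result):
        self.entries[query] = (result, time.time())

    def get(self, query, ttl_seconds):
        hit = self.entries.get(query)
        if hit is None:
            return None
        result, cached_at = hit
        if time.time() - cached_at > ttl_seconds:
            return None                    # too old for this caller's tolerance
        return result

cache = TTLCache()
cache.put("SELECT COUNT(*) FROM orders", 42)
print(cache.get("SELECT COUNT(*) FROM orders", ttl_seconds=60))   # 42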

Smart Query Matching The Query Cache requires queries to be byte-for-byte identical, which does not work well with queries containing dynamic comments, or simply with multiple variants of the same query typed in by different developers. Supporting differing comments is especially important, as many applications put extra information in comments to assist diagnostics, such as where the query comes from or which application user triggered it. Query Cache matching could be done a lot smarter. In Percona Server we’ve taken the first step by allowing you to strip comments when checking the query cache for a match.
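
A minimal sketch of normalizing a query before using it as a cache key, so that dynamic /* ... */ comments carrying request IDs or source annotations do not defeat matching. This only mirrors the idea behind the Percona Server option, not its actual implementation.

import re

_COMMENT = re.compile(r"/\*.*?\*/", re.S)
_WS = re.compile(r"\s+")

def cache_key(query):
    stripped = _COMMENT.sub(" ", query)      # drop C-style comments
    return _WS.sub(" ", stripped).strip()    # collapse whitespace

q1 = "SELECT * FROM users /* req_id=42 page=profile */ WHERE id = 7"
q2 = "SELECT * FROM users /* req_id=43 page=home */    WHERE id = 7"
assert cache_key(q1) == cache_key(q2)        # both map to one cache entry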

Protection from Multiple Misses Say an item is invalidated from the query cache and there are multiple connections requesting the same item. What will MySQL do? It will let all of them execute concurrently, and eventually one of them produces a result which can be cached and used by future queries. The smarter thing to do would be to detect that the result for exactly the same query is already being generated, wait for it to complete, and then simply use it.
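
A minimal sketch of such “single flight” miss protection, in Python with hypothetical names: the first thread to miss on a key computes the result while concurrent requesters block on an event and then reuse it, instead of stampeding the storage engine with identical work.

import threading

class SingleFlightCache:
    def __init__(self):
        self.results = {}
        self.in_flight = {}               # key -> Event the leader will set
        self.lock = threading.Lock()

    def get_or_compute(self, key, compute):
        with self.lock:
            if key in self.results:
                return self.results[key]
            event = self.in_flight.get(key)
            if event is None:             # first miss: become the leader
                event = threading.Event()
                self.in_flight[key] = event
                leader = True
            else:
                leader = False
        if leader:
            value = compute()             # only one thread runs the query
            with self.lock:
                self.results[key] = value
                del self.in_flight[key]
            event.set()
            return value
        event.wait()                      # follower: wait for the leader's result
        with self.lock:
            return self.results[key]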

More Granular Invalidation Can we get something less coarse than per-table invalidation? It is tricky, but I believe it can be done for a rather large set of applications. Many MySQL applications issue a very large number of updates through the primary key and also use the primary key for selects; in such cases I believe we can track dependencies in a more granular fashion. This is the item I’m most unsure about, though, as the Query Cache is most useful for complex, expensive queries, for which deriving the exact rows used from the query syntax may be hard.
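
A minimal sketch of row-level dependency tracking for the simple case described here, single-table primary-key lookups (names and structure are invented for illustration): each cached entry records the (table, pk) pairs it read, and an update evicts only the entries that actually touched the modified row.

from collections import defaultdict

class PKGranularCache:
    def __init__(self):
        self.entries = {}                          # query -> result
        self.deps = defaultdict(set)               # (table, pk) -> {query, ...}

    def put(self, query, result, rows_read):
        self.entries[query] = result
        for table, pk in rows_read:
            self.deps[(table, pk)].add(query)

    def get(self, query):
        return self.entries.get(query)

    def on_update(self, table, pk):
        for query in self.deps.pop((table, pk), ()):   # only dependents go away
            self.entries.pop(query, None)

cache = PKGranularCache()
cache.put("SELECT * FROM users WHERE id=1", {"id": 1}, rows_read=[("users", 1)])
cache.put("SELECT * FROM users WHERE id=2", {"id": 2}, rows_read=[("users", 2)])
cache.on_update("users", 2)        # the entry for id=1 survives the update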

Query Cache Storage Engines The MySQL Query Cache currently stores results in local memory, which is great; however, the amount of memory may be limited, especially in constrained environments such as EC2. We could implement an API that would allow using local memory, memcached, or, for example, local flash storage as the backing store.
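
A minimal sketch of what such a pluggable storage API could look like (the interface and backend names are hypothetical, not an existing MySQL plugin API): the cache logic talks to a tiny get/put/delete contract, and local memory, memcached or flash would simply be interchangeable backends.

from abc import ABC, abstractmethod

class CacheStore(ABC):
    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def delete(self, key): ...

class LocalMemoryStore(CacheStore):
    # the default in-process backend; a memcached or flash-backed store
    # would implement the same three methods
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        self._data[key] = value
    def delete(self, key):
        self._data.pop(key, None)

class QueryCache:
    def __init__(self, store: CacheStore):
        self.store = store                 # any backend satisfying CacheStore

    def lookup(self, query):
        return self.store.get(query)

    def save(self, query, result):
        self.store.put(query, result)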

Compression You can see this as a property of the storage engine, but let me list it separately as well. I believe query result storage would benefit a lot from fast compression, such as QuickLZ. Result sets for most queries are highly compressible, and light compression can compress results faster than wire speed on 1Gb networks. Plus, if we could get the MySQL protocol to support such light compression in addition to zlib, we could stream compressed results directly from storage and only decompress them on the client.
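
A minimal sketch of the “store compressed, ship compressed” idea, using zlib at a light compression level as a stand-in for a fast codec like QuickLZ (which is not in the Python standard library); tabular result sets with repeated values compress very well even at the cheapest level.

import zlib

def compress_result(rows, level=1):          # level 1 = cheap, fast compression
    payload = "\n".join("\t".join(map(str, row)) for row in rows).encode()
    return zlib.compress(payload, level)

def decompress_result(blob):
    return [line.split("\t") for line in zlib.decompress(blob).decode().splitlines()]

rows = [(i, "active", "2011-04-01") for i in range(10000)]
blob = compress_result(rows)
print(len(blob), "bytes compressed")         # far smaller than the raw result
assert decompress_result(blob)[0][1] == "active"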

Delayed Updates Delayed updates are the complement to delayed invalidation. In many applications you have updates which are not particularly important; you just need them to propagate at some point. For example, if I’m maintaining a counter, I may not want every change to it to cause complete invalidation. Having the option for some updates not to invalidate the query cache could be a powerful, though sometimes dangerous, feature. I’m not quite sure whether it is worth doing or whether delayed invalidation is enough.

Improve Manageability What is in the MySQL Query Cache? How many hits did we get for a given cached item? Right now none of this information is available, which makes it hard to understand how exactly the Query Cache is behaving in many cases. This could be improved, and data about Query Cache content and operation could be exposed through INFORMATION_SCHEMA tables.
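
A minimal sketch of the per-entry accounting such tables could expose (the column set and the idea of a QUERY_CACHE_QUERIES table are hypothetical): each cached entry carries hit and age counters that an INFORMATION_SCHEMA view could simply dump.

import time

class InstrumentedCache:
    def __init__(self):
        self.entries = {}        # query -> {"result", "hits", "cached_at"}

    def put(self, query, result):
        self.entries[query] = {"result": result, "hits": 0, "cached_at": time.time()}

    def get(self, query):
        entry = self.entries.get(query)
        if entry is None:
            return None
        entry["hits"] += 1       # per-entry hit counter
        return entry["result"]

    def stats(self):
        # rows for a hypothetical INFORMATION_SCHEMA.QUERY_CACHE_QUERIES table
        return [(q, e["hits"], e["cached_at"]) for q, e in self.entries.items()]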

Self Refresh The MySQL Query Cache causes non-uniform performance: you get very fast responses for hits, but you may take a significant hit on misses. This is especially a problem for very expensive queries. The good thing about the Query Cache is that it knows exactly what is needed to refresh its content, so it could do so automatically, preventing some queries with SQL_CACHE_TTL from expiring as well as re-populating the cache after a table update has invalidated a lot of items. Frankly, if query cache manageability is improved and stats are made available, this could even be done by an external script with application-specific tuning to prioritize which queries are most important to refresh for a given application.
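
A minimal sketch of such an external refresher script (the query list, costs and TTLs are made-up stand-ins for the stats an improved cache would expose, and run_query is whatever function executes SQL through your client library): it re-runs the most expensive hot queries shortly before they expire so readers never pay the miss penalty.

import time

HOT_QUERIES = [
    # (query text, cost in seconds, ttl in seconds) -- illustrative data
    ("SELECT region, SUM(total) FROM orders GROUP BY region", 4.2, 60),
    ("SELECT COUNT(*) FROM sessions WHERE active = 1",        0.9, 30),
]

def refresh_loop(run_query, margin=10):
    """Re-run each hot query `margin` seconds before its TTL runs out,
    most expensive queries first."""
    last_refresh = {q: 0.0 for q, _, _ in HOT_QUERIES}
    while True:
        now = time.time()
        for query, cost, ttl in sorted(HOT_QUERIES, key=lambda x: -x[1]):
            if now - last_refresh[query] >= ttl - margin:
                run_query(query)            # warms the cache entry
                last_refresh[query] = time.time()
        time.sleep(1)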

As a Summary I believe there are a lot of things which could be done to the Query Cache to make it cool again. The question is whether anyone among the “MySQL market players” – Oracle, MariaDB, Drizzle, Percona Server – will allocate resources to implement at least some portion of the items on this list.

I also challenge you to post your own query cache wishes and ideas here; as far as I’m concerned, this list is far from complete.

Comments
Justin Swanhart

The QC is still subject to coarse invalidation at server restart unless you also think about persisting it, which adds another level of complexity.

Additionally, the query cache still doesn’t help when non-deterministic functions are used.

Last, I think you’ve blogged in the past about how the query cache interacts with transactions, and found some serious drawbacks. This probably needs review as well.

Much of what you describe, I think, could be done by supporting query rewrite in the server and allowing the database to access summary tables transparently. A scalable query cache would then just cache the results of rewritten queries.

tobi

If you are into such stuff, like me, this must be an exciting job.

LinuxJedi

I suspect much of this can be achieved if you make Query Cache a plugin type and then add a memcached plugin for it.

Personally I’d also like to see better support for distributed engines, by this I mean some kind of storage engine API trigger that invalidates or alters the content of the cache when the data changes externally to the MySQL server.

Steve

“More Granular Invalidation”

Interesting that you bring this up, Peter. This is something that’s been buzzing around in my head for the last six months. For me, granularity of invalidation is probably the biggest issue with the query cache implementation – and it is certainly a large contributor to lock contention.

Two things that get on my nerves… the first, and I am sure the easiest to fix, is per-schema caching. I don’t understand why this hasn’t been done before. It is a huge annoyance when you have multiple equally busy schemas on the same instance contending for the query cache mutex.

Second… needless invalidation of result sets unrelated to the affected rows, a.k.a. per-table invalidation. I would love the ability to provide a context hint in table definitions so that results are stored in the query cache and invalidated only when rows containing the appropriate value are modified – I realize that most of the time this will be primary keys in a properly normalized database; however, I think a hint is the best way to allow for flexibility and integration into existing applications.

I also TOTALLY second the TTL hints. It is a major annoyance not to have control over cache lifetime, and I will take the dev who implements this out for lots of beers.

All these kinds of changes would give a considerable boost to scalability… let’s face it… many devs are already doing this, and more, in Memcache… but it would be sweet if it were available out of the box! Maybe the philosophy should be: if it’s good enough for Memcache, it’s good enough for MySQL…

Hartmut Holzgraefe

What about restricting query caching to certain tables only?

qcache-(do|ignore)-cache-table=

That way one could have the benefit of cached queries on pure lookup tables without the cost of invalidations on busy tables …

Patrick Casey

For my use cases, wiring in an external cache like memcached probably doesn’t help me out too much. Most of my really expensive queries are against tables with a high degree of volatility so I rarely get a cache hit.

Where I do get a benefit these days is on “cheap” queries that I run a lot. On these I save the parse/execute overhead, since they come right out of the query cache.

I’d assume that it’s faster to actually parse/execute, say, a select by primary key than it is to add a network hop to a memcached server. Depends on the query of course, but in my use case that’s what we’d be doing.

What I’d actually like to be able to do is configure, on a per-table basis, whether or not a particular table participates in the query cache. The logic in the case of a join would be to only use the cache if all members of the join are part of the cache.

I’d also like to see it work better under parallel load, either by going to a true lockless implementation or at least by breaking the lock up a bit, say by splitting the cache into 16 segments and choosing a segment based on a hash of the SQL mod 16, or some such.

Anyway, this is one of those features where I’ve gone full circle kind of like Peter has. I loved the idea when I read about it, got burned in production with invalidation stalls and cursed it for months, and now I’m coming around to wishing somebody would fix it.

Arjen Lentz

An advantage of memcached storage would be that the cache could be shared between multiple MySQL instances, for instance a group of slaves.
That makes the system overall more efficient as long as tables don’t change.

When something does change, it can cause multiple invalidations because replication is not synchronous – if you were able to tie a transaction ID to the cache (part of the semisync patch, etc.), you’d be able to identify an item that has already been updated by another instance and thus avoid unnecessary invalidation.

I think there are benefits to using the query cache. Of course it doesn’t apply in all situations, and we know that. But it’s a useful tool to have in the toolchest.

I also agree with Patrick Casey’s desire for including/excluding dbs/tables from the cache. While that can be done in the app, it could be beneficial to do this in my.cnf.

Patrick Casey

Hartmut,

Yah, that’s more or less what I’d like to see.

Selfishly, because it’d be the easiest possible implementation for me, I’d like cacheability to be a table attribute e.g.

alter table foo disable query_cache;
create table bar (id int(4) auto_increment, value varchar(255)) disable query_cache;

Not that I’m sure that’s the best general case solution mind you, just that it’d be easy for me :).

Davi Arnaut

I tend to think that the query cache does not belong in the database. It is solving a problem, but at the wrong layer. Along these lines, a memcached query cache plugin sounds like a bad idea. It would mean added network latency for the most basic database operations, not counting the issues with network outages (and stale data). A memcached query cache is something that should sit on top of the database, not below it.

“Smart” query caching is also another shot in the foot. Properly parsing comments and other components of a query is not that simple; one ends up caching queries that might collide with other, somewhat unrelated queries, or caching syntactically invalid queries (as is the case with stripping comments).

I would vote for getting rid of the query cache.

Davi Arnaut

Hartmut,

That is already implemented in the form of SQL_NO_CACHE. When querying busy tables, use it.

LinuxJedi

I refer people to Domas’ query cache tuner:

http://dom.as/tech/query-cache-tuner/

Rob Wultsch

In the shared hosting space the query cache is mega WIN.

@LinuxJedi: iirc, he also sent WordPress a patch to make it cool with the QC.

Daniël van Eeden

Improving the QC would be great, but Materialized Views could also be used instead of the QC for quite a few scenarios.

Could the QC be used to speed up COUNT(*) without a WHERE clause for InnoDB/XtraDB?

Steve

Another interesting approach would be to automatically apply SQL_NO_CACHE to queries for which the optimizer produces highly efficient execution plans.

I mean… if we are getting EXPLAINs like the following

explain select username from users where id = 1;

+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | users | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+

Why should we bother caching that? It’s so fast anyway, and putting it in the query cache just swells the cache for no reason, resulting in TONS of invalidation locks. Yes, yes, I know you can add SQL_NO_CACHE, but going through an entire application looking for these kinds of situations is a PITA. Let MySQL decide?

Patrick Casey

Steve,

I *think* there is still a benefit even on super-cheap queries with the query cache, assuming you don’t have to invalidate a lot. I have to think that:

HASH SQL
Find packet in query cache
Write packet to wire

is faster than
PARSE SQL
Execute plan
Find root of btree (top block)
Go left, read next node (next block)
Go right, read next node (data block)
Construct response packet
Write packet to wire

Even if the whole table is memory mapped, I’d still *think* it’d be faster to just yank the packet out of the query cache than to go through the whole parse/execute sequence.

Come to think of it, isn’t that one of the reasons HandlerSocket is so fast? It doesn’t have to deal with parse/execute overhead?

Timmy Denike

Would be great to have a little attention paid to the query cache. I’ve always disabled it by default, but just recently started looking into tracking down dupe queries and moving them into memcached. From a brief sample, I found that 26% of our SELECT qps are duplicate queries. (This from a codebase that’s already had substantial attention paid to caching – apparently not enough, however.)

Steve

Patrick, I think you will find that when you are executing thousands of PK lookups per second – flooding the cache – and you then perform an update on just ONE row of that table, the invalidation (plus the potential 50 ms timeout on the cache lock, which will send your queries down through the parser/optimizer layers anyway… http://dev.mysql.com/doc/refman/5.1/en/news-5-1-42.html) will soon add up. Yes, I agree that in low-threaded scenarios query times will be quicker… but at high QPS and high thread counts, things start to look a little different. The MySQL query cache IS a hack… pure and simple… as was the query time-out implemented in 5.1.42 – interesting that it was called a fix… however, I am not gonna bash the actual CONCEPT of the query cache, because it would be a very powerful feature if implemented correctly.

Re HandlerSocket, yes the performance figures are fantastic… I have been doing plenty of benchmarking myself… but HandlerSocket is only suited to certain types of applications, and the majority of applications making heavy use of SQL joins would have to be totally re-written to benefit. I think that a performance boost to non-“NoSQL” applications would serve as a nice stopgap solution while developers adapt to application design paradigms…

Steve

correction…. last line should be “stopgap solution while developers adapt to NEW application design paradigms”

Kristofer Pettersson

I agree with Davi. The QC doesn’t really belong in the server and it doesn’t really belong in the SQL world either. I would rather recommend a document oriented caching approach in the back end of your application tier and spend resources on fixing the parser and optimizer code in the database tier.

The ideas listed in the blog have also been discussed many times before without any enthusiastic response. Rather than continuing to hope that these changes 1) will happen and 2) will make a difference, I would rather speculate about alternatives:
It shouldn’t be difficult to make the application-tier cache DB-aware if you allow for outgoing calls using triggers. For example, Brian Aker’s function storage engine can serve as a nice “cache statistics table” which links directly to your application cache, or (my current favorite) you can do change data capture and listen to the binary replication log (in row mode!) as tables get updated.

Stu davies

Personally I find the query cache can be very useful as long as you bear in mind its limitations. Having the query cache turned off by default and then using SQL_CACHE on a per-query basis is where the gains are to be had. For tables that are mostly read-only it can be a great boost.
Like any caching system, it works best when you take into account application and table behaviour. Avoiding lots of inserts into the cache is also best.
Expanding the scope so fewer invalidations happen would certainly be best from my viewpoint as a developer, as it would expand the types of queries and tables that can suitably use the cache.
Great to have a piece on it at any rate, keep up the good work!

Patrick Casey

I’m with Peter: I don’t see anything intrinsically wrong with adding another caching layer to the database; I just think it has to work. The problem with the current query cache is that it has too many failure modes.

Davi Arnaut

Sure, I guess anyone would like to just flip a switch to speed up his or her application. But it just moves the complexity into the database, hindering the maintainability and improvement of the database’s core functionality. Bad design and short-term thinking come with a price; no wonder the query cache has so many failure modes.

Davi Arnaut

Also, a query cache can be implemented transparently without being in the database. Point is, no matter how clever, a query cache in the database creates more problems than it solves — dealing with locking, invalidation (which needs to cross internal layers of the database). Looking at the proposed changes, the more you advance on it, the more the query cache becomes a database in itself, more parsing, purging, compression, etc, duplicating functionality that is already in lower layers.

Davi Arnaut

Peter,

But these have a natural place in the database. In summary, all I’m saying is that it might be worth exploring solutions that do not involve the query cache being in the database. Don’t expect a good design out of this.

Tim Vaillancourt

Although this could be worked around, I second this suggestion:

“qcache-(do|ignore)-cache-table=”

Also, to expand on the INFORMATION_SCHEMA idea, I’d really benefit from a metric of which queries (or at least a count of those that) were put into the cache but invalidated before they were ever used. Essentially, which queries we should not have cached due to their rate of change.

I’m sure I’ll think of a few more ideas. I agree, 10 years is too long. We can do better! I really appreciate this being discussed.

Best regards,

Tim Vaillancourt

Justin Swanhart

You can use Flexviews to build materialized views, then easily query those with HandlerSocket. Incremental refresh takes a divide-and-conquer approach to maintaining the materialized views (giving you a persistent query cache with controlled invalidation) and HandlerSocket lets you avoid SQL parse overhead when getting the data out of the database so you don’t have to rewrite your whole app to use HandlerSocket.

SteveT

Have been told it’s a dupe of

http://bugs.mysql.com/bug.php?id=13375

2005… typical…

SteveT

Done.

http://bugs.mysql.com/bug.php?id=61060

I bet there are a few people in here who would like this feature.

SteveT

Thanks Peter. Will do.

SteveT

A bit off topic… where do I put feature requests to the Percona team? I can’t be bothered with Oracle bug lists ’cos they will just sit there and go mouldy.

I want to ask about mixed direction indexes.

Is anyone ever going to implement this?

i.e (x asc, y desc, z desc)

I’ve seen asc and desc in the syntax for years (but ignored)…

Eric Bergen

I played around with adding a query cache TTL a few years ago, but no one seemed interested in it. It was trivial to add the TTL to the cache key, but changing the TTL meant a new set of cache entries, so it wasn’t a great approach. The patch is hanging around in my patch directory if you want a head start on a similar approach.

http://ebergen.net/patches/query_cache_ttl.patch

Sean Hull

Hi Peter,

Great topic. I remember, years ago, coming from the Oracle world, my first exposure to the MySQL query cache: I had assumed it was caching query execution plans. In Oracle there’s a distinction between a hard parse (figure out how to get the data – the explain plan) and a soft parse (check objects and syntax). So I was rather surprised to find that MySQL was caching query result sets. Isn’t that what the InnoDB buffer cache was for?

Perhaps it is because the query cache was designed before InnoDB, or perhaps because MySQL supports multiple storage engines, but to my thinking you’re just caching what is already on the warm end of your buffer cache linked list. Repeated queries would keep those blocks hot, so why cache them again? Why not just cache the execution plan to avoid future hard parsing and load on the optimizer?

Your discussion above about “smart query matching” brings up the question of bind variables as well. As I recall there are some limitations in MySQL around cursors: the parse, execute and fetch phases are only simulated in the PHP and Perl libraries, and the execute and fetch phases actually happen together in MySQL. I guess that also explains how the LIMIT clause helps in MySQL, where it’s not even supported in Oracle.

All interesting topics for discussion, and will definitely be following this topic closely.

Regards,
Sean
