August 30, 2014

Why do we care about MySQL Performance at High Concurrency?

In many MySQL Benchmarks we can see performance compared with rather high level of concurrency. In some cases reaching 4,000 or more concurrent threads which hammer databases as quickly as possible resulting in hundreds or even thousands concurrently active queries.

The question is how common is it in production ? The typical metrics to use for concurrency defined as number of queries being actually processed is “Threads_Running” which you can easily see for your production system:

In my experience most of the systems out in the field will run with concurrency no more than low tens with normal load. Many have monitoring set up to kick in if number of threads running jumps over 50-100 and stays there for any significant amount of time.

So if people do not really run MySQL with high concurrency does it really matter how MySQL performs with high concurrency or is it just marketing gimmicks to promote new software versions ?

Performance at High Concurrency is important but for other reasons. Real world systems are different from benchmarks in terms they typically do not have fixed concurrency, instead they have to serve requests as they come in what can be close to “random arrivals” but in fact can be a lot more complicated in the practice. There is the chance both for burst of queries to come and hit database server at almost the same time (often when there is some stall happens on external system, such as memcached server) or the database server itself to experience the “micro stall” which can cause the buildup of the queries. Such build ups can happen very quickly.

Imagine for example some high volume Web system. It well might have 100 of Web servers each having Apache configured to run up to 100 apache children concurrently each of which might open a connection to MySQL server… this ends up with very high number of up to 10K connections and potentially running queries. Now imagine typically we have 30K queries/sec coming in from the Web level with some 1ms average latency which requires just around 30 queries to be ran at the same time. Imagine now database stalls just for 100ms – which you most likely will not even see with naked eye. This will results with expected 3000 queries to be backed up considering our inflow rate, which are quite likely to come from 1000 or more connections.

This is when performance at High Concurrency is the difference between life and death. Take a look at these graphs The Blue line of MySQL Community Server has just 1/4th of its Peak performance of concurrency of 1000 while Red Line of MySQL Enterprise Edition with Thread Pool plugin remains very close to the peak.

In practice this will mean one server will be able to process the backlog very quickly and recover after such stall the other server will be depressed and might be unable to serve the inflow of requests getting higher and higher under water. Some systems may never recover in this case until Web server is restarted or load removed from them other way, others will recover but taking a lot more time and with a lot more user impact – after all slower responses will cause users to submit less requests to the system reducing the load.

So yes. The performance at high concurrency matters as it helps systems in distress. But performance at low and medium concurrency matters too as this is what will define system performance during its normal operation.

P.S I think it would be interested for people to see how high concurrency people are running servers in the field. If you can run the command above for your loaded production servers (but which are not in distress) I think it would be very interesting.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. dalin says:

    One of our clients is a news website that occasionally breaks into the Alexa 1000. Nominal threads running is <10 but it jumps up to around 100 when new content is created (which causes several caches to be busted (stored in Memcache and Varnish)).

  2. Andy says:

    Peter,

    But with a thread pool like that in MariaDB even in distress time the number of concurrent threads would still be limited. So I guess if you’re using thread pool high concurrency performance is not as important?

  3. Alexey Kopytov says:

    @Andy,

    As http://bugs.mysql.com/bug.php?id=49169 explains, for InnoDB not only the number of active threads is important, but also the number of currently open transactions. So, for example, with a 16-threads pool one can still have say 1000 connections and up to 1000 open transactions. Which in turn may slow down the active connections due to the list scan.

    In fact a thread pool may even make things worse in some cases. Consider a case when a transaction grabs a resource (be it just the increased list of transactions, a metadata/row/user lock) and is then put to the queue by the thread pool before it has a chance to release that resource. Which means all active transactions that are scheduled to run before it may be affected.

  4. Wlad says:

    The “transaction grabs a resource such as lock” problem is solved by wait callbacks. Those threads that wait on locks are not considered active by the thread pool scheduler, and other queries are scheduled, and threads are started when required. It is true that in case of contention threadpool might be not as efficient, but in worst case it degenerates to thread-per-connection with slowly starting threads. It won’t be much worse than thread-per-connection still. On thing that is counterproductive with threadpool is Innodb long spin looping, because it does not let threadpool know that wait ín fact occured. As for http://bugs.mysql.com/bug.php?id=49169, I hope this can be solved on appropriate level, i.e in Innodb :)

  5. Alexey Kopytov says:

    Wlad, so what happens when one transaction does SELECT * FROM t FOR UPDATE and before it commits other connections are scheduled to run which want to do INSERT INTO t? Naturally they have to wait until the first transaction is scheduled again. Which, depending on the workload and what other connections want to do may be thousands of queries away in the queue. This is where the threadpool would actually increase latency, right?

    I agree that wait callbacks are required, because if they were not present, the threadpool would basically be useless. I’m just saying the threadpool may have and does have its own side effects.

  6. Wlad says:

    In your scenario, INSERT INTO would have to wait,callback will be executed, and threadpool will possibly wake or start a new thread to collect network events or execute already queued events. this INSERT thread will have to wait in thread-per-connection , or in threadpool. the design tries to maintain CPU loaded, so the pool will run “runnable” queries until they finish of become “waiting”, and then select a runnable again. In some situations it can degenerate to a slightly slower “thread-per-connection” indeed. Like say, 100 “select sleep(1)” queries will create 100 threads.

  7. Alexey Kopytov says:

    Wlad, I understand the design of the MariaDB threadpool implementation. I’ve spent a non-trivial time looking at its code recently.

    What I am saying is that the number of concurrent threads is not the only metric that defines latency. There may be other cases, as shown in my example, where scheduling plays an important role. Currently, the threadpool implements a simple FIFO scheduling policy. Which is rather easy to abuse.

  8. Wlad says:

    The FIFO policy is simple, and also responsible for a good response time distribution (95%-99% values) in sysbench with threadpool :) Prioritizing is not impossible with some effort. However I’d like to avoid MySQL Enterprise Threadpool style prioritizing as described in Oracle blogs, and I question a statement that number of concurrent transactions is eviil and needs to be minimized (transactions can be independent after all) . Mikael spent lot of time describing non-trivial amount of logic to workaround http://bugs.mysql.com/bug.php?id=49169 (it is one of the big features on that implementation), and that strikes me somewhat as dirty hack. Why not solve problems at the place where they occure, i.e in Innodb in this case.

  9. MySQL Enterprise Edition, MariaDB, Percona Server all have support for Thread Pool now. Yet I think it is too early to claim all “Performance at High Concurrency” problems are solved as there is a limited practical use for these features so we do not know how well they work in practice. Majority of high load systems I worked with currently do not use these feature.

    Also note high concurrency has other implications beyond number of threads serviced by operating system. Held locks, working with more active transactions etc all can have performance impact.

Speak Your Mind

*