August 28, 2014

Percona Server: Improve Scalability with Percona Thread Pool

By default, the MySQL server spawns a separate thread for every client connection, and that thread processes all statements for the connection. This is the ‘one-thread-per-connection’ model. It is simple and efficient until some number of connections N is reached. Beyond that point, performance of the MySQL server degrades, mostly due to various contentions caused by N threads trying to access shared resources: either system resources such as CPU, IO and memory, or MySQL-specific ones such as internal structures, locks, etc. To keep the system stable and avoid performance degradation, we need to limit the number of active threads while at the same time not limiting the number of client connections. The ‘Thread Pool’ model helps us achieve exactly that: it maps N client connections onto M active threads (the ones actually performing work) while maintaining smooth and stable throughput for the MySQL server.
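The N-connections-to-M-threads mapping can be sketched with a generic thread-pool illustration. This is purely a conceptual analogy in Python, not how the server implements it; `handle_statement` and the connection/worker counts are made-up stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustration only: serve N "client connections" with a fixed pool of
# M worker threads, instead of spawning one thread per connection.
N_CONNECTIONS = 1000   # hypothetical number of client connections
M_WORKERS = 16         # fixed number of active threads

def handle_statement(conn_id):
    # Stand-in for processing one statement from a connection.
    return conn_id * 2

with ThreadPoolExecutor(max_workers=M_WORKERS) as pool:
    # All 1000 "connections" are served, but at most 16 threads are
    # ever runnable at once, bounding contention on shared resources.
    results = list(pool.map(handle_statement, range(N_CONNECTIONS)))

print(len(results))
```

The point of the model is visible in the parameters: the number of queued tasks (connections) can grow far beyond the number of threads actually competing for CPU.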

There are several implementations of the Thread Pool model:
- Commercial: Oracle/MySQL provides thread_pool plugin as part of the Enterprise subscription
- Open Source: The MariaDB thread_pool developed by Vladislav Vaintroub

In Percona Server we have included the latter, which we have further enhanced and improved.

To demonstrate how the thread pool may help improve scalability, we ran the sysbench/OLTP_RW workload with up to 16,384 threads against the latest MySQL Server, Percona Server, and Percona Server with thread_pool enabled, for both IO-bound and CPU-bound loads, on a Dell R720 server with 16 cores/32 vCPUs.

The current thread pool implementation in Percona Server is built into the server, unlike Oracle’s commercial version, which is implemented as a plugin. To enable the thread pool with Percona Server, you simply need to specify ‘thread_handling=pool-of-threads’ in the my.cnf file (before startup/restart) and adjust the number of thread pool groups with the ‘thread_pool_size’ variable; the latter can be changed after server start. In our runs with thread_pool we used ‘thread_pool_size=36’.
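Putting those two settings together, the relevant my.cnf fragment looks like this (a minimal sketch using the values from our runs; tune thread_pool_size for your own hardware):

```ini
[mysqld]
# Switch from one-thread-per-connection to the built-in thread pool
# (requires a server restart to take effect).
thread_handling = pool-of-threads

# Number of thread groups; 36 was used in the benchmarks above.
# Unlike thread_handling, this can also be changed at runtime.
thread_pool_size = 36
```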

IO bound: sysbench dataset, 32 tables/12M rows each (~100GB), InnoDB buffer pool=25GB
[Chart: thread_pool.p1.io_bound]

In-memory/CPU bound: sysbench dataset, 32 tables/12M rows each (~100GB), InnoDB buffer pool=100GB
[Chart: thread_pool.p1.cpu_bound]

As you can see in both scenarios above, after 1024 threads the standalone server is not really capable of keeping throughput at the same level. However, with thread_pool enabled, throughput is quite stable and smooth up to 16,384 client connections.

Conclusion: if you regularly go over 512-1024 connections, it is definitely worth trying Percona’s thread pool implementation to protect your server from serious performance degradation due to overload.

UPDATE: here is an updated chart for the IO-bound scenario that includes results for the regular MySQL server, MySQL server with innodb_thread_concurrency=36, Percona Server, Percona Server with innodb_thread_concurrency=36, and Percona Server with thread_pool_size=36:

[Chart: thread_pool.p1.io_bound.updated.v2]

Comments

  1. Go Percona!

  2. Andy says:

    Can you also benchmark Percona threadpool against MariaDB threadpool? It’d be interesting to see how they compare to each other.

  3. Andy, we did, and you’ll be able to read up on this (and be pleasantly surprised) in an upcoming blog post next week!

  4. Erich Kuersten says:

    Whoever is responsible for SEO optimization for this blog, you guys should relax a bit. It’s been optimized to the point where it’s hard to read. “percona server: improve percona scalability with percona threadpool go percona!”. I guess the article was supposed to explain when someone would want to use threadpool before the SEO guys touched it?

  5. @Erich; constructive feedback is always welcome.

  6. It is very good news!
    Are there any criteria for setting thread_pool_size?

  7. Tom Kaminski says:

    In the graphs, the x axis is labelled “Threads”. I think it should be “Client Connections”

  8. James Day says:

    What was innodb_thread_concurrency set to? In tests at high concurrency it’s easy to produce poor results by misconfiguring this with the unlimited default instead of an appropriate value in the 32-128 range.

  9. sh says:

    Also, what are the thread pool variables for MySQL set to, like the priority and stalling ones? Do you have any stats of the threads, so we can see what was happening on the server and what caused the sharp dip in performance?

  10. James Day says:

    Another way to misconfigure the server to make a thread pool look good is to set table_open_cache too low, so it won’t have enough entries available for all of the concurrently open tables. What was table_open_cache set to? What were the values of Opened_tables and Opened_table_definitions at the end of the test? Misconfiguring this will typically cause a performance drop around the point where the number of connections exceeds this setting’s value. The default is 2000 in 5.6, 400 in 5.5, and 64 in 5.1.

    In 5.6 and later you’d also be expected to set table_open_cache_instances to 256 to reduce contention, particularly if some tables happen to have the same hash value.

    As with innodb_thread_concurrency, anyone looking at thread pool performance reports should ask for this information before attempting to evaluate whether the thread pool is useful and how useful it is.

    James Day, MySQL Senior Performance Engineer, Oracle

  11. Mikhail,

    >It is very good news!
    >Are there any criteria for setting thread_pool_size?

    Usually the most efficient value of thread_pool_size is in the range between NCPU and NCPU + NCPU/2, where NCPU is the number of CPUs. For example, on the 32-vCPU server used in these tests that range is 32-48, which is consistent with the thread_pool_size=36 used above.

  12. Tom,

    > In the graphs, the x axis is labelled “Threads”. I think it should be “Client Connections”

    Yes, you are right. That’s number of client connections.

  13. James,

    1) re: table_open_cache/table_open_cache_instances
    The values of table_open_cache and table_open_cache_instances are high enough in my tests, and I am confident that there is no contention due to the number of open tables.

    2) re: innodb_thread_concurrency
    While I agree that for completeness it would be interesting to have results with innodb_thread_concurrency (not only in my case but in the official benchmarks from Oracle as well), I would note that fundamentally innodb_thread_concurrency is not capable of resolving scalability problems caused by a large number of client connections.

    The innodb_thread_concurrency setting indeed helps to reduce concurrency inside InnoDB by reducing contention on the ‘hot’ resources: structures, locks, etc. However, at some point the number of open connections/transactions becomes so large that even reduced concurrency does not help. We blogged about one example of this limitation here and here.

    Besides this, using innodb_thread_concurrency reduces parallelism, because the server in this case is not aware of the waits that happen during execution of the allowed connections, and OS scheduling in this case is pretty much “blind”.

    The thread pool is a more advanced approach that, on the one hand, limits concurrency and, on the other, helps to utilize all “spare” resources inside the server.

    P.S.: In order to cover the innodb_thread_concurrency case, I’ve rerun the test with that option set for the IO-bound case and provided an additional/updated chart above.

  14. Sh,

    >Also, what are the thread pool variables for MySQL set to, like the priority and stalling ones? Do you have any stats of the threads, so we can see what was happening on the server and what caused the sharp dip in performance?

    For the runs with Percona Server all settings are defaults except thread_pool_size=36 and thread_pool_high_prio_mode set to either statements or transactions.

    re: stats – if you mean what’s wrong with the regular MySQL server – the performance degradation happens mostly due to row-level locks plus some other aspects, like the length of the transaction list and the cost of scanning it, etc.

  15. gpfeng says:

    I ported the thread pool feature to our MySQL branch, which is based on Percona Server 5.5.18, but the result is not the same as yours; I suspect that I missed something.

    The problem I encountered is that when the concurrency increases, the response time (measured by tcprstat) of the queries also increases, which results in TPS going down.

    > 128 client cocurrency:(QPS: 44k, response time: 1ms)
    -------- -QPS- -TPS- ----threads---- ----tcprstat(us)----
    time | ins upd del sel iud| run con cre cac| count avg 95-avg 99-avg|
    18:02:03| 5686 7562 0 44284 13248| 24 130 0 0| 55901 1183 939 1103|
    18:02:04| 5278 7139 0 41960 12417| 22 130 0 0| 54505 1284 1023 1202|
    18:02:06| 5533 7329 0 43122 12862| 23 130 1 0| 51253 1302 1029 1217|
    18:02:07| 5593 7462 0 43667 13055| 23 130 0 0| 56731 1150 933 1086|
    18:02:08| 5698 7582 0 44710 13280| 24 130 0 0| 56359 1192 950 1120|

    > 512 concurrency:(QPS: 37k, response time: 6ms)
    -------- -QPS- -TPS- ----threads---- ----tcprstat(us)----
    time | ins upd del sel iud| run con cre cac| count avg 95-avg 99-avg|
    18:03:37| 4916 6433 0 37057 11349| 24 514 0 0| 47210 7202 6410 6960|
    18:03:38| 4697 6388 0 37459 11085| 25 514 0 0| 52725 6836 6130 6620|
    18:03:40| 4678 6239 0 38117 10917| 25 514 0 0| 44760 7339 6500 7088|
    18:03:41| 4991 6721 0 38151 11712| 25 514 0 0| 47201 7121 6366 6896|
    18:03:42| 4839 6490 0 38517 11329| 23 514 0 0| 58659 6461 5820 6269|

    > 2048 concurrency: (QPS: 25k, response time: 50ms)
    -------- -QPS- -TPS- ----threads---- ----tcprstat(us)----
    time | ins upd del sel iud| run con cre cac| count avg 95-avg 99-avg|
    18:06:31| 4040 7519 0 26081 11559| 31 2050 0 0| 30344 51700 45894 49976|
    18:06:32| 2685 3687 0 22367 6372| 28 2050 0 0| 29744 52785 47175 51223|
    18:06:33| 2854 3351 0 22751 6205| 29 2050 0 0| 30421 50823 45920 49460|
    18:06:35| 3270 4584 0 25197 7854| 32 2050 0 0| 31204 51514 45991 49752|
    18:06:36| 3353 4278 0 25924 7631| 28 2050 0 0| 30407 50894 45263 49047|

    So, can you tell me how the response time changes when client concurrency doubles in your test?
