Be careful with my findings, I appear to have compile in debug mode, I am redoing the benchmarks. Updated version here.
I recently had to work with many customers having large number of connections opened in MySQL and although I told them this was not optimal, I had no solid arguments to present. More than once, I heard: it is not a problem, those connections are Sleeping! In order to remedy to this situation, I decided to run a series of DBT2 benchmarks while the number of idle connections was varied from 0 to 1500. I was expecting an influence, because those idle threads are presents in internal MySQL list objects that need to be scanned and also the socket SELECT call needs to build a FD bitmap after every network activity and the size of the bitmap is proportional to the number of active threads. What I found is not a small impact at all.
For my daily work, I use 2 computers linked with a gigabit switch so I decided to use them for benchmarking. On the desktop, I installed Percona-Server-11.2 which I configured with the following Innodb settings:
1 2 3 4 5 6 7 | innodb_buffer_pool_size=1G innodb_log_file_size=64M innodb_log_buffer_size=8M innodb_flush_method=O_DIRECT innodb_flush_log_at_trx_commit=0 max_connections = 10000 open_files_limit = 32768 |
The desktop is running Ubuntu 10.04, has 8GB of RAM and a Core i5 CPU (dual core with HT enabled). Since my goal is to test concurrency, I decided to use only one warehouse for DBT2 which easily fits inside the buffer pool. Then, with innodb_flush_log_at_trx_commit=0, I ensured I was not benchmarking my disk.
On the laptop, I ran the following script:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | #!/bin/bash for i in `seq 0 4` do numconn=$((i*20)) echo "Doing $numconn idle connections" php ./make_conn.php $numconn & cat dbt2.sql | mysql -h 10.2.2.129 -u root dbt2 sleep 120 ./run_mysql.sh -h 10.2.2.129 -u root -c 4 -t 600 -w 1 -s 1 -m 100 -e yes killall php sleep 2 done for i in `seq 1 15` do numconn=$((i*100)) echo "Doing $numconn idle connections" php ./make_conn.php $numconn & cat dbt2.sql | mysql -h 10.2.2.129 -u root dbt2 sleep 120 ./run_mysql.sh -h 10.2.2.129 -u root -c 4 -t 600 -w 1 -s 1 -m 100 -e yes killall php sleep 2 done |
Where 10.2.2.129 is the IP of the Desktop. This script uses a PHP script, make_conn.php, to generate the idle connections. This script is the following:
1 2 3 4 5 6 7 | <?php $connarray = array(); for ($i = 0;$i < $argv[1];$i++) { $connarray[$i] = mysql_connect('10.2.2.129', 'root','',TRUE); } sleep(1000); ?> |
Before each benchmark, the database is reinitialized to ensure consistent benchmarks. During all the benchmarks, the CPU load on the laptop was never above 10%. I also hacked a bit the “run_mysql.sh” script to allow more processing threads (100) instead of the max of 20. This is required for another series of benchmarks I’ll present soon, this time looking at the number active connections. For these benchmarks, I use pools of 4 connections, basically to match the number of available computing threads the Core i5 allows. I know by experience that this is about the max DBT2 NOTPM for small number of threads.
So, here are the results:
As you can see, the performance drop is shocking. At 1500 idle connections, the performance is 1.3% of the one with 0 idle. Even for as few as 20 idle connections, the drop is already of 40%. I also verified the laptop (running dbt2) was not slowed down by handling all those connections. I tried generating the idle connections from a third box and the results were the exact same. The conclusion is quite straightforward, idle connections hurts performance a lot!!! I am curious as to where MySQL is spending its time, I am planning to use profiling tools to identify the culprit but I had no time yet to do that. I am afraid the same scalability issues affects actives connections although these are also affected by concurrency issues. With new servers having 24+ cores, this phenomenon is seriously affecting performances.
You can limit the number of maximum concurrent connections to mysql server and proactively kill idle connections with HAProxy, check this :
http://flavio.tordini.org/a-more-stable-mysql-with-haproxy/comment-page-1
It would be interesting in making this test on MariaDB with threadpoll based connection handling (event based)…
Yves,
Great find ! It was for long well known the active connections significantly limit performance but it is good to have a data idle connections are not free ether.
thread-pool is one approach to this problem though I believe even with 1 thread per connection situation can be improved a lot by using epool or similar approaches which scale will with many inactive connections.
@Hervé I’ll look at the HAProxy post you suggest but just a sane use of connection pooling or good coding would help.
@Raine Indeed, with libEvent it might be much better.
@Yves Trudeau Really difficult to educate users in shared hosting..
this is an outrageous finding. having this information in your consulting toolbox surely is a value-add. looking forward to the profiler analysis.
This is great info. Thanks.
Would you ever port MariaDB’s thread pool to Percona Server?
For those who are wondering about MariaDB’s pool-of-threads implementation, that is not an option for Percona Server currently. It creates more problems than it solves. Mark Callaghan has written some good material on its shortcomings. However, some solution is surely needed — this was one of the things I mentioned in http://www.mysqlperformanceblog.com/2010/10/27/mysql-limitations-part-4-one-thread-per-connection/ (thanks Yves for providing the proof I didn’t have at that time either!)
I just reran a test with Oprofile on and here is the culprit:
2014925 43.9673 /opt/local/Percona-Server-11.2/libexec/mysqld /opt/local/Percona-Server-11.2/libexec/mysqld _checkchunk
417696 9.1145 /no-vmlinux /no-vmlinux /no-vmlinux
189604 4.1373 /opt/local/Percona-Server-11.2/libexec/mysqld /opt/local/Percona-Server-11.2/libexec/mysqld _sanity
43% of the time in _checkchunk which is a function in mysys/safemalloc.c, I am pretty surprized… I don’t see any loop in that function that could cause a longer execution. _sanity is also a function in the same file.
Yves, it looks like you’re benchmarking binaries compiled for debug (even with debug-full, as there is safemalloc inside).
Hi Yves,
When I tested similar benchmarks a few months ago(read only in-memory pk lookup queries from 30 persistent clients, and 0-2000 sleeping clients. I used two 8-core Nehalem E5540 boxes, Gigabit Ethernet, CentOS5.5, MySQL5.1.latest), I didn’t find noticeable performance difference. What if you test DBT-2 on modern servers, not on cheap desktops?
Yves, sorry, but your benchmark is totally bogus 🙁
As Vladislav Vaintroub said, you are testing a debug-compiled server with safemalloc enabled. This totally kills performance, as it uses a really expensive memory sanity check that has quadratic cost in the number of active malloc() allocations.
(Note that a “debug build” in MySQL is much more than just
gcc -g
, it enables lots of extra testing code that is very costly for performance, safemalloc is only one of them (but a particularly expensive one).You will have to re-check with a non-debug build.
Note btw. that there is no SELECT overhead for idle connections (with default of no threadpool). The select() only has the two server listening sockets (tcp and unix socket), each idle connection has its own waiting call in its thread. I am also curious if you know of any particular lists of idle connections that MySQL needs to traverse?
@Kristian Don’t worry, I don’t take it personally, I’ll recompile without debug and modify the post. As I was pretty shocked by the result, I am happy to find out it is something like debug enabled.
@Yves Trudeau, can comfirm from a client benchmark that reducing from 500 to 20 minimum pool size gave us improvement in avg response time on 24 cores box , killing the connections on the server side does not help because socket will still exist in the system. fixing the client code is just needed. Funnel on launchpad an Hyves contribution to mysql-proxy can also do the job with libevent.
@Kristian, you mentioned there are only 2 listening server sockets, that’s true for the “accept” socket call but the “select” has to consider a bitmap of all the established connections to detect activity. It is my understanding that the maintenance of this bitmap one of the main reason driving people toward epoll and libevent.
@Yves Trudeau, connections are not multiplexed inside normal mysql, i.e there is no select() or poll() that would be waiting for activity on some connections, instead server thread is stuck in recv() call, as long as client is idle.
Thread-per-connection wastes system resources (at least a thread plus virtual memory for threads stack). And if there are many very many active connections, number of threads == number of clients has an impact on scheduler, locks have higher chances to collide, leading to context switching.´
Traditionally, in Unix scalable server architectures were build around multiplexing with poll() and select() and worker threads. This model is better than thread-per-connection, but neither poll() nor select() did scale well. This is the problem you’re talking about in the last comment, and this is what epoll, kevent, solaris event ports or /dev/poll, Windows completion ports or libevent are aimed to fix .
But normal MySQL does not have problems with select()/poll() scalability, because MySQL server does not use multiplexing at all.
Really good work sir.
I knew network connections where an issue. I did some research on network timeout settings and published it on my blog (http://www.mysqlfanboy.com/2010/05/mysql-network-connections/). I’m thinking of springing off your work and investigating them again. Most of my fellow DBA told me MySQL had NO issue with network connections and I toned back my blog post accordingly.
Here is what I was looking at:
max_connections = ? # number of simultaneous client connections allowed
max_user_connections = ? # number of connections a user can make, 0 = Unlimited
wait_timeout = ? # seconds to waits for activity on non interactive connection
interactive_timeout = ? # seconds to waits for activity on interactive connection
connect_timeout = ? # on connect, seconds to waits for a handshake complete packet
max_connect_errors = ? # IO stops after this number of bad connections one good connect reset
max_allowed_packet = ? # How big is your BLOB?
It will be interesting to see how these numbers change the performance.
Hmmm. I’m building a quite complicated heavily used XML-RPC server which uses MySQL as its database. I noticed that MySQL connection establishment on each request limits the performance, so I am using a pool of already-established MySQL connections, also “unused” ones (by one thread) is reused later, so there are idle connections. Now I am thinking, what I can do. Building new MySQL connection on each XML-RPC request is expensive. Having too many idle connections can be expensive too, as far as I can see now. So what can be the ideal solution, do I need to make benchmarks with playing the pool size and find the optimal solution? It can be OK as far as the nature of the load on my XML-RPC server does not change for whatever reason. Hmmm …
I have same issue with idle connection on mysql.
But my case, i’m using php to connect to mysql and i’m using mysql_pconnect().
with p as persisted connection. when the problem hit i simply replace function mysql_pconnect() with mysql_connect()
hi,
i m using mysql 5.1 on windows platform,
and i want to know how many slave is possible of master in replication process in mysql.
what is maximum number to generate slaves in mysql replication
where is make_conn.php script