May 21, 2012

Configuring MySQL For High Number of Connections per Second

One thing I noticed during the observation was that there were roughly 2,000 new connections to MySQL per second during peak times. This is a high number by any account.

When a new connection to MySQL is made, it can go into the back_log, which effectively serves as a queue for new connections on operating system size to allow MySQL to handle spikes. Although MySQL connections are quite fast compared to many other databases it can become the bottleneck. For a more in depth discussion of what goes on during a connection and ideas for improvement, see this post by Domas]),

With MySQL 5.5 default back_log of 50 and 2000 connections created per second it will take just 0.025 seconds to fill the queue completely if requests are not served, which means even very short stall in main thread which is accepting connections will cause for some connection attempts to be refused.

What back_log variable specifies is second parameter to listen() operating system call. The maximum value on Linux is directed by tcp_max_syn_backlog sysctl parameter unless syncookies are enabled.

The simple solution in this case was to increase the value of back_log to one that could handle longer bursts Increasing back_log to 1000 in this case would give us enough of a queue to handle up to 500ms stall which is good enough for most cases, so it’s important to understand your application workload and tune specifically to your needs.

This got me thinking of what the disadvantages of setting this value equal to the OS limit (/proc/sys/net/ipv4/tcp_max_syn_backlog) and the only limitation would be that you could potentially have a large number of connections waiting on connection instead of failing quickly and potentially connecting to different server, but that can be fixed by setting the client connect timeout setting (which not a lot of people do).

Another important setting if you’re working with many connections per second is thread_cache which impacts cost of connections dramatically. You want it to be set to the value so no more than couple of threads are created every second during normal operation.

I would note if you’re having more than 1000 connections/sec you’re getting pretty close to what could be the limit and you should consider techniques to reduce number of connections. Namely persistent connections and connection pools might be good solution for many applications.

About Ryan Lowe

Ryan is a Principal Consultant and team manager at Percona. He has experience with many database technologies in industries such as health care, telecommunications, and social networking.

Comments

  1. Olivier Doucet says:

    Hi,

    I encountered the same problem a few weeks ago. Increasing back_log solved the problem. Unfortunately back_log variable is not dynamic and requires to stop mysqld.
    Also take a look at the following sysctl variables : net.core.netdev_max_backlog ; net.core.somaxconn

  2. Thanks Olivier !

  3. Vojtech Kurka says:

    Ryan, have you ever witnessed a case, when all connections to mysqld are made from 1 IP address?
    I mean the case, when you have a powerful webserver instance, making more than 1000 connections/sec, and it runs out of emphemeral ports (due to tcp WAIT_TIMEOUT handling). I know setting net.ipv4.tcp_tw_recycle = 1 solves the problem, but can cause other problems for other TCP connections on the same machine and is far less tested than the default setting. When using net.ipv4.tcp_tw_reuse = 1, we get a lot of TCP SYN retransmissions (with painful 3 sec timeout), the kernel probably cannot handle efficiently a large table of tcp connections.

    Thank you, Vojtech

  4. Vojtech,

    It is indeed common problem. Though when you have say just small number of web servers it makes sense to stick to persistent connections with MySQL as you normally can keep all of them open. It is when you get to hundreds of web servers you might not be able to have max_connections high enough.

    Other trick we have been using is to map more than one IP address to your MySQL server. if you connect by server name you can even make it without changing server configuration.

    Another important thing is to ensure ip_local_port_range is large enough. Something like net.ipv4.ip_local_port_range = 1024 65535 can be good idea.

    Other non standard and unsafe options which can work is to set tcp_fin_timeout to some small value, something like 10 seconds.

    Really if someone wants to put the good Linux tcp stack configuration parameters which worked well for them for large number of connections per second and caused no other problems it would be great to see.

  5. Vojtech Kurka says:

    Peter, thank you for the reply!

    We have already set the emphemeral port range to the max setting. So I assume we definitely need to start using persistent connections, I hope the php/mysqli implementation is stable enough for production.

  6. mysqlnd (which can be used in mysqli) have been around for a while. Many people using persistent connections quite well especially in simple applications where each user interaction will be only one/few queries.

    Now on back_log not being dynamic – as I mentioned it is parameter for listen call for main socket which server creates during start. I wonder if there is any way it can be adjusted after socket is created as re-creating socket would be disruptive

  7. Olivier Doucet says:

    Peter,
    There is a difference between recreating socket (a few milliseconds of downtime), and having to restart MySQL (can take several minutes to stop depending on configuration, and some time after restart to get it working smoothly).

  8. Will says:

    It looks as if a paragraph is missing from the start of this post.

    This change actually made a huge difference for us.

  9. Will says:

    It looks like a paragraph is missing from the start of this.

    Changing this setting to 500 greatly dropped connection errors we were having. Would increasing it further help, or is there really any negatives to increasing it?

  10. Will,

    You should be able to go higher to 2000 or so. The main negative with long queue is the request can spend long time in the queue and at certain point it just becomes better to time out.

    As Ryan mentions connection timeout makes it safe to go even with very high queue times. It is also a good practice to measure connection times.

    BTW what error message did you get for connection errors ? I’m interested in Error code.

  11. Will says:

    Peter,

    Looks like my comment got posted twice.

    This post actually appears to be written about an issue we were having. The error our app was receiving was simply “Can’t connect to mysql server”. We recently decreased our app’s mysql timeout, which is when this started to become more visible. Prior to that, we would have connections sit waiting for a response from the server until PHP execution timed out. There was no error or response sent by the server, only the PHP timeout.

  12. Will, Yes, most likely.
    When you have Can’t connect error message there should be some error code:

    [root@localhost ~]# mysql -h 10.1.1.1
    ERROR 2003 (HY000): Can’t connect to MySQL server on ’10.1.1.1′ (113)
    [root@localhost ~]# perror 113
    OS error code 113: No route to host

    In this case we have No route to host error. In case listen socket queue overflow you should be getting different error code instead of 113

  13. Olivier Doucet says:

    Hi,

    If I remember right, error when backlog is full is 4 (EINTR).

  14. Will says:

    Peter,

    I will look through our exception logs this evening or tomorrow to get that info.

  15. Will says:

    I just double checked, and yes, it is error 4.

    “OS error code 4: Interrupted system call”

  16. Thanks,

    This is interesting. I thought Interrupted System Call would happen on connect timeout (which is often triggered with sig_alarm) and failing to connect due to full backlog happening with other error code.

  17. Thanks for the clear explanation Ryan!

    After encountering connection issues near Christmas I came across a really old post from 2008 describing exactly our problem and pointing out the back_log setting needs to be increased (after they consulted Percona). After that I’ve been searching for a clear explanation what the back_log setting exactly does but I always ended up with the two lines supplied by the MySQL documentation.

    Obviously the real cause behind our problem is that we do get short bursts of many processes spawning on the webservers and not using persistent connections. However given our webserver pool size, the short ttl of the processes it would be very dangerous to actually go for persistent connections. We’ll just have to solve this issue in our new architecture and got to hang on till then. ;)

    It is also good to know that we can go as far as 2000. On one cluster we already set it to 200 and we still see issues popping up from time to time.

Speak Your Mind

*