Sometimes on a very busy MySQL server you will see sporadic connection timeouts, such as Can’t connect to MySQL server on ‘mydb’ (110). If you time connects in your
application you will also see some successful connections taking well over a second. The problem may start very slowly and stay almost invisible for a long time, for example with one out of a million
connection attempts timing out, but as the load grows it can become a lot more frequent.

If you time the connects you will often see connection times close to 3 and 9 seconds. These are “magic” numbers which I remember from years ago; they correspond to a SYN packet being dropped during the connection attempt and being resent: 3 seconds corresponds to one packet being dropped and 9 seconds to two. If this is happening, it is possible you have network issues, but more likely you have a listen queue overflow. You can check whether this is the case by running netstat -s and finding something like:
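(The exact wording and counters vary by kernel version, and the numbers below are purely illustrative, but the two relevant lines look roughly like this:)

145423 times the listen queue of a socket overflowed
145423 SYNs to LISTEN sockets dropped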

This means some SYN packets had to be dropped because the kernel’s queue of connection requests on the LISTEN socket overflowed – MySQL is not accepting connections as quickly as it needs to.
There are two places to tune if this is what is happening.

First – the Linux kernel setting net.ipv4.tcp_max_syn_backlog. This is the kernel-level backlog for pending (SYN_RECV) connection requests and applies to all sockets.
The default on my kernel is 2048, though it may vary between versions; you might need to increase it to 8192 or so if you have an intense connection rate. I’ll explain the math below.
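For instance, a rough sketch of checking and raising it from the shell (8192 here is just the value suggested above; persist the change in /etc/sysctl.conf so it survives a reboot):

# check the current value
sysctl net.ipv4.tcp_max_syn_backlog
# raise it at runtime
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
# make it permanent
echo "net.ipv4.tcp_max_syn_backlog = 8192" >> /etc/sysctl.conf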

Second – the MySQL parameter back_log, which has a default value of just 50. You may want to set it to 1000 or even higher. You may also need to increase the
net.core.somaxconn kernel setting, which contains the maximum listen queue depth allowed. The kernel I’m running has it set to just 128, which would be too low for many
conditions.
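A sketch of both changes together (1000 and 1024 are just the kind of values discussed above, not universal recommendations; back_log needs a MySQL restart to take effect):

# my.cnf
[mysqld]
back_log = 1000

# kernel side – the listen backlog requested by applications is silently capped at somaxconn
sysctl -w net.core.somaxconn=1024
echo "net.core.somaxconn = 1024" >> /etc/sysctl.conf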

Now let’s look into the problem a bit more and do some math. First, let’s look at how MySQL accepts connections. There is a single main thread which accepts connections arriving at the LISTEN
socket. When a connection comes in, it needs to accept it onto a new socket and either create a new thread or take one out of the thread cache. From that point on MySQL processes network communication in multiple threads and can benefit from multiple cores, but the work done by the main thread cannot.
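One quick way to see how often a new thread actually has to be launched rather than taken from the thread cache is to compare the Threads_created counter with the total Connections counter:

# if Threads_created grows almost as fast as Connections, the thread cache is too small
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_created'"
mysql -e "SHOW GLOBAL STATUS LIKE 'Connections'"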

Usually the main thread is able to accept connections pretty quickly; however, if it stalls waiting on a mutex, or if other work such as launching a new thread takes a lot of time, the listen queue can overflow. Consider a database which accepts 1000 connects/sec on average. This is a high number, but you can see even higher ones. Because of the “random arrivals” nature of traffic, you will in most cases see some seconds where as many as 3000 connections come in. Under such conditions the default back_log of 50 is enough for just about 17 milliseconds (50/3000 sec), and if the main thread stalls longer than that, some SYN packets may be lost.
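If you want to watch whether the listen queue is actually filling up, on a reasonably recent Linux the ss utility will show it: for a listening socket, Recv-Q is the current length of the accept queue and Send-Q is the configured backlog limit.

# watch the accept queue on the MySQL port (3306 assumed here)
watch -n1 "ss -ltn '( sport = :3306 )'"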

I would suggest sizing your tcp_max_syn_backlog and back_log values to be enough for at least 2 seconds’ worth of connection attempts. For example, if I have 100 connects/sec on average, using 3x as a “peak multiplier” I should plan for 300 connects/sec, which means both should be set to at least 600 (300 connects/sec × 2 seconds).
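As a trivial worked version of that rule, using the hypothetical 100 connects/sec from above:

# backlog >= average rate * peak multiplier * seconds to cover
avg_rate=100; peak_multiplier=3; seconds=2
echo $(( avg_rate * peak_multiplier * seconds ))   # prints 600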

Setting it to cover much more than 2 seconds does not make much sense, because if the client does not get a response within 3 seconds it will consider the SYN packet lost and send a new one anyway.

There is something else. If you’re creating 1000 connections a second to the MySQL server, you might be pushing your luck, and at the very least you’re using a lot of resources setting up and tearing down connections. Consider using persistent connections or a connection pool, at least for the applications responsible for most of the connections being created.

4 Comments
Alessandra

Thank you, very interesting article. One of my servers is plagued by TIME_WAIT connections, and the netstat -s output values for those two lines are well over the numbers you show, 3-10x higher.

Vojtech Kurka

There is one more possible reason for errors like “Can’t connect to MySQL server”. When there is a very busy client (e.g. a powerful Apache instance), you can run out of free local TCP ports, because the closed connections are waiting in the TIME_WAIT state for 120 seconds.

Sander

Extremely helpful!

moobo

Hi,
I am facing connection aborts very frequently on my MySQL server,
which start increasing as soon as I restart the MySQL service.
Please look into the ini file and guide me.
Many Thanks!!!

version: MySQL Server 5.0
INI setting :

[client]
port=3306

[mysql]
default-character-set=latin1

[mysqld]
port=3306
log-bin=mysql-bin
server-id=2
replicate-wild-do-table=schema1.%
replicate-ignore-table=schema1.table1
basedir="C:/Program Files/MySQL/MySQL Server 5.0/"
datadir="D:/work/MySQL Datafiles/"
default-character-set=latin1
default-storage-engine=INNODB
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
max_connections=10000
query_cache_size=128M
table_cache=256
tmp_table_size=103M
thread_cache_size=15
log-slow-queries = "D:/work/MySQL Datafiles/wrl-server-knp-slow.log"
long_query_time = 2
wait_timeout = 1000
max_allowed_packet = 16M
back_log = 200
myisam_max_sort_file_size=100G
myisam_max_extra_sort_file_size=100G
myisam_sort_buffer_size=30M
key_buffer_size=129M
read_buffer_size=64K
read_rnd_buffer_size=256K
sort_buffer_size=256K

#*** INNODB Specific options ***
innodb_data_home_dir="D:/work/MySQL Datafiles/"
innodb_additional_mem_pool_size=7M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=3499k
innodb_buffer_pool_size=339M
innodb_log_file_size=50M
innodb_thread_concurrency=10