The problem with kernel_mutex in MySQL 5.1 and MySQL 5.5 is well known: see the bug report. MySQL 5.6 contains some fixes that are supposed to provide a solution, but MySQL 5.6 still has a long way to go before it is production-ready, and it is also not clear whether the problem is really fixed.

Meanwhile, the kernel_mutex problem keeps coming up: in the last month alone I had three customer cases with performance drops related to it.

So what can be done about it? Let’s run some benchmarks.

But first, some theory. InnoDB takes kernel_mutex when it starts and stops transactions, and when InnoDB starts a transaction it usually loops through ALL active transactions while holding kernel_mutex. That is, to see kernel_mutex in action we need many concurrent but short transactions.

For this we will use sysbench running only simple primary-key SELECT queries against 48 tables with 5,000,000 rows each.

The hardware is a Cisco UCS C250 server. The workload is read-only and fits fully in memory.
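For anyone who wants to reproduce a similar load, the invocation looked roughly like this (a sketch assuming sysbench 0.5 with its bundled Lua scripts; paths, credentials, thread count, and run time are illustrative):

# Create 48 tables with 5,000,000 rows each
sysbench --test=tests/db/oltp.lua --oltp-tables-count=48 --oltp-table-size=5000000 \
    --mysql-user=root prepare

# Read-only primary-key SELECT workload, here at 256 threads
sysbench --test=tests/db/select.lua --oltp-tables-count=48 --oltp-table-size=5000000 \
    --mysql-user=root --num-threads=256 --max-time=300 --max-requests=0 run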

Here are the results for different thread counts (against Percona Server 5.5.17):

Threads | Throughput, q/s
      1 |  11178.34
      2 |  27741.06
      4 |  53364.52
      8 |  92546.73
     16 | 144619.58
     32 | 164884.03
     64 | 154235.73
    128 | 147456.33
    256 |  68369.02
    512 |  40509.67
   1024 |  22166.94

The peak throughput is 164884 q/s at 32 threads, and it declines to 68369 q/s at 256 threads, a 2.4x drop.

The reason, as you may guess, is kernel_mutex. How can you see it? It is easy: in the SEMAPHORES section of SHOW ENGINE INNODB STATUS\G you will see a lot of threads waiting on kernel_mutex.
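A quick way to check this from the shell (a minimal sketch; it simply cuts out the SEMAPHORES section, which fills up with waiting threads under kernel_mutex contention):

# Print only the SEMAPHORES section of the InnoDB status output
mysql -e "SHOW ENGINE INNODB STATUS\G" | sed -n '/SEMAPHORES/,/TRANSACTIONS/p'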

This problem is actually quite serious. In real workloads I have seen it happen with fewer than 256 threads, and not all production systems can tolerate a 2x drop in throughput at peak times.

So what can be done about it?

As a first attempt, let’s recall that kernel_mutex (like all InnoDB mutexes) has fairly complex handling with spin loops, and there are two variables that affect those loops: innodb_sync_spin_loops and innodb_spin_wait_delay. I actually think that tuning a system with these variables is closer to a rain dance than to a scientific method, but since nothing else helps, why not try.

Here I vary innodb_sync_spin_loops from 0 to 100 (the default is 30).
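Both spin-related variables are dynamic, so they can be changed on a running server between benchmark runs (the values below are simply the ones tried in this post):

# Current settings (5.5 defaults: innodb_sync_spin_loops=30, innodb_spin_wait_delay=6)
mysql -e "SHOW GLOBAL VARIABLES LIKE '%spin%'"

# Try a longer spin before the thread falls back to an OS wait
mysql -e "SET GLOBAL innodb_sync_spin_loops = 100"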


I was surprised to see that with innodb_sync_spin_loops=100 throughput improves to 145324 q/s, almost back to the peak throughput from the first experiment.

With innodb_sync_spin_loops=100 kernel_mutex is still the main point of contention, but InnoDB spins longer before putting the current thread to sleep, and that seems to help.

Further experiments showed that 100 is not enough for 512 threads; it should be increased to 200.

So here are the final results with innodb_sync_spin_loops=200 for 1-1024 threads.

Threads | Throughput, q/s (default) | Throughput, q/s (spin loops 200)
      1 |  11178.34 |  11288.42
      2 |  27741.06 |  28387.62
      4 |  53364.52 |  53575.52
      8 |  92546.73 |  92184.65
     16 | 144619.58 | 143688.91
     32 | 164884.03 | 164392.94
     64 | 154235.73 | 154022.57
    128 | 147456.33 | 152280.84
    256 |  68369.02 | 150089.31
    512 |  40509.67 | 127680.65
   1024 |  22166.94 |  61507.08

So by playing with this variable we can double throughput at high concurrency, bringing it close to the level we see with 32-64 threads.
I cannot fully explain how it works internally, but I wanted to show one possible way to deal with the situation when you are hit by the kernel_mutex problem.
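If the setting helps on your workload, it can also be made permanent in my.cnf (a sketch; 200 is the value that worked for this benchmark, not a universal recommendation):

[mysqld]
# Spin longer on contended mutexes before falling back to an OS wait
innodb_sync_spin_loops = 200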

As further directions I want to try limiting innodb_thread_concurrency and also binding mysqld to fewer CPUs, and it will also be interesting to see whether MySQL 5.6.3 really fixes this problem.
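Both of those directions can be tried without restarting the server; a minimal sketch, with purely illustrative numbers (assumes Linux with taskset available):

# Cap the number of threads running inside InnoDB at once (0 means unlimited)
mysql -e "SET GLOBAL innodb_thread_concurrency = 16"

# Pin an already running mysqld to CPUs 0-15
taskset -cp 0-15 $(pidof mysqld)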

17 Comments
Baron Schwartz

For those who are curious, Vadim’s work here was partially in response to the mysterious kernel_mutex problem I had with a customer that turned out to be GDB-related: http://www.mysqlperformanceblog.com/2011/12/02/three-ways-that-the-poor-mans-profiler-can-hurt-mysql/

I am also not sure why raising the variable helped. On our phone call, Vadim and I discussed the variable and I guessed that lowering it would help and raising it would make it worse, because I thought that spinning was the problem 🙂 oprofile reports showed that ut_delay was consuming the vast majority of CPU time, and I thought that getting rid of the wasted work might potentially help. Wrong…

Raghavendra

So looks like in 5.6 the kernel mutex may have been finally let off — http://blogs.innodb.com/wp/?p=734 (They don’t seem to date their posts .. weird)

Baron Schwartz

That makes sense. I think that Mark Callaghan has mentioned this problem recently too.

Davi Arnaut

> And current implementation uses pthread_cond_broadcast to wake up threads.
> That means that all ( hundreds or thousands) threads, waiting on mutex, wake
> up all together at the same moment and trying to compete for mutex again.

pthread_cond_broadcast just requeues (FUTEX_REQUEUE) into the mutex wait list.
Perhaps the thundering herd you mention is at some other level?

Davi Arnaut

The point is that they are not all woken up at the same time/moment. When a condition is broadcasted, the threads waiting on the condition are just moved to the wait list of the mutex, where they are woken one by one.

Mark Callaghan

They are all scheduled to run so they are all going to run. Then they will busy-wait for 20 microseconds or more in the InnoDB mutex code and then a bit more in pthread code courtesy of PTHREAD_MUTEX_ADAPTIVE_NP. Then they will go back to sleep. When there are hundreds of them they will delay productive threads from being scheduled. They will also get cache lines in read-mode so that productive threads have to do cross-socket cache operations which leads to more latency. This is very inefficient.

Davi Arnaut

> This is what I refer to when I say that ALL threads wake.

Yes, eventually they will all wake up because they are waiting on the mutex. One thread will grab the mutex, and once it releases it, another thread is woken up.

What I was replying to is:

> wake up all together at the same moment

Which is not true for pthread_cond_broadcast. Again, if there are threads sleeping on the condition variable, they are re-queued into waiting on the mutex. If the mutex is unlocked, only the top waiter is woken. Only one thread may lock a mutex, so there is simply no point in waking all threads.

References:

1. http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf See introduction.
2. http://repo.or.cz/w/glibc.git/blob/HEAD:/nptl/pthread_cond_broadcast.c

Davi Arnaut

> In InnoDB implementation after thread wakes it comes back to SPIN LOOP

Yes, but one important point: InnoDB only uses pthread synchronization objects to implement the wait queue of an InnoDB mutex. When the threads are on the wait queue, only one will actually be woken, and this one grabs the _wait queue_ lock. Soon after being woken up, the thread releases the wait queue lock, which wakes up another, and so on. Outside of the wait queue, what you said applies.

Peter Zaitsev

Setting innodb_sync_spin_loops is a very interesting discussion because there is really no “right” answer – depending on which mutex is the limiting one for your workload, a different amount of spinning might make sense. A better solution would be to have this variable set per mutex and adjusted automatically.

I believe it would be possible to design a system that profiles how long it takes to grab the mutex – say, profiling one out of every 1000 mutex get requests. Then, based on that distribution, we can decide how long it makes sense to wait. For example, if we run a long spin and discover that we either get the mutex after 10us or end up spinning for the full 1000us, we can decide to spin up to 20us or so, which deals with the short locks on the given mutex, and otherwise switch to an OS wait and stop wasting CPU.

James Day

You might want to look at innodb-adaptive-max-sleep-delay in MySQL 5.6. It makes innodb-thread-sleep-delay adaptive and is of particular value over 1024 threads in 5.6.

Sunny’s OOW presentation at https://oracleus.wingateweb.com/published/oracleus2011/sessions/20020/20020_Cho2577660.pdf mentions it on slide 28.

James Day, Oracle. This is my view only; for an official Oracle opinion consult a PR person.

Andy Carlson

Vadim,

I want to thank you for this informative post. I had a workload that I was working with a few years ago, that I could not get to perform well in innodb. It seemed like MySQL would attack one thread, and starve all the rest. I dug out the old code and data, and ran it with innodb_sync_spin_loops=64, and the workload performed much better.

Thanks again, and I will be watching for more posts from you in the future.

yangdehua

We also saw this problem.

What’s more, the 5.1 and 5.5 manuals differ:

5.1: the default value of innodb_thread_concurrency is 8 http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_thread_concurrency

5.5: the default value is 0 http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_thread_concurrency

If we set innodb_thread_concurrency, the server’s load goes down.