kernel_mutex problem. Or double throughput with single variable

Benchmarks Insight for DBAs MySQL

December 2, 2011

Vadim Tkachenko

The problem with kernel_mutex in MySQL 5.1 and MySQL 5.5 is well known (see the bug report). MySQL 5.6 does include some fixes that are supposed to provide a solution, but MySQL 5.6 still has a long way to go before it is ready for production, and it is not yet clear whether the problem is really fixed.

Meanwhile, the kernel_mutex problem is becoming more common: during the last month I had three customer cases with performance drops related to it.

So what can be done? Let’s run some benchmarks.

But first, some theory. InnoDB takes kernel_mutex when it starts and stops transactions, and when InnoDB starts a transaction it usually loops through ALL active transactions while holding kernel_mutex. So to see kernel_mutex in action, we need many concurrent but short transactions.
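To make the cost of that loop concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `ToyKernel`, `trx_start`, `trx_stop`, and `scan_cost` are hypothetical names, not InnoDB functions, and the model only counts how many list entries are visited while the global lock is held.

```python
import threading


class ToyKernel:
    """Toy stand-in for the structures described above (not InnoDB code)."""

    def __init__(self):
        self.kernel_mutex = threading.Lock()  # hypothetical stand-in for kernel_mutex
        self.active_trx = []                  # hypothetical list of active transactions

    def trx_start(self, trx_id):
        """Start a transaction; return how many entries were scanned under the mutex."""
        with self.kernel_mutex:
            scanned = 0
            for _ in self.active_trx:         # the loop over ALL active transactions
                scanned += 1
            self.active_trx.append(trx_id)
        return scanned

    def trx_stop(self, trx_id):
        """Stop a transaction; this also needs the mutex."""
        with self.kernel_mutex:
            self.active_trx.remove(trx_id)


def scan_cost(concurrent):
    """Total entries scanned under the mutex while `concurrent` transactions start."""
    kernel = ToyKernel()
    return sum(kernel.trx_start(i) for i in range(concurrent))


if __name__ == "__main__":
    for n in (32, 256):
        print(n, "concurrent transactions ->", scan_cost(n), "entries scanned")
```

In this model the work done under the mutex grows roughly with the square of the number of open transactions, which is why many short concurrent transactions are the worst case.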

For this we use sysbench running only simple primary-key SELECT queries against 48 tables of 5,000,000 rows each.

The hardware is a Cisco UCS C250 server. The workload is read-only and fits fully in memory.

Here are the results for different numbers of threads (against Percona Server 5.5.17):

Threads	Throughput, q/s
1	11178.34
2	27741.06
4	53364.52
8	92546.73
16	144619.58
32	164884.03
64	154235.73
128	147456.33
256	68369.02
512	40509.67
1024	22166.94

The peak throughput is 164884 q/s at 32 threads, and it declines to 68369 q/s at 256 threads, which is a 2.4x drop.
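The drop can be checked directly from the table. A minimal Python sketch, with the numbers copied from the results above (the variable names are just for illustration):

```python
# Baseline results from the table above: threads -> throughput in q/s.
baseline = {
    1: 11178.34, 2: 27741.06, 4: 53364.52, 8: 92546.73,
    16: 144619.58, 32: 164884.03, 64: 154235.73, 128: 147456.33,
    256: 68369.02, 512: 40509.67, 1024: 22166.94,
}

# Thread count with the highest throughput.
peak_threads = max(baseline, key=baseline.get)

# Ratio of peak throughput to throughput at 256 threads.
drop = baseline[peak_threads] / baseline[256]

print(f"peak: {baseline[peak_threads]:.0f} q/s at {peak_threads} threads")
print(f"drop from peak to 256 threads: {drop:.1f}x")
```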

The reason, as you may guess, is kernel_mutex. How can you see it? It is easy: in the output of SHOW ENGINE INNODB STATUS\G you will see a lot of lines like:

--Thread 140370743510784 has waited at trx0trx.c line 1184 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
--Thread 140370752542464 has waited at trx0trx.c line 1772 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
--Thread 140088222295808 has waited at trx0trx.c line 1184 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
--Thread 140370746922752 has waited at trx0trx.c line 1184 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
--Thread 140088223500032 has waited at trx0trx.c line 1184 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
--Thread 140088231528192 has waited at trx0trx.c line 795 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
...
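If the output is long, counting the waits by mutex and source location makes the hot spot obvious. A minimal Python sketch, assuming the semaphore section has exactly the line format shown above; `count_mutex_waits` is a hypothetical helper, not part of any MySQL tool:

```python
import re
from collections import Counter

# Patterns for the two line types shown in the output above.
WAIT_RE = re.compile(r"--Thread \d+ has waited at (\S+) line (\d+) for")
MUTEX_RE = re.compile(r"Mutex at \S+ '&?(\w+)'")


def count_mutex_waits(status_text):
    """Count waits per (mutex name, source file, source line)."""
    counts = Counter()
    location = None
    for line in status_text.splitlines():
        wait = WAIT_RE.search(line)
        if wait:
            # Remember where the thread is waiting until we see the mutex name.
            location = (wait.group(1), int(wait.group(2)))
            continue
        mutex = MUTEX_RE.search(line)
        if mutex and location is not None:
            counts[(mutex.group(1), *location)] += 1
            location = None
    return counts


SAMPLE = """\
--Thread 140370743510784 has waited at trx0trx.c line 1184 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
--Thread 140370752542464 has waited at trx0trx.c line 1772 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
--Thread 140088222295808 has waited at trx0trx.c line 1184 for 0.0000 seconds the semaphore:
Mutex at 0x2b0ccc8 '&kernel_mutex', lock var 1
waiters flag 0
"""

if __name__ == "__main__":
    for key, n in count_mutex_waits(SAMPLE).most_common():
        print(n, key)
```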

This problem is actually quite serious. In real workloads I have seen it happen with fewer than 256 threads, and not all production systems can tolerate a 2x drop in throughput at peak times.

So what can be done about it?

As a first attempt, recall that kernel_mutex (like all InnoDB mutexes) has complex handling with spin loops, and two variables affect those loops: innodb_sync_spin_loops and innodb_spin_wait_delay. I actually think that tuning a system with these variables is closer to dancing with a shaman’s drum than to the scientific method, but when nothing else helps, why not try?
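Since innodb_sync_spin_loops is a dynamic global variable, it can be changed with SET GLOBAL between benchmark runs without restarting the server. A small Python sketch that only generates the statements for such a sweep; `spin_loop_sweep` is a hypothetical helper, the values are illustrative, and running the statements against a server is left out:

```python
def spin_loop_sweep(values):
    """Return one SET GLOBAL statement per innodb_sync_spin_loops value to test."""
    statements = []
    for value in values:
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"spin loop count must be a non-negative integer: {value!r}")
        statements.append(f"SET GLOBAL innodb_sync_spin_loops = {value};")
    return statements


if __name__ == "__main__":
    # 30 is the default mentioned in the post; the other values are illustrative.
    for statement in spin_loop_sweep([0, 30, 100]):
        print(statement)
```

Each statement would be applied before one benchmark run at a fixed thread count, so that the runs differ only in the spin setting.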

Next, we vary innodb_sync_spin_loops from 0 to 100 (the default is 30).

I was surprised to see that with innodb_sync_spin_loops=100 throughput improves to 145324 q/s, almost the peak throughput from the first experiment.

With innodb_sync_spin_loops=100, kernel_mutex is still the main point of contention, but InnoDB tries to keep the current thread from pausing, and that seems to help.

Further experiments showed that 100 is not enough for 512 threads, and it should be increased to 200.

So here are the final results with innodb_sync_spin_loops=200 for 1-1024 threads.

Threads	Throughput, q/s (default)	Throughput, q/s (innodb_sync_spin_loops=200)
1	11178.34	11288.42
2	27741.06	28387.62
4	53364.52	53575.52
8	92546.73	92184.65
16	144619.58	143688.91
32	164884.03	164392.94
64	154235.73	154022.57
128	147456.33	152280.84
256	68369.02	150089.31
512	40509.67	127680.65
1024	22166.94	61507.08

So by tuning this variable we can more than double throughput at high thread counts, bringing 256 threads close to the level seen with 32-64 threads.
I cannot really explain how it works internally, but I wanted to show one possible way
to deal with the problem when you are hit by kernel_mutex contention.
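The gain per thread count can be computed from the two columns of the table above. A minimal Python sketch, with the numbers copied from the results (the variable names are just for illustration):

```python
# threads -> throughput in q/s, copied from the table above.
default = {
    1: 11178.34, 2: 27741.06, 4: 53364.52, 8: 92546.73,
    16: 144619.58, 32: 164884.03, 64: 154235.73, 128: 147456.33,
    256: 68369.02, 512: 40509.67, 1024: 22166.94,
}
spin200 = {
    1: 11288.42, 2: 28387.62, 4: 53575.52, 8: 92184.65,
    16: 143688.91, 32: 164392.94, 64: 154022.57, 128: 152280.84,
    256: 150089.31, 512: 127680.65, 1024: 61507.08,
}

# Ratio of throughput with innodb_sync_spin_loops=200 to the default setting.
speedup = {threads: spin200[threads] / default[threads] for threads in default}

for threads in sorted(speedup):
    print(f"{threads:5d} threads: {speedup[threads]:.2f}x")
```

Up to 128 threads the ratio stays close to 1, so the setting makes little difference at low concurrency; the gain appears from 256 threads upward.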

As a next step, I want to try limiting innodb_thread_concurrency and binding mysqld to fewer CPUs. It will also be interesting to see whether MySQL 5.6.3 really fixes this problem.

17 Comments

12 years ago

For those who are curious, Vadim’s work here was partially in response to the mysterious kernel_mutex problem I had with a customer that turned out to be GDB-related: http://www.mysqlperformanceblog.com/2011/12/02/three-ways-that-the-poor-mans-profiler-can-hurt-mysql/

I am also not sure why raising the variable helped. On our phone call, Vadim and I discussed the variable and I guessed that lowering it would help and raising it would make it worse, because I thought that spinning was the problem 🙂 oprofile reports showed that ut_delay was consuming the vast majority of CPU time, and I thought that getting rid of the wasted work might potentially help. Wrong…

12 years ago

So looks like in 5.6 the kernel mutex may have been finally let off — http://blogs.innodb.com/wp/?p=734 (They don’t seem to date their posts .. weird)

Vadim Tkachenko

Author

12 years ago

Baron,

I believe I know the reason why innodb_sync_spin_loops helps there.

The problem is old, and for some reason I was sure it had already been fixed.

The problem is that InnoDB uses its own mutex implementation, which internally uses condition variables,
and the current implementation uses pthread_cond_broadcast to wake up threads.
That means that all threads (hundreds or thousands) waiting on the mutex wake up together at the same moment
and try to compete for the mutex again.

Increasing innodb_sync_spin_loops delays the point at which threads fall back to condition variables, and allows
the mutex to be resolved via the spin loop alone.

In this case using innodb_thread_concurrency should also help, and I am running experiments with it right now.

Vadim Tkachenko

Author

12 years ago

Raghavendra,

Removing kernel_mutex does not automatically fix the problem, as you will hit another mutex after that.
So I would wait for the results before saying that the problem is fixed.

12 years ago

That makes sense. I think that Mark Callaghan has mentioned this problem recently too.

Davi Arnaut

12 years ago

> And current implementation uses pthread_cond_broadcast to wake up threads.
> That means that all ( hundreds or thousands) threads, waiting on mutex, wake
> up all together at the same moment and trying to compete for mutex again.

pthread_cond_broadcast just requeues (FUTEX_REQUEUE) into the mutex wait list.
Perhaps the thundering herd you mention is at some other level?

Vadim Tkachenko

Author

Reply to Davi Arnaut

12 years ago

Davi,

I am not sure what you mean by FUTEX_REQUEUE, but you piqued my curiosity, so I overcame my laziness and checked:
1. http://pubs.opengroup.org/onlinepubs/009604499/functions/pthread_cond_signal.html
which says:
“The pthread_cond_broadcast() function shall unblock all threads currently blocked on the specified condition variable cond.”

2. Since I am used to documentation being wrong, I wrote a test, cond.c (actually taken from
http://waxway.blogspot.com/2011/07/awake-all-threads-pthreadcondbroadcast.html),

with the following change:

pthread_mutex_lock(&cond_mutex);
pthread_cond_wait(&cond, &cond_mutex);
printf("T WOKE: %x\n", pthread_self());
pthread_mutex_unlock(&cond_mutex);

and on a single pthread_cond_broadcast it prints:
T WOKE: bd143700
T WOKE: bc742700
T WOKE: bb340700
T WOKE: ba93f700
T WOKE: bbd41700

That is, all 5 threads woke up.
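The same experiment can be sketched in Python, where `threading.Condition.notify_all()` plays the role of the broadcast. This is only an analogue of the C test, not the test itself; `broadcast_wakes` and its bookkeeping are hypothetical names introduced for the sketch.

```python
import threading
import time


def broadcast_wakes(n_waiters=5):
    """Block n_waiters threads on one condition, broadcast once, return who woke."""
    cond = threading.Condition()
    state = {"waiting": 0, "released": False}
    woke = []

    def waiter():
        with cond:
            state["waiting"] += 1
            while not state["released"]:
                cond.wait()              # releases the lock while blocked
            # The lock is held again here, so waiters pass this point one at a time.
            woke.append(threading.get_ident())

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for thread in threads:
        thread.start()

    # Wait until every thread is blocked in cond.wait(), then broadcast exactly once.
    while True:
        with cond:
            if state["waiting"] == n_waiters:
                state["released"] = True
                cond.notify_all()
                break
        time.sleep(0.001)

    for thread in threads:
        thread.join()
    return woke


if __name__ == "__main__":
    for ident in broadcast_wakes(5):
        print(f"T WOKE: {ident:x}")
```

In this sketch a single broadcast releases every waiter, but each one has to reacquire the condition's lock before it continues, so they proceed one after another rather than at the same instant.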

Davi Arnaut

12 years ago

The point is that they are not all woken up at the same moment. When a condition is broadcast, the threads waiting on the condition are just moved to the wait list of the mutex, where they are woken one by one.

Vadim Tkachenko

Author

12 years ago

I posted a follow-up with innodb_thread_concurrency:
http://www.mysqlperformanceblog.com/2011/12/02/kernel_mutex-problem-cont-or-triple-your-throughput/

Vadim Tkachenko

Author

12 years ago

Davi,

If I run the following:
pthread_mutex_lock(&cond_mutex);
pthread_cond_wait(&cond, &cond_mutex);
printf("T WOKE: %x\n", pthread_self());
pthread_mutex_unlock(&cond_mutex);
printf("T WOKE 2: %x\n", pthread_self());

I get:
T WOKE: 91339700
T WOKE 2: 91339700
T WOKE: 90938700
T WOKE 2: 90938700
T WOKE: 8f536700
T WOKE 2: 8f536700
T WOKE: 91d3a700
T WOKE 2: 91d3a700
T WOKE: 8ff37700
T WOKE 2: 8ff37700

on a single pthread_cond_broadcast.

This is what I am referring to when I say that ALL threads wake.

In the InnoDB implementation, after a thread wakes it goes back to the SPIN LOOP.

Simplifying, the InnoDB mutex looks like:

mutex_enter():
{

spin_loop: 
  from i:=1 to innodb_sync_spin_loops:
    try acquire mutex; i++; sleep(rand_time);

if mutex is not granted: pthread_cond_wait()
goto: spin_loop;
}

mutex_exit:
{
pthread_cond_broadcast()
}

That is, all threads arrive at pthread_cond_wait in random order, but
once the mutex is released, they all WAKE UP and start the loop again.
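The simplified scheme above can be turned into a small runnable model. This is a Python sketch of the pseudocode, not InnoDB's implementation: `SpinThenWaitMutex` and `hammer` are hypothetical names, the random delay between spin attempts is left out, and `notify_all()` stands in for pthread_cond_broadcast.

```python
import threading


class SpinThenWaitMutex:
    """Spin up to spin_loops times, then sleep on a condition until a broadcast."""

    def __init__(self, spin_loops=30):
        self.spin_loops = spin_loops
        self._word = threading.Lock()       # lock word; non-blocking acquire = test-and-set
        self._cond = threading.Condition()  # where threads sleep after spinning fails
        self.os_waits = 0                   # how often a thread gave up spinning

    def enter(self):
        while True:
            for _ in range(self.spin_loops):         # spin loop
                if self._word.acquire(blocking=False):
                    return
            with self._cond:
                # Re-check while holding the condition lock to avoid a missed wakeup.
                if self._word.acquire(blocking=False):
                    return
                self.os_waits += 1
                self._cond.wait()                    # woken by the broadcast in exit()
            # After waking, go back to the spin loop.

    def exit(self):
        self._word.release()
        with self._cond:
            self._cond.notify_all()                  # wake ALL waiting threads


def hammer(n_threads=4, iterations=500, spin_loops=30):
    """Increment a shared counter under the mutex; return (counter, os_waits)."""
    mutex = SpinThenWaitMutex(spin_loops)
    counter = [0]

    def worker():
        for _ in range(iterations):
            mutex.enter()
            counter[0] += 1
            mutex.exit()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter[0], mutex.os_waits


if __name__ == "__main__":
    print(hammer(4, 500, spin_loops=30))
```

Raising `spin_loops` in this model gives a thread more chances to get the lock before it falls through to the condition variable, which is the effect attributed to innodb_sync_spin_loops above.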

12 years ago

They are all scheduled to run so they are all going to run. Then they will busy-wait for 20 microseconds or more in the InnoDB mutex code and then a bit more in pthread code courtesy of PTHREAD_MUTEX_ADAPTIVE_NP. Then they will go back to sleep. When there are hundreds of them they will delay productive threads from being scheduled. They will also get cache lines in read-mode so that productive threads have to do cross-socket cache operations which leads to more latency. This is very inefficient.

Davi Arnaut

12 years ago

> This is what I refer to when I say that ALL threads wake.

Yes, eventually they will all wake up because they are waiting on the mutex. One thread will grab the mutex, and once it releases it, another thread is woken up.

What I was replying to is:

> wake up all together at the same moment

Which is not true for pthread_cond_broadcast. Again, if there are threads sleeping on the condition variable, they are re-queued into waiting on the mutex. If the mutex is unlocked, only the top waiter is woken. Only one thread may lock a mutex, so there is simply no point in waking all threads.

References:

1. http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf See introduction.
2. http://repo.or.cz/w/glibc.git/blob/HEAD:/nptl/pthread_cond_broadcast.c

Davi Arnaut

12 years ago

> In InnoDB implementation after thread wakes it comes back to SPIN LOOP

Yes, but one important point: InnoDB only uses pthread synchronization objects to implement the wait queue of an InnoDB mutex. When the threads are on the wait queue, only one will actually be woken, and this one grabs the _wait queue_ lock. Soon after being woken up, the thread releases the wait queue lock, which wakes up another, and so on. Outside of the wait queue, what you said applies.

Admin

12 years ago

Setting innodb_sync_spin_loops is a very interesting discussion because there is really no “right” answer: depending on which mutex is the limiting one for your workload, a different amount of spinning might make sense. A better solution would be to have this variable set per mutex and adjusted automatically.

I believe it would be possible to design a system that profiles how long it takes to grab the mutex, say by profiling one out of every 1000 mutex get requests. Then, based on the distribution, we can decide how long it makes sense to wait. For example, if we run a long spin and discover that we either get the mutex after 10us or spin until the end of the 1000us limit, we can decide to spin for up to 20us or so, which handles the short locks on the given mutex, and otherwise switch to an OS wait and stop wasting CPU.
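One way to read this idea as code is the Python sketch below. It is a hypothetical heuristic, not an existing InnoDB feature: `choose_spin_limit`, the `headroom` factor, and the sample data are all assumptions made for illustration.

```python
def choose_spin_limit(samples_us, max_spin_us, headroom=2.0):
    """Pick a spin limit (microseconds) from sampled mutex acquisition times.

    Samples at or above max_spin_us are treated as spins that ran to the limit
    without getting the mutex. The limit is the longest successful spin times
    `headroom`, capped at max_spin_us; with no successful spins, do not spin.
    """
    acquired = [sample for sample in samples_us if sample < max_spin_us]
    if not acquired:
        return 0.0
    return min(max(acquired) * headroom, max_spin_us)


if __name__ == "__main__":
    # Illustrative profile: most requests succeed after 10us, the rest hit the 1000us limit.
    samples = [10] * 90 + [1000] * 10
    print(choose_spin_limit(samples, max_spin_us=1000))
```

Using the maximum of the successful spins keeps the sketch short but makes it sensitive to outliers; a percentile of the sampled distribution would be a more robust choice.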

12 years ago

You might want to look at innodb-adaptive-max-sleep-delay in MySQL 5.6. It makes innodb-thread-sleep-delay adaptive and is of particular value over 1024 threads in 5.6.

Sunny’s OOW presentation at https://oracleus.wingateweb.com/published/oracleus2011/sessions/20020/20020_Cho2577660.pdf mentions it on slide 28.

James Day, Oracle. This is my view only; for an official Oracle opinion consult a PR person.

Andy Carlson

12 years ago

Vadim,

I want to thank you for this informative post. I had a workload that I was working with a few years ago, that I could not get to perform well in innodb. It seemed like MySQL would attack one thread, and starve all the rest. I dug out the old code and data, and ran it with innodb_sync_spin_loops=64, and the workload performed much better.

Thanks again, and I will be watching for more posts from you in the future.

12 years ago

We also saw this problem.

What’s more, the manuals for 5.1 and 5.5 differ:

In 5.1 the default value of innodb_thread_concurrency is 8: http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_thread_concurrency

In 5.5 the default value is 0: http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_thread_concurrency

If we set innodb_thread_concurrency, the server’s load goes down.
