April 19, 2014

Can’t Create Thread: Errno 11 (A Tale of Two Ulimits)

Recently some of my fellow Perconians and I have noticed a bit of an uptick in customer cases featuring the following error message:

The canonical solution to this issue, if you do a bit of Googling, is to increase the number of processes / threads available to the MySQL user, typically by adding a line like this to /etc/security/limits.conf:

followed up by a restart of MySQL in a fresh user session. If you’re running RHEL/CentOS 5, Ubuntu, or versions of Fedora prior to Fedora 9, this solution will most likely solve your problem. But with RHEL/CentOS 6 (and derivatives) and Fedora 9 and later, you’ll likely find that this tweak no longer works.

In Fedora 9 and RHEL 6, a “bug fix” was introduced which was ostensibly intended to make forkbombing the system more difficult. [If you're interested in the actual RedHat Bugzilla entry, you can read more about it here.] The result of this so-called “fix” was the introduction of a new PAM limits file, 90-nproc.conf, which appeared in the /etc/security/limits.d folder and contained the following:

Just looking at this file, it seems innocuous enough, except that there’s one ever-so-small problem (aside from even knowing to look for this file in the first place!)

The 90-nproc.conf file OVERRIDES any process/thread limit changes that you may have made to /etc/security/limits.conf!

So even if you’ve properly adjusted /etc/security/limits.conf and set an explicit limit for nproc greater than 1024 for the “mysql” user, those limits will not be honored because of the existence of this file. If you actually want to increase the thread/process limit for the mysqld process under RHEL 6+ / Fedora 9+, you’ll actually need to do one of the following:

  1. Put your nproc changes directly in this file.
  2. Delete the file entirely (although it might get replaced by a future PAM upgrade).
  3. Add a new file in /etc/security/limits.d with a higher number (e.g., 91-mysql.conf) and set your limits in there – this is the solution that I’d recommend.

But wait, as they say during late-night television infomercials, there’s more!

One thing that people often forget when it comes to adjusting nproc/nofile limits in /etc/security/limits.conf or even this new 90-nproc.conf file is that these files are used ONLY to control the behavior of the PAM user limits module, pam_limits.so, which is only applicable to USER sessions. The implication here is that instances of services that are started at boot time by SysV initscripts are not affected by changes to these limits files.[1] Conversely, it also means that once you restart a process from within a user session, that service then becomes subject to any PAM-based limits. The end result is a sort of Heisenbug, wherein the actual nproc limits that apply to your MySQL instance are contingent upon how/when it was launched, and if you don’t restart your MySQL server very often, this issue could go undetected for a long time.

To illustrate, we take a standard CentOS 6.3 server that has had no changes of any kind made to /etc/security/limits.conf or /etc/security/limits.d/90-nproc.conf – it’s just a standard default installation. If we do a fresh reboot of the machine, allow MySQL to start up as a normal part of the boot process, and then check the process limits, we find that “Max processes” appears to have plenty of headroom. If max_connections is set to 1200, we should have no problem accepting 1200 simultaneous connections without running into a “can’t create thread” error at 1024.

But let’s say that a new version of MySQL comes out and we need to do an upgrade. Odds are that we’re not going to reboot the whole machine to do this upgrade; we’ll just install the fresh RPMs and restart mysqld. Or perhaps we had some other cause to restart the MySQL server; the actual reason doesn’t matter. But now take a look at what’s happened to the process/thread limit:

Now, once this server hits 1024 simultaneous connections, we’ll start hitting “can’t create thread.” If this restart came about as part of a MySQL version upgrade, we might incorrectly (but justifiably) think that we’d encountered a bug in the newly-upgraded version, only to be further confounded by the error’s failure to disappear upon rolling back to the previous “error-free” version. What makes it all that much worse is that once we encounter this issue, it would appear that the only way we can actually correct it is to bounce MySQL yet again, since the only way for changes to the PAM limits files to get refreshed are via the start of a new session (such as logging out and logging back in). If we have a high-traffic MySQL server with a very large InnoDB buffer pool, having to bounce mysqld a second time could mean further minutes or hours of unacceptable performance degradation while the buffer pool warms up. If only we had a way to modify these resource limits on the fly, without a MySQL server restart….

As it turns out, we do. On kernel 2.6.32 and later (and luckily for us, 2.6.32 just happens to be the kernel version in use by RHEL 6), there’s actually a very handy way to immediately increase the resource limits for a running process without the need for a restart and all of the potential unpleasantness that comes along with it.[2] We can simply run the following command, as root, replacing SOFT_LIMIT and HARD_LIMIT with our desired process/thread limits. For no limit, just use the word “unlimited”; otherwise these should be numeric values.

So what have we learned here?

If you’re running a Fedora 9/RHEL 6 derivative, you’re most likely affected by this issue, and the complete fix for it comes in three steps.

  1. Use the aforementioned “echo” trick to ensure that the running server’s limits are sufficient.
  2. Edit 90-nproc.conf (or create 91-mysql.conf) to set the desired process/thread limits to handle any server restarts.
  3. Ensure that you’ve logged out / logged back in before doing a restart; even if you make the changes in steps a and b, you still need to make sure that your “service mysql restart” is executed from a session that has picked up the modified limits.

[1] systemd, which became the Fedora standard a couple of versions ago and will likely be the new way of doing things in RHEL 7, appears to resolve the issue. See here for additional details (Thanks to Raghavendra Prabhu for the link.)
[2] Credit to Ovais Tariq for telling me about this trick.

About Ernie Souhrada

Ernie joined Percona in April 2012 as a Senior Consultant. In his previous lives, he has been everything from a Perl/Java developer to a Linux sysadmin, a MySQL DBA to a Cisco network engineer, and a security auditor to an IT engineering manager, many of these things all at the same time. When not working on MySQL, he might be found on the ski slope, at a psytrance festival, or at the nearest sushi bar.

Comments

  1. This reminds me of this bug:
    http://bugs.mysql.com/bug.php?id=62224

  2. Moritz says:

    Saved my ass just yesterday evening… Fantastic Timing! THANKS! :)

  3. Sheeri says:

    We ran into this problem at Mozilla when upgrading to Percona 5.1 – Oracle’s MySQL startup script (at least for MySQL 5.0) sources /etc/sysconfig/mysqld – which had a ulimit setting. However, Percona’s patched MySQL 5.1 has a different startup script, which does *not* source that script.

    So then we put the limits in /etc/security/limits.d/99-nproc-mysql.conf, and we ensure via puppet that the file stays there, even if there’s a PAM upgrade.

  4. Vojtech Kurka says:

    Thank you! Saved me a lot of time.

  5. Pavel says:

    So, i’ve do it all, create a new file called 91-mysql.conf in limits.d, but it no work.

    mysql soft nofile 10240
    mysql hard nofile 40960
    mysql soft nproc 10240
    mysql hard nproc 40960

    I still get

    [root@security]# cat /proc/31715/limits
    Limit Soft Limit Hard Limit Units
    Max processes 1024 385850 processes

    I put file 91-mysql.conf contents into limits.conf and get same 1024 value.

    What do i wrong? My OS Centos 6.4 version.
    No one of this value not implemented.

  6. @Pavel–

    After you added the 91-mysql.conf file, what did you do next? Did you just restart MySQL from the same session that you were logged in with, or did you start a new login session to do the restart? If you just created the file and then did a restart of MySQL, that doesn’t work. You’d actually need to do something to create a new user session – such as SSHing to localhost and then doing the restart. In other words, log out and then log back in, and then do a service mysql restart.

    Alternatively, you can use the trick in this post to modify the limits of the running mysqld process and then your changes in the 91-mysql.conf should be picked up next time you SSH into the box and restart MySQL.

  7. Pavel says:

    @Ernie Souhrada–

    Yes, i added file and just restart.

    Ok, i logout from server, connect ssh to server and restart:
    $ sudo service mysql2 restart
    [linko@g4 ~]$ ps -ax | grep mysql
    35328 pts/1 S 0:00 /bin/sh /usr/bin/mysqld_safe –defaults-file=/etc/my2.cnf –datadir=/var/lib/mysql2 –pid-file=/var/lib/mysql2/g4.dmz.local.pid
    35967 pts/1 Sl 0:00 /usr/sbin/mysqld –defaults-file=/etc/my2.cnf –basedir=/usr –datadir=/var/lib/mysql2 –plugin-dir=/usr/lib64/mysql/plugin –user=mysql –log-error=/var/log/mysqld2.log –open-files-limit=32768 –pid-file=/var/lib/mysql2/g4.dmz.local.pid –socket=/var/lib/mysql2/mysql.sock –port=3307

    $ sudo cat /proc/35967/limits
    Limit Soft Limit Hard Limit Units
    Max processes 1024 unlimited processes

    Next time i do another.
    1. ssh to server and stop mysqld
    2. logout
    3. login to server and start mysqld.

    And still get:
    Max processes 1024 unlimited processes

    I’ll try more some magic things now.

  8. Ernie S says:

    @Pavel–

    Ah, OK, I know what the issue is. You’re SSH’ing in and starting the server up as root. So those limits aren’t getting picked up because they are set only for the mysql user. Add this to 90-nproc.conf (or some other file in /etc/security/limits.d):

    root soft nproc 10240

    Then you should be good to go.

  9. Source: Boston bomb suspect says brother was brains behind attack

    Boston Marathon bombing suspect Dzhokhar Tsarnaev has told investigators his older brother Tamerlan was the driving force behind last week’s attack and that no international terrorist groups were behind them, a U.S. government source

  10. astrostl says:

    WONDERFULLY helpful post on a deep issue. Thanks, Ernie!

  11. didi says:

    Not so easy though:
    # grep proc /proc/8416/limits
    Max processes 95918 95918 processes
    # echo -n “Max processes=95917:95918″ > /proc/8416/limits
    bash: echo: write error: Invalid argument
    # uname -a
    Linux testbed 3.2.0-45-generic #70-Ubuntu SMP Wed May 29 20:12:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

  12. Ernie Souhrada says:

    Hm, yeah, I just tried it on my F18 system (kernel 3.9) and got the same error. Apparently this is a 2.6.x hack only, or maybe there is some other workaround for newer kernel versions that I’m not aware of.

  13. Ernie Souhrada says:

    For anyone who’s still following this post, the way to adjust the limits of a running process under kernel 3.x when you can’t write directly to /proc is to use “prlimit” from the util-linux-ng package.

    Ex:

    prlimit –nproc=5000:1032160 –pid pidof mysqld

  14. mathieu says:

    Just saved my ass, but the max number of process was a side show.

    The real problem was max number of files for the user. When I raised nproc, I could connect, but found that there were about 1000 connections that were killed but their threads were still running. Mysql wasn’t able to get any more file descriptors, preventing it from closing the connections. Raising nofile on the live process didn’t help, once it had already failed.

    BTW, if using the echo trick, you get this:
    bash: echo: write error: Invalid argument
    It really means you have an error in what you’re trying to write. Forgetting the -n to echo, or putting invalid numbers will do that.

  15. Nick says:

    I’ve been battling with this problem for a while now. A while back I setup a file in /etc/security/limits.d/ called 99-mysql.conf with an increased nproc and nofile limits. I patted myself on the back after doing so, went and had a coffee and forgot all about it.

    From time to time I’d still get errno 11 come through, but never had time to check it. The server always came right after a few seconds, so it wasn’t a major issue.

    It wasn’t until I found this post that I found out how to dig into it further. It seems that even with the 99-mysql.conf in place my nproc limit is still set to 1024 (as defined in 90-nproc.conf). What’s even more strange is if I blank out 90-nproc.conf the limit still exists (although I only logged off/logged on again, a reboot may be required).

    I found the only way to increase the nproc limit on mysqld was to use the echo trick. I’m in the process of rewriting the init script for mysqld to always enforce this nproc limit increase, but it seems like an ugly hack.

    Is there any better way to enforce the higher nproc limit, being that the limits.d directory seems useless.

Speak Your Mind

*