July 23, 2014

Should you have your swap file enabled while running MySQL?

So you’re running a dedicated MySQL Linux box with plenty of memory, and a good question arises: should you have the swap file enabled or disabled? I’ve seen production running successfully on boxes both with and without a swap file, so it is not a question of having to do it one way or the other, but rather of understanding the advantages of both approaches.

I also would like to hear what you do yourself, and why :)

The rationale behind disabling swap is that there is nothing you want to swap out on such a box anyway, and if you disable the swap file the kernel cannot swap and may be able to manage memory smarter, knowing it does not need to look for pages to swap out or to balance shrinking a cache against swapping something out.

And indeed, if you run with swap disabled you will not have swapping happening on the box, as there is simply nowhere to swap to.

So what about enabled swap? If we assume the kernel is smart (as we did for the swap-disabled case), we will be able to save a bit of memory, as there are some programs which are started but never really used. These would be the first valid candidates to swap out. Second, if you have a swap file you get a bit more flexibility. What happens if you miscounted something, or there is a gotcha in your application, and you end up with lots of connections creating large temporary tables, so MySQL runs out of memory (and gets killed by the kernel)? It might not even be MySQL but some cron script or something similar with the same effect.
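
The "miscounted something" case can be made concrete with a back-of-the-envelope budget. The sketch below is only a rough heuristic, not how MySQL actually accounts for memory: it assumes every connection uses its full per-connection buffers at once, and all names and example figures are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def worst_case_memory_bytes(global_buffers, per_connection_buffers, max_connections):
    """Rough upper bound: shared buffers plus every connection using its full buffers."""
    return global_buffers + per_connection_buffers * max_connections


def headroom_bytes(total_ram, os_reserve, global_buffers,
                   per_connection_buffers, max_connections):
    """Positive result: spare RAM. Negative result: the box can be overcommitted."""
    needed = worst_case_memory_bytes(global_buffers, per_connection_buffers,
                                     max_connections)
    return total_ram - os_reserve - needed


if __name__ == "__main__":
    # Hypothetical box: 32 GiB RAM, 2 GiB kept for the OS, 24 GiB of global
    # buffers, 16 MiB of per-connection buffers, 500 connections allowed.
    spare = headroom_bytes(32 * GIB, 2 * GIB, 24 * GIB, 16 * MIB, 500)
    print(f"headroom: {spare / MIB:.0f} MiB")
```

With these made-up numbers the headroom comes out negative, which is exactly the situation where swap (or the OOM killer) gets involved once enough connections pile up.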

In practice there are additional issues with both configurations. If you keep swap enabled you may have a hard time keeping MySQL in memory, because the kernel would love to swap it out. Recent kernels have become much better than a few years ago, but I still run into workloads which expose bad kernel behavior and swap buffer space out even with /proc/sys/vm/swappiness set to 0. You can lock MySQL in memory by using the --memlock option, but it opens its own can of worms.
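
One way to see whether the kernel has pushed part of mysqld out to swap is the VmSwap field in /proc/<pid>/status. This is a minimal sketch under a few assumptions: the kernel is recent enough to expose VmSwap (2.6.34 or later), the function names are made up for illustration, and shared memory that has been swapped out is not counted in this field.

```python
def swapped_kib(status_text):
    """Return the VmSwap value in KiB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            # The line looks like "VmSwap:      128 kB"
            return int(line.split()[1])
    return None


def read_swapped_kib(pid):
    """Read VmSwap for a live process; requires a Linux /proc filesystem."""
    with open(f"/proc/{pid}/status") as status_file:
        return swapped_kib(status_file.read())
```

Calling read_swapped_kib() with the mysqld process id and getting a value that keeps growing would suggest the kernel is swapping MySQL out despite the swappiness setting.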

If you keep swap disabled you can run into other problems. A lot of code in the kernel assumes there is swap space, and I’ve seen the kernel start to behave crazily under memory pressure when there is nothing to swap out. This may actually mean the buffer pool (or other caches) you can safely use is smaller if you have the swap file disabled. Though I must note again that this can be workload and kernel version specific.

Besides general considerations there are many case-specific ones. For example, if you would rather spend more memory (allowing more reserve for spikes) than have slowdowns caused by swapping activity when memory is overcommitted, you may be better off without a swap file. If, on the contrary, you want to use as much memory as possible to get the last bit of performance, do not mind slowdowns if you get it wrong, and have tools to resolve them quickly (e.g. killing runaway queries), keeping the swap file enabled may be right for you.

Myself, I tend to keep swap enabled on MySQL servers, but of course I ensure there is no swapping happening (the si/so columns in vmstat are zero or close to zero).

If you want to keep the swap file disabled that is also fine, but you should make sure your workload is stable, so there are no large spikes in memory requirements, and that you have the processes and discipline to keep it stable.
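
The si/so check is easy to automate. The sketch below parses text in the usual vmstat layout and reports whether any sample shows swap traffic; the function names and the threshold parameter are illustrative assumptions, and the text is expected to come from something like `vmstat 1 5`.

```python
def swap_activity(vmstat_output):
    """Return a list of (si, so) pairs, one per data row of vmstat output."""
    columns = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Column-name header row: remember where si and so live.
            columns = (fields.index("si"), fields.index("so"))
        elif columns and fields and fields[0].isdigit():
            samples.append((int(fields[columns[0]]), int(fields[columns[1]])))
    return samples


def is_swapping(samples, threshold=0):
    """True if any sample swapped in or out more than threshold per second."""
    return any(si > threshold or so > threshold for si, so in samples)
```

Keep in mind that the first data row vmstat prints is an average since boot, so it is usually worth looking at the later rows, or passing a small non-zero threshold.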

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. m.sucajtys says:

    Exactly.
    I’ve recovered the init script from my backup and made a patch against the default init script:
    --- /etc/init.d/mysqld 2011-05-10 00:17:48.000000000 +0200
    +++ /etc/init.d/mysqld 2011-05-21 10:05:01.000000000 +0200
    @@ -52,12 +52,13 @@
    if [ $ret -ne 0 ] ; then
    return $ret
    fi
    fi
    chown mysql:mysql "$datadir"
    chmod 0755 "$datadir"
    + ulimit -l 2883584
    # Pass all the options determined above, to ensure consistent behavior.
    # In many cases mysqld_safe would arrive at the same conclusions anyway
    # but we need to be sure.
    /usr/bin/mysqld_safe --datadir="$datadir" --socket="$socketfile" \
    --log-error="$errlogfile" --pid-file="$mypidfile" \
    --user=mysql >/dev/null 2>&1 &

  2. Chris Rigby says:

    Ahhh,

    yea, this is what I ended up doing too.

    It seems that if the init script spawns mysqld from mysqld_safe, then it uses root’s ulimit status (default 32) and ignores mysqld’s child processes.

  3. m.sucajtys says:

    hmmm…
    I was almost sure that I left this in the script.
    The script was probably overwritten after an upgrade.
    I must look into backups and check it again.
    My limits configuration looks like this:
    cat /etc/security/limits.d/mysqld.conf
    # Enable locking Huge Pages for MySQL user
    root soft memlock 2883584
    root hard memlock 3932160
    #mysql soft memlock unlimited
    #mysql hard memlock unlimited

  4. Chris Rigby says:

    Whereabouts in the script? I had a brief look in /etc/init.d/mysql but there are a lot of ulimit commands in various functions, and I didn’t want to mess with any of them or possibly override something I shouldn’t.
    I’d be grateful if you could provide a snippet around the ulimit command you’ve put in.

  5. m.sucajtys says:

    In CentOS (package from the distro, not from Percona) I’ve inserted a ulimit command into the mysql startup script.

  6. Chris Rigby says:

    Hmm, I’m having the same issue as Clint Byrum had.
    I have percona 5.5 mysql server running under CentOS 5.6 on a system with 32 GB of memory:

    My sysctl.conf looks like this:

    # Memory optimization
    vm.nr_hugepages = 14336
    vm.hugetlb_shm_group = 27
    kernel.shmmax = 8589934592
    kernel.shmall = 4194304

    My /proc/meminfo looks like this:
    HugePages_Total: 14336
    HugePages_Free: 14336
    HugePages_Rsvd: 0
    Hugepagesize: 2048 kB

    My limits.conf looks like this:

    mysql soft memlock unlimited
    mysql hard memlock unlimited

    My my.cnf looks like this:

    large-pages
    innodb_buffer_pool_size=4G
    .
    If I su to mysql from root and run “ulimit -l”, it indeed comes back with “unlimited”.
    If I run mysqld from the mysql user manually, it boots and uses HugePages:
    110518 11:28:32 InnoDB: Initializing buffer pool, size = 4.0G
    110518 11:28:32 InnoDB: Completed initialization of buffer pool

    If I try and use the init script that was installed with the Percona RPM, I get:

    110518 11:33:40 InnoDB: Initializing buffer pool, size = 4.0G
    InnoDB: HugeTLB: Warning: Failed to allocate 4412407808 bytes. errno 12
    InnoDB HugeTLB: Warning: Using conventional memory pool

    Even though when I ps aux |grep mysql, I can see that mysqld is running under the “mysql” user.

    What gives?

  7. Ok, so the way we do it on our production machines may seem a little crazy, but here it is:

    - We don’t create a swap partition on disk
    - On boot, we create a small RAM disk and create a swapfile on it

    Why on earth do we do this? Well, we *definitely* don’t want Linux swapping the DB out onto disk. On even the most recent kernels, with the vm tuneables set properly, this can still occur and when it does, it really really sucks.

    But, as Peter said, the kernel sometimes does some crazy things when there’s no swap available. We’ve seen kswapd start thrashing and chew up most of a CPU core even though there was plenty of free RAM available. Some kernel developers have told me that you need to have at least a handful of MBs available, so we tested it to see using our RAM disk method.

    Sure enough, the problems went away and we are only using a tiny amount of RAM (like 32MB or something like that) for swap.

    Yes, I realize how crazy using RAM->RAM disk->swap file->swap sounds, but it works. :)

  8. I generally leave swap enabled. Memory shouldn’t be that tight.
    Disabling is not a good idea.

    It’s another thing to monitor though, and people need to get their math right with configs. And not share serious MySQL servers with other tasks on a machine….

  9. Seth says:

    I always enable swap on a production machine, MySQL or not. I don’t make a dedicated partition though, I just make a file on disk. As long as there is plenty of memory in the system, it shouldn’t dig into swap unless it really needs it.

    You don’t want to get caught with no memory or swap and have the dreaded OOM killer start killing off things.

    Just make sure your swappiness is set correctly (the default of 60 should be good) and a little bit of swap should be fine. I have machines that have been up for several years and haven’t yet touched swap.

  10. Mike Wallace says:

    We definitely keep swap enabled, as much as anything as a safeguard against *something* going haywire, chewing through all of our system memory, and bringing down the machine. Our current production DBs aren’t in any danger at the moment – 5 of 8G free in the kernel, mysql using 35% continuously, but on other (non-database) machines, we’ve seen rogue processes eat all 16G RAM, all 16G swap, and boom, down go the machines.

    If things do start running outside normal parameters, alerts go off – I like the idea that if things go nuts enough, the swapfile can slow things down enough to allow us at least a chance of recovery.

    That said, we never actually use swap, and that’s the way I like it.

  11. safari says:

    Enabling MySQL large-pages so that it uses huge pages can help.

  12. peter says:

    Don,

    Thanks for the tip. Such a small space would not save you from a jump in resource consumption, but you’re saying it saves the kernel from the craziness.

  13. peter says:

    Safari – Indeed, large pages are one of the ways to avoid having the MySQL buffers swapped out.

  14. @peter:

    Yeah, I don’t think once in 6 years we’ve had a resource consumption issue. It helps that our DBs are all single-purpose boxes, so it’s fairly easy to do the math to make sure you never OOM.

    But the kernel craziness is crazy – and solvable. :)

  15. Brian Aker says:

    I keep it disabled, just because I would prefer something to crash over the machine going into swap.

    If the machine goes into swap… I may not be able to log into it. Better to let the process blow through memory and die.

    I also never look at environments where losing a single server is mission critical. I consider that crazy :)

    Cheers,
    -Brian

  16. safari says:

    Actually, we’ve never had swapping out to disk on our MySQL servers :-D (Hugepages in our MySQL case help to reduce context switching a lot.)
    But I once experienced the problem on an Oracle DB server, and hugepages helped to resolve it.

  17. terr0rist says:

    Use FreeBSD and you’ll never run into swap problems.

  18. QQLinux says:

    To terr0rist:
    By default, FreeBSD can only use 512M per thread.
    That’s not good for a DB either.

  19. Jens-Petter Salvesen says:

    I prefer to turn off swap and keep relatively low settings for the MySQL caching (key cache, specifically). That way, disk sectors get cached in OS memory (both key pages and data pages, according to what we access at the time) and you can set the sort buffers etc. to fairly high values without risking running out of memory.

    Of course, this approach assumes we have memory to spare.

  20. I prefer to leave the swap file enabled. Our servers have high load (3-5) almost 24/7, so swap is a good idea in that case.

  21. Parvesh says:

    I always prefer to have swap enabled on my boxes and just keep a check on it. I do believe that if swapping (or any other unwanted issue) is happening, the first place one should look is at the application and not the server.

    As safari mentioned, large-pages always come in handy.

  22. Clint Byrum says:

    On our big central DB server, we only have swap enabled because of tmpfs usage for tmpdir. We use a 20G tmpfs partition for tmpdir, and while we generally do have about 20G of RAM available, making tempfiles go quite fast, it also must be used for connection buffers/etc… if the server gets a lot of concurrent connections, I’d rather have it swap out the tempfiles and service the requests, than crash MySQL, or lower max_connections and have to give a ‘max connects reached’ error.

    On a somewhat OT note, I see a lot of mention of large_pages here. I’ve spent the last hour trying to make them work on our new server that has 64G RAM… I keep getting these when starting though..(innodb_buffer_pool_size=31G):

    InnoDB: HugeTLB: Warning: Failed to allocate 33286012928 bytes. errno 12
    InnoDB HugeTLB: Warning: Using conventional memory pool
    Warning: Failed to allocate 440401920 bytes from HugeTLB memory. errno 12

    /proc/sys/vm/nr_hugepages is 25000 (which is what I set it to at boot time, and equals about 48GB) , page size is 2MB .. /proc/sys/kernel/shmmax is 32GB, shmall is 42GB.

    errno 12, btw, is just “cannot allocate memory”

    If anyone has experience with large_pages like this, please email me or post here. Thanks!
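
The page arithmetic in a report like this can be checked quickly. The sketch below is an illustration with made-up function names: it converts an allocation size into a number of huge pages and compares it with a configured page count. It deliberately says nothing about the other limits that can also produce errno 12, such as shmmax or the locked-memory ulimit.

```python
def pages_needed(alloc_bytes, page_bytes):
    """Number of huge pages needed to hold alloc_bytes, rounded up."""
    return -(-alloc_bytes // page_bytes)  # integer ceiling division


def fits(alloc_sizes, nr_hugepages, page_kib=2048):
    """Return (pages required, True if they fit into nr_hugepages)."""
    page_bytes = page_kib * 1024
    required = sum(pages_needed(size, page_bytes) for size in alloc_sizes)
    return required, required <= nr_hugepages


if __name__ == "__main__":
    # Allocation sizes taken from the two warnings quoted in the comment,
    # checked against nr_hugepages = 25000 with 2 MB pages.
    print(fits([33286012928, 440401920], 25000))
```

If the required count comes out well below nr_hugepages, the page pool itself is probably not what is failing, and the remaining suspects are the limits mentioned above.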

  23. safari says:

    Set memlock (max locked memory) for the mysql user in /etc/security/limits.conf to allow this user to lock that much memory. If you still get the error after doing that and rebooting the server, try restarting mysqld and you should see it work.

    You can find more information at http://www.puschitz.com/TuningLinuxForOracle.shtml#UsingVeryLargeMemory.

  24. Clint Byrum says:

    Awesome. That worked well, thanks safari. Reboot was not necessary to test the setting btw, just had to run ‘ulimit -l xxxx’, though I rebooted to make sure it stuck. Somebody really should write the same kind of article for MySQL. :)

    Are there any benchmarks published for large_pages? Seems like it would help a lot for a high degree of concurrency given MySQL’s history of memory allocation bottlenecks.
