It has taken years to get proper integration between the operating system kernel, device drivers, and hardware so that caches and IO modes behave correctly. I remember us having a lot of trouble with fsync() not flushing the hard drive write cache, meaning data could potentially be lost on power failure. Happily, most of these issues are resolved now with “real hardware”, and I’m pretty confident running InnoDB with either the default (fsync based) or O_DIRECT innodb_flush_method. Virtualization, however, adds yet another layer, and we need to ask again whether IO is really durable in virtualized environments. My simple testing shows this may not always be the case.

I’m comparing O_DIRECT and fsync() single-page writes to a 1MB file using SysBench on Ubuntu with ext4, running on VirtualBox 4.0.4 on Windows 7 on my desktop computer with a pair of 7200 RPM hard drives in RAID1. Because there is no write cache, I expect no more than a bit over 100 writes per second: even if there is no disk seek, we have to wait for the platter to complete a full rotation (7200 RPM is 120 rotations per second, so roughly 8.3 ms per durable write). I’m however getting rather bizarre results:
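For illustration, here is a minimal standalone sketch of the two IO modes the SysBench fileio test exercises: a buffered write followed by fsync(), versus a write through a descriptor opened with O_DIRECT. This is not the actual SysBench code; the file name, block size, and iteration count are arbitrary choices made for the example.

```c
/* Minimal sketch of the two IO modes being compared (not SysBench itself).
 * build: gcc -O2 -o iotest iotest.c   (add -lrt on older glibc)
 * run:   ./iotest            -> write + fsync() per page
 *        ./iotest direct     -> write through O_DIRECT         */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK 4096              /* one "page" */
#define ITERS 200               /* number of timed writes */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    int use_direct = (argc > 1 && strcmp(argv[1], "direct") == 0);
    int flags = O_WRONLY | O_CREAT | (use_direct ? O_DIRECT : 0);
    int fd = open("testfile", flags, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT requires a block-aligned buffer */
    void *buf;
    if (posix_memalign(&buf, BLOCK, BLOCK)) { perror("posix_memalign"); return 1; }
    memset(buf, 'x', BLOCK);

    double start = now();
    for (int i = 0; i < ITERS; i++) {
        if (pwrite(fd, buf, BLOCK, 0) != BLOCK) { perror("pwrite"); return 1; }
        if (!use_direct && fsync(fd) != 0) { perror("fsync"); return 1; }
    }
    double elapsed = now() - start;

    printf("%s: %.1f writes/sec\n",
           use_direct ? "O_DIRECT" : "fsync", ITERS / elapsed);
    free(buf);
    close(fd);
    return 0;
}
```

On spinning disks with no write cache, both modes should be limited by rotational latency to roughly the same rate; a large gap between them is the suspicious result discussed below.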

Using fsync()

Ignore the response times here, as they measure only the writes, not the fsync() calls… 22 fsync requests per second is pretty bad, though I assume it can be realistic given the overhead.

Now let’s see how it looks using O_DIRECT:

I would expect results rather similar to the fsync() test, yet we’re getting numbers 20 times better… surely too good to be true. That means I can be sure the system is lying about write completion when we’re using O_DIRECT IO.

What is my takeaway from this? I did not have time to research whether the problem is related to VirtualBox or to some configuration issue, and things may be working correctly in your case. The point is that virtualization adds complexity, and there are at least some cases where you may be lied to about IO completion. So if you’re relying on the system to be able to recover from a power failure or VM crash, make sure to test it carefully.
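One way to test this yourself is to run a program that keeps writing an increasing sequence number, flushing after every write and reporting a number only once the flush has returned, while you cut power to the host (or hard-kill the VM). After restart, the last number reported must be present in the file; if it isn’t, the system acknowledged writes that were not durable. A minimal sketch of such a checker, with an arbitrary file name and record format:

```c
/* Durability-test sketch: append an increasing sequence number and fsync()
 * after every write, printing each number only after the flush returns.
 * Kill the power (or the VM) while it runs; after restart, the last number
 * printed should be the last record in durability.log.  If it is missing,
 * write completion was acknowledged before the data was durable. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("durability.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    for (unsigned long seq = 1; ; seq++) {
        char rec[32];
        int len = snprintf(rec, sizeof rec, "%lu\n", seq);
        if (write(fd, rec, len) != len) { perror("write"); return 1; }
        if (fsync(fd) != 0) { perror("fsync"); return 1; }
        /* Only report the sequence number once fsync() has returned. */
        printf("flushed %lu\n", seq);
        fflush(stdout);
    }
    return 0;
}
```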

12 Comments
Patrick Casey

This is one of those reasons why I often recommend not virtualizing high volume database servers. It makes me feel like an old fuddy-duddy whenever I say it, because I really do think virtualization represents the future, but it’s so opaque.

With physical hardware, I can understand *exactly* what the hardware is capable of, what it is doing, and what I can expect out of it. Once virtualized, there’s a host of other cofactors at work that can impact performance and throughput. To add insult to injury, most of these secondary factors can’t be monitored from within the virtual machine (the matrix problem), and a lot of the tools within the instance will give erroneous or misleading results because of the virtualization layer.

I’d virtualize lab machines or low volume production databases without a concern in the world, but with a busy production box I still prefer to run on bare metal.

Justin Swanhart

By default, VirtualBox virtual machines ignore fsync requests. You can make them honor fsync, though:
http://www.virtualbox.org/manual/ch12.html#id411470

If you are using VMware, there is no way to guarantee flushes unless the HOST operating system is Microsoft Windows, or you use ESX Server. If you use a Linux host, then the virtual machines (Linux or otherwise) won’t have flushing guarantees.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008542

Richard Dale

What are the implications here for a Linux/Amazon EC2 instance?

nate

Any virtual server load testing, in my opinion, isn’t worth spending time on unless it’s on ESX (at least 3.5 or newer) and on a Fibre Channel SAN (not iSCSI, not local storage). Use a decent queue depth too; the defaults seem to be way too low (depending on your array, of course). I use raw device maps on all of my databases so I can use SAN-controlled snapshots; I’m pretty sure that takes out some extra overhead associated with VMFS as well (not my priority though, I want the SAN snapshots).

In fact, testing on a poor virtualization platform may actually be harmful, because it may scare people away from virtualization when a good platform on solid hardware really runs quite well when configured right (VMware is so easy to configure that it’s not too uncommon for inexperienced people to horribly misconfigure it, for example by massively overcommitting CPU cores with lots of SMP VMs, or massively overcommitting memory and causing swapping).
