It has taken years to get the integration between the operating system kernel, device drivers and hardware right so that caches and IO modes behave correctly. I remember us having a lot of trouble with fsync() not flushing the hard drive write cache, so data could potentially be lost on power failure. Happily most of these problems are resolved now on “real hardware”, and I’m pretty confident running Innodb with either the default (fsync based) or O_DIRECT innodb_flush_method. Virtualization however adds yet another layer, and we need to ask again whether IO is really durable in virtualized environments. My simple testing shows this may not always be the case.
I’m comparing O_DIRECT and fsync() based single page writes to a 1MB file using SysBench on Ubuntu with ext4, running on VirtualBox 4.0.4 on Windows 7 on my desktop computer with a pair of 7200 RPM hard drives in RAID1. Because there is no write cache I expect no more than a bit over 100 writes per second: even when there is no disk seek, we have to wait for the disk head to complete a full rotation. I’m however getting rather bizarre results:
Using fsync()
pz@ubuntu:~/test$ sysbench --num-threads=1 --test=fileio --file-num=1 --file-test-mode=rndwr --file-total-size=1M --max-requests=10000000 --max-time=60 --file-fsync-freq=1 run
sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 0
1 files, 1Mb each
1Mb total file size
Block size 16Kb
Number of random requests for random IO: 10000000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 1 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!

Time limit exceeded, exiting...
Done.

Operations performed:  0 Read, 1343 Write, 1343 Other = 2686 Total
Read 0b  Written 20.984Mb  Total transferred 20.984Mb  (357.62Kb/sec)
   22.35 Requests/sec executed

Test execution summary:
    total time:                          60.0863s
    total number of events:              1343
    total time taken by event execution: 0.0808
    per-request statistics:
         min:                                  0.04ms
         avg:                                  0.06ms
         max:                                  0.34ms
         approx.  95 percentile:               0.06ms

Threads fairness:
    events (avg/stddev):           1343.0000/0.00
    execution time (avg/stddev):   0.0808/0.00
Ignore the response times here, as they time only the writes, not the fsync() calls. 22 fsync() requests per second is pretty bad, though I assume it can be realistic given the overhead.
Now let’s see how it looks using O_DIRECT:
pz@ubuntu:~/test$ sysbench --num-threads=1 --test=fileio --file-num=1 --file-test-mode=rndwr --file-extra-flags=direct --file-total-size=1M --max-requests=10000000 --max-time=60 run
sysbench 0.4.10:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 16384
1 files, 1Mb each
1Mb total file size
Block size 16Kb
Number of random requests for random IO: 10000000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Threads started!

Time limit exceeded, exiting...
Done.

Operations performed:  0 Read, 33900 Write, 339 Other = 34239 Total
Read 0b  Written 529.69Mb  Total transferred 529.69Mb  (8.8278Mb/sec)
  564.98 Requests/sec executed

Test execution summary:
    total time:                          60.0019s
    total number of events:              33900
    total time taken by event execution: 37.5364
    per-request statistics:
         min:                                  0.10ms
         avg:                                  1.11ms
         max:                                259.69ms
         approx.  95 percentile:               5.31ms

Threads fairness:
    events (avg/stddev):           33900.0000/0.00
    execution time (avg/stddev):   37.5364/0.00
I would expect results rather similar to the fsync() test, yet we’re getting numbers 20 times better… surely too good to be true. This means I can be fairly sure the system is lying about write completion when we’re using O_DIRECT IO.
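The “too good to be true” arithmetic is easy to check. A quick sketch, using the 7200 RPM figure and the two request rates reported above, and the same pessimistic one-full-rotation-per-write assumption from earlier in the post:

```python
# Upper bound on durable random writes/sec for a single 7200 RPM disk
# with no write cache, assuming each write waits for a full rotation.
ROTATIONS_PER_SEC = 7200 / 60        # 120 rotations per second

max_durable_writes = ROTATIONS_PER_SEC  # one durable write per rotation

observed_fsync = 22.35     # requests/sec from the fsync() test above
observed_odirect = 564.98  # requests/sec from the O_DIRECT test above

print(observed_fsync <= max_durable_writes)    # plausible
print(observed_odirect <= max_durable_writes)  # physically impossible
```

The fsync() number fits comfortably under the ~120/sec ceiling; the O_DIRECT number exceeds it several times over, which is only possible if some layer is acknowledging writes before they hit the platter.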
What is my takeaway from this? I did not have time to research whether the problem is related to VirtualBox or to some configuration issue, and things may be working correctly in your case. The point is that virtualization adds complexity, and there are at least some cases where you may be lied to about IO completion. So if you’re relying on the system being able to recover from a power failure or VM crash, make sure to test it carefully.
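One simple way to run such a check yourself, without sysbench, is a small probe that does synchronous single-page writes and reports the achieved rate. A minimal sketch; the file path, page size and iteration count are arbitrary choices for illustration, not taken from the tests above:

```python
import os
import time

def durable_write_rate(path, iterations=200, page_size=4096):
    """Time write()+fsync() pairs and return achieved writes/sec."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    page = b"\0" * page_size
    try:
        start = time.monotonic()
        for _ in range(iterations):
            os.pwrite(fd, page, 0)  # rewrite the same page in place
            os.fsync(fd)            # ask for it to reach stable storage
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return iterations / elapsed

rate = durable_write_rate("/tmp/fsync_probe.dat")
# On a cache-less 7200 RPM disk anything much above ~120/sec is suspicious;
# thousands per second mean some layer acknowledges writes it hasn't persisted.
print("%.0f durable writes/sec" % rate)
```

Run it against the actual data volume your database uses; a rate far above what the physical spindles can deliver is the same red flag the O_DIRECT numbers above are waving.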
This is one of those reasons why I often recommend not virtualizing high volume database servers. It makes me feel like an old fuddy-duddy whenever I say it, because I really do think virtualization represents the future, but it’s so opaque.
With physical hardware, I can understand *exactly* what the hardware is capable of, what it is doing, and what I can expect out of it. Once virtualized, there’s a host of other factors at work that can impact performance and throughput. To add insult to injury, most of these secondary factors can’t be monitored from within the virtual machine (the matrix problem), and a lot of the tools within the instance will give erroneous or misleading results because of the virtualization layer.
I’d virtualize lab machines or low volume production databases without a concern in the world, but with a busy production box I still prefer to run on bare metal.
By default VirtualBox virtual machines ignore fsync requests. You can make them honor flushes though:
http://www.virtualbox.org/manual/ch12.html#id411470
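For reference, the setting that manual section describes is per-disk extradata on the VM. A sketch for the first disk on the IDE controller; the VM name and the controller/LUN path are placeholders you need to adjust for your own setup (e.g. `ahci` instead of `piix3ide` for SATA disks):

```shell
# Tell VirtualBox NOT to ignore flush (fsync) requests from the guest,
# here for IDE controller 0, LUN 0 of a VM named "MyVM".
VBoxManage setextradata "MyVM" \
  "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/IgnoreFlush" 0
```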
If you are using VMware, there is no way to guarantee flushes unless the HOST operating system is Microsoft Windows, or you use ESX Server. If you use a Linux host, the virtual machines (Linux or otherwise) won’t have flushing guarantees.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008542
What are the implications here for a Linux/Amazon EC2 instance?
Any virtual server load testing, in my opinion, isn’t worth spending time on unless it’s on ESX (at least 3.5 or newer) and on a fibre channel SAN (not iSCSI, not local storage). Use a decent queue depth too; the defaults seem to be way too low (depending on your array, of course). I use raw device maps on all of my databases so I can use SAN-controlled snapshots; I’m pretty sure that takes out some extra overhead associated with VMFS as well (not my priority though, I want the SAN snapshots).
In fact, testing on a poor virtualization platform may actually be harmful, because it may scare people away from virtualization when a good platform on solid hardware really runs quite well when configured right. (VMware is so easy to configure that it’s not too uncommon for inexperienced people to horribly misconfigure it, for example by massively overcommitting CPU cores with lots of SMP VMs, or massively overcommitting memory and causing swapping.)
Nate,
I’m not doing any performance benchmarks here. I’m just noticing that the IO speeds are too good to be true, so at least in some cases you can get data loss in case of a crash.
Richard – I can’t say anything about Amazon EC2 based on these tests. The problem with Amazon EC2 is that you can’t really test what happens on power failure of the physical box, though most deployments assume the instance is dead in such cases anyway and switch to a replica or something similar.
Patrick,
Yeah. I don’t think anybody is saying that virtualization will increase performance. All the discussion and work is rather about reducing overhead.
I kind of put virtualization in the same bucket as SAN: there are reasons to use it, such as convenience, manageability and efficient use of resources. Just don’t put performance on that list.