April 19, 2014

btrfs – probably not ready yet

Every time I have a conversation about SSDs, someone mentions the btrfs filesystem. It is usually presented as a saviour that will solve all our problems and improve performance overall, and on SSDs in particular. Of course this piqued my curiosity, and I decided to run a benchmark similar to the one I did for the ext4 filesystem on an Intel 520 SSD.
I was prepared for surprises: even at the formatting stage, mkfs.btrfs says that the filesystem is EXPERIMENTAL. When it comes to filesystems I mostly agree with Stewart: question #1 to ask when deciding which filesystem to use is “Has this filesystem been used in production for more than 5 years?” By that measure, btrfs has a long way to go.

How can you get btrfs? Actually it is quite easy if you are on CentOS/RedHat/Oracle Linux 6.2.
Oracle provides the Unbreakable Enterprise Kernel, which includes btrfs, so you can get it with this kernel. Installation is easy and straightforward; just follow the instructions.

So, on to the numbers. The workload and benchmark are exactly the same as in my previous benchmark, and I perform runs only for 10GB and 20GB buffer pools, as that is enough to understand the picture. The previous run was done on ext4, so repeating it on btrfs lets us compare the results directly.

I format btrfs with default options and mount it with the -o ssd,nobarrier options.
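
For reference, the format and mount step could look roughly like the sketch below. It only builds and prints the two command lines rather than running them. The device path /dev/sdX and the mount point /mnt/data are placeholders, not values taken from the benchmark.

```python
import shlex

# Hypothetical placeholders: the post does not name the device or mount point.
DEVICE = "/dev/sdX"
MOUNT_POINT = "/mnt/data"


def build_commands(device, mount_point):
    """Return the mkfs and mount command lines as argument lists."""
    # mkfs.btrfs with default options, as described in the post.
    mkfs_cmd = ["mkfs.btrfs", device]
    # Mount options from the post: ssd and nobarrier.
    mount_cmd = ["mount", "-t", "btrfs", "-o", "ssd,nobarrier", device, mount_point]
    return [mkfs_cmd, mount_cmd]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them,
    # because mkfs destroys any data on the target device.
    for cmd in build_commands(DEVICE, MOUNT_POINT):
        print(shlex.join(cmd))
```

Note that nobarrier trades crash safety for speed, so it is only reasonable on a test box or on storage with a protected write cache.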

Throughput results:

We can see that btrfs not only provides worse throughput (5x!), but it is also less stable.

Response time:

The same happens with response time. The 95th percentile response time is about 10x worse with btrfs.
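
As a reminder of what that metric means: the 95% response time is the value below which 95% of requests complete. The sketch below uses the nearest-rank method, which is an assumption, since the post does not say how the benchmark tool computes its percentile. The sample latencies are made up for illustration and are not benchmark results.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest 1-based rank covering pct percent of samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up response times in milliseconds, with two slow outliers.
    latencies_ms = [12, 15, 11, 14, 13, 12, 16, 15, 14, 13,
                    12, 11, 15, 14, 13, 12, 16, 15, 240, 310]
    print(percentile_nearest_rank(latencies_ms, 95))
```

A few slow requests barely move the average but dominate the 95th percentile, which is why it is a better indicator of stability than the mean.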

And response time, timeline:

We can see that btrfs is very far from providing a stable response time.

I guess the conclusion is obvious, and I think it is fine for a filesystem that is in the EXPERIMENTAL state.
Most likely some bug or misconfiguration prevents btrfs from showing its full potential.
I will simply consider all talk about btrfs's characteristics premature and will wait until it is more stable
before running any more experiments with it.

Benchmark specifications, hardware, scripts and raw results are available in the full report for the Intel 520 SSD.


About Vadim Tkachenko

Vadim leads Percona's development group, which produces the Percona Server and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. Very interesting metrics! I can’t say I’m surprised given how young BTRFS still is, but this is definitely a good starting point for seeing where the performance breakdown is. With that in mind, I’d be curious to see how ZFS compares to ext4, XFS, and BTRFS overall. Specifically, I’m wondering if the performance would be comparable to XFS (plus the added bonus of manageability and snapshots). I suppose another question is what performance difference would be visible when running ZFS under an Illumos system vs. a Linux (probably RHEL-based) system. Would you be up to putting together some benchmarks on ZFS? :)

  2. Xavier De Cock says:

    Have you tried this option? Of course, you’ll lose most of the “advantages” of btrfs, but it’s the “only way” to get a database on btrfs as of now (without involving voodoo magic).

    nodatacow
    Do not copy-on-write data for newly created files. This also turns off checksumming! IOW, nodatacow implies nodatasum. datacow is used to ensure the user either has access to the old version of a file, or to the newer version of the file. datacow makes sure we never have partially updated files written to disk. nodatacow gives slight performance boost by directly overwriting data (like ext[234]), at the expense of potentially getting partially updated files on system failures. Performance gain is usually < 5% unless the workload is random writes to large database files, where the difference can become very large. NOTE: switches off compression !

    Source : https://btrfs.wiki.kernel.org/index.php/Mount_options

  3. Nicolas Berens says:

    It would be more interesting if you just set the database files to nocow and left the rest to “normal” btrfs operation (so that, for example, metadata still gets copy-on-written).

    What would also be interesting, imho, is a benchmark over a btrfs raid 10 with compression (I wouldn’t expect “real-raid” performance, but this might be interesting).

  4. Andy says:

    Which fs would you recommend for MySQL? XFS or ext4 or something else?

  5. Olivier Doucet says:

    Hi Vadim,
    There are huge improvements in BTRFS speed in btrfs-next, which is merged with Kernel 3.4 (released three days ago). You may test with it and see results.

  6. Carl Johnstone says:

    btrfs will probably never be as fast as ext4 – the design goals for btrfs are completely different.

    It’s all about data redundancy and never losing your data. Everything is checksummed, files are copied when written to so you can roll back to old versions, and lots of other nifty features. There’s a cost to this extra overhead, and if you want the features you pay the cost – if not stick to something else.

    If speed is all that matters, then just run ext2 – or just stick InnoDB on a raw partition and leave the filesystem out completely. I mean in the case of a system crash you’ve got more chance of losing data – but that’s the trade-off for raw speed.

  7. Hans says:

    > It’s all about data redundancy and never losing your data.
    Are you kidding me? Google for “parent transid verify failed” and look at the results for the past 2-3 years till now…

  8. Carl Johnstone says:

    Hans – yeah, experimental file-systems have bugs, and your point regarding the design goals of btrfs?

  9. Andy,

    you can find a discussion of ext4 vs xfs here:
    http://www.mysqlperformanceblog.com/2012/03/15/ext4-vs-xfs-on-ssd/

    In short: at this point in time, my opinion is that ext4 can provide better throughput.
    But feature-wise you may decide for yourself.

  10. This kinda strikes me as an apples-oranges comparison after reading the comments assuming COW was enabled? Maybe a comparison with ext4 on LVM with a snapshot volume enabled? Personally, I don’t use snapshots for performance reasons, but this might be an interesting comparison.

  11. Hans says:

    @Carl: yes, I point to the design BUGs of btrfs! btrfs needs a btrfsck, unlike zfs! Why?

  12. Timothy,

    I think it is fine to compare COW, but in this case one would need to disable the doublewrite buffer in InnoDB, as it is not needed if the filesystem itself guarantees no partial page writes.

    I do not agree on apples and oranges though. ZFS, BTRFS and similar filesystems “promise” great features at very low cost on SSD. You can write data in new places, and “seeks” on reads do not matter for SSD.

  13. The general rule for database performance is: use XFS.
    There have been exceptions (see Vadim’s post on how around some relatively recent kernels ext4 has been a better choice). The number of huge database installations that will only run on XFS is non-trivial (anything else would kill their performance enough to possibly kill the business).

    While there are no-copy-on-write options for BTRFS, ZFS and the like, the number of people who are going to use the default mkfs and mount options is *HUGE*.

  14. Wout Mertens says:

    http://cd34.com/blog/scalability/ext4-xfs-and-btrfs-benchmark-redux/ does some quick bonnie++ tests on the 3.4 btrfs version, looks like it’s comparable to the other filesystems now?

  15. nate says:

    > @Carl: yes, I point to the design BUGs of btrfs! btrfs needs a btrfsck, unlike zfs! Why?

    Yes!

    I use this same logic when purchasing a car. I always make sure that my new cars never have a spare tire. Cars with spare tires are designed to have flat tires, so why would I want that?

  16. Hans says:

    @nate you have so many spare tires because you have to purchase a new car on every power outage or kernel panic…

  17. Antho says:

    Since RHEL is mentioned above, I have to note that it’s hard to tell what version of btrfs is in the kernel, but RHEL’s kernel is *still* numbered 2.6.x, and their practice is generally to lag years behind everyone else on what they ship. So yeah saying some kernel version or btrfs version is all well and good, but of limited use to those of us who have no choice in kernel version.

  18. Seppo Yli-Olli says:

    FWIW if we ever want Btrfs to work well with databases, there needs to be an attitude change towards how space is used. CoW is only really a problem if you don’t allocate all your space beforehand. There are calls like posix_fallocate and whatnot to do this. If you resize your disks by just continuing to write, there will be problems.
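
The preallocation mentioned in comment 18 can be sketched in Python with os.posix_fallocate (available on Unix only). The file name and size here are hypothetical. Whether preallocating actually avoids the CoW penalty on btrfs is the commenter's claim, not something the benchmark above measured.

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Reserve size_bytes of disk space for path and return the file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ask the filesystem to allocate blocks for the range [0, size_bytes).
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
    return os.stat(path).st_size


if __name__ == "__main__":
    # Hypothetical data file in a scratch directory, preallocated to 16 MiB.
    with tempfile.TemporaryDirectory() as scratch:
        data_file = os.path.join(scratch, "example.ibd")
        print(preallocate(data_file, 16 * 1024 * 1024))
```

Unlike simply seeking past the end and writing a byte, posix_fallocate guarantees that later writes into the range will not fail for lack of space.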
