August 1, 2014

RAID vs SSD vs FusionIO

Carried away by benchmarking (see my two previous posts), I managed to set up all three devices (the on-board RAID, an Intel X25-E SSD connected to a HighPoint controller, and a FusionIO card) on our workhorse Dell PowerEdge R900. By the way, to do that I had to switch from CentOS 5.2 to Ubuntu 8.10, as CentOS was not able to boot with the SSD attached to the HighPoint controller. Along with other tests, I ran a tpcc-like IO-bound workload on all devices.

For the tests I used MySQL 5.4/InnoDB, and all other parameters are the same as in the previous posts (100W, buffer_pool 3GB). The filesystem is XFS mounted with the nobarrier option.

Graphical results are here

and average results:

RAID10 – 7439.850 TPM
SSD – 10681.050 TPM
FusionIO – 17372.250 TPM

However, it should be noted that both SSD and FusionIO were run in “non-durable” mode, that is, you may lose some transactions in case of a power outage (see my post http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/).

While the results for SSD (note it is a single device, compared to RAID 10 on 8 disks) and FusionIO are impressive, it is worth considering the price/performance ratio.

Here is my very rough calculation:
For RAID 10 we use 8 73GB SAS 2.5″ 15K RPM disks; at $190 per disk that gives us $1520 for 292GB of usable space, or ~$5.2 per GB.
For SSD I can get a 32GB drive for $390, which is ~$12.2 per GB.
For FusionIO I am really not sure what the price is (the card was given to us only for tests), but quick googling gave me $30 per GB, so a 160GB card comes to $4800.

Now, simply dividing TPM by the price of the IO system, we have
RAID 10 – 4.9 TPM / $
SSD – 27 TPM / $
FusionIO – 3.6 TPM / $
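
The arithmetic above can be reproduced with a short Python sketch. The dictionary layout and the function name price_performance are illustrative choices, not part of the benchmark setup; the prices are the rough figures quoted in this post, and the FusionIO price in particular is only an estimate found by googling.

```python
# Figures quoted in the post. RAID 10 mirrors the data, so usable space is
# half of the raw capacity. The FusionIO price is a rough estimate.
systems = {
    "RAID 10":  {"tpm": 7439.850,  "price_usd": 8 * 190,  "usable_gb": 8 * 73 / 2},
    "SSD":      {"tpm": 10681.050, "price_usd": 390,      "usable_gb": 32},
    "FusionIO": {"tpm": 17372.250, "price_usd": 30 * 160, "usable_gb": 160},
}


def price_performance(entry):
    """Return (dollars per usable GB, TPM per dollar) for one storage system."""
    usd_per_gb = entry["price_usd"] / entry["usable_gb"]
    tpm_per_usd = entry["tpm"] / entry["price_usd"]
    return usd_per_gb, tpm_per_usd


for name, entry in systems.items():
    usd_per_gb, tpm_per_usd = price_performance(entry)
    print(f"{name:9s} {usd_per_gb:6.2f} $/GB {tpm_per_usd:6.2f} TPM/$")
```

The sketch counts only the price of the storage devices themselves; controllers, chassis and power are left out, just as in the rough calculation above.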

Please note that price per transaction is not the main criterion to consider, as the TCO for systems with SSD may be much lower (considering you need fewer servers, less space, less power). It is also worth considering that the SSD has only 32GB of space, and to match the capacity of the FusionIO card we would need 5 of them (which would still be cheaper than FusionIO); that may also improve performance, as such a setup would be able to handle IO requests in parallel.
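
As a rough check of the equal-capacity comparison, here is a minimal Python sketch. It assumes the SSDs are simply combined with no redundancy (raw capacity only) and ignores the cost of any controller; the helper name ssds_to_match is hypothetical, and the prices are the same rough figures used in this post.

```python
import math

SSD_GB, SSD_PRICE_USD = 32, 390              # Intel X25-E figures from the post
FUSIONIO_GB, FUSIONIO_USD_PER_GB = 160, 30   # rough FusionIO estimate from the post


def ssds_to_match(target_gb, ssd_gb=SSD_GB):
    """Smallest number of SSDs whose combined raw capacity reaches target_gb."""
    return math.ceil(target_gb / ssd_gb)


count = ssds_to_match(FUSIONIO_GB)
ssd_total = count * SSD_PRICE_USD
fusionio_total = FUSIONIO_GB * FUSIONIO_USD_PER_GB
print(f"{count} SSDs cost ${ssd_total}, FusionIO costs ${fusionio_total}")
```

Any mirrored or parity layout would need more drives than this count, so the SSD total here is a lower bound.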

About Vadim Tkachenko

Vadim leads Percona's development group, which produces Percona Cloud Tools, the Percona Server, Percona XtraDB Cluster and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. peter says:

    When you buy storage, the two things you need are IOs and space (ignoring sequential transfer rate for a second), which makes it a bit hard to compare, in particular because of density and scaling concerns.
    For example, how much storage can you get with Intel SSDs? RAID5 from 8 64G drives will give you 448G of space in a standard 2U chassis. You can probably double it if you have a chassis supporting many more drives. With FusionIO, as I understand it, there are 320G cards and you should be able to put a couple of them in the same 2U chassis, getting comparable storage. If you’re looking at directly attached drives, using 300G 10K RPM disks you can get about 5x the space of Intel drives, if space is a concern.

    It would be very interesting to see how raw IO, filesystems and MySQL scale with multiple Intel SSD drives or FusionIO cards. I have a feeling it is not going to scale too well, as few technologies are adapted to the extreme IO rates Flash-based devices can offer.

  2. pat says:

    What I can’t help wondering is whether a lot of the “tricks” that a modern database like InnoDB uses might actually be counterproductive on a device with fairly uniform access times like an SSD.

    Does the double-write buffer help you or hinder you in a case where access time to all sectors is equal?

    Do things like optimistic pre-fetching (to reduce random IOs) help or hinder you on an SSD where the relative cost of random IOs isn’t as high?

    I have to wonder, if you designed your database from the ground up on the assumption that you’d have an SSD in back of it, would you end up with something significantly different from what we see today?

  3. Vadim says:

    pat,

    I think you are right.
    If you look at Mark’s presentation at the Percona Conference http://www.percona.com/ppc2009/PPC2009_Life_of_a_dirty_pageInnoDB_disk_IO.pdf
    you may see that a lot of tricks are done by InnoDB just to fight with spinning HDDs and random IO.

    I guess the biggest part of that logic can be thrown away and replaced by simple operations on SSD devices. We are going to do some work in that direction.

  4. peter says:

    Pat,

    You’re right – it could be very interesting to tune InnoDB for SSD devices. In fact, I think we just need an option to specify whether we’re working with an SSD or a normal device, because a lot of optimizations can be different.

    For example, I do not think flushing neighboring pages is a good idea on SSD; also, you can do read-ahead from random positions as well as from sequential ones. The double write is not there for performance (it comes as overhead for most higher-end storage); it is there for protection, which may not be needed on copy-on-write file systems, which are often in use with SSD.

  5. Matt Y says:

    Vadim,

    Any notes on reliability, performance over long runs, and other impact on the servers? You may be under NDA and unable to answer, which is fine, but I have heard that Fusion has certain interesting characteristics under certain loads. For example, the driver takes up a ton of memory (I heard things that suggested it may be linear in the amount of data on disk); I also heard that after some amount of time performance started to dwindle significantly; and I am also interested to see performance as the drive fills… Having talked to a lot of storage folks over the last year, you overhear things, so all these could just be rumors, but Fusion decided not to give me access to a drive to find out for sure :) Any additional details would be awesome.

  6. Vadim says:

    Matt,

    I am not under NDA in this case, so I can share any findings and ideas :)

    As I understand it, by default FusionIO is not in “reliable” mode, that is, you may lose your last few writes in case of a power outage. There are new drivers where you can turn on “reliable” mode, but I have not tested them yet.

    As for performance over long runs, you may be right. In the documentation I saw that you can trade space for performance, that is, you declare that only 120GB (or even less) is available, and the other 40GB is allocated for FusionIO internal needs.

    I also saw a strange effect: after some long runs FusionIO got unreasonably slow, with a maximum of only about 40 IOS. FusionIO support said that it was a broken card, and I got a replacement.

    I have no more details about the card, as I have been playing with it for only a few days.

    I can actually give you access to the box with the card, so you could run whatever tests you want, but only after we do some performance runs of XtraDB; we have to catch up with 5.4 :)

  7. Pat, I’m the least expert of anyone commenting here, but it’s my opinion that a database truly designed for SSD simply doesn’t exist.

    Also realize that SSD is a Flash device with a hard-drive-emulation in front of it (loosely speaking). In my opinion, the future will be more like “I’ve got a vast pool of memory” than “I’ve got memory and hard drive” or “memory plus SSD.” SSD might only be an interim technology as we edge towards something really different. A lot of things need to be questioned. What do we need filesystems for? What is I/O and what is memory access, and why do we differentiate the two?

    It’s not just databases — operating systems probably aren’t ready, either. But this is status quo: software historically lags hardware, by and large.

    Back to my Perl coding.

  8. Mark Callaghan says:

    Doesn’t XtraDB have an option to disable flushing of all dirty pages in an extent when one of them has an LSN that is too old? If so, when will we get performance results from Vadim on SSD to determine whether that is useful? It will reduce the overall write rate.

  9. To be honest these benchmarks are somewhat pointless.

    Who would run a database without durability?

    It’s better to run with innodb_flush_log_at_trx_commit=2 rather than have the disk subsystem do this.

    You can lose some transactions but at least when you restart you don’t have data corruption.

  10. Didier Spezia says:

    Very interesting benchmark. In a Schooner presentation (http://www.percona.com/ppc2009/PPC2009_Schooner_Percona_Presentation.pdf), it is stated (slide 12) that PCIe flash cards (a la FusionIO) “use lots of server processor cycles and memory for garbage collection, write coalescing, wear leveling, mapping”. So in other words, using SSD drives with a good RAID controller is supposed to lead to a more balanced machine. I’m just curious: did you really measure such effects, or is it just marketing fluff? For instance, can the extra CPU consumed with FusionIO be correlated with the extra TPM you get?

  11. Ben says:

    Small note: Your RAID 10 assumes you pay only for the drives — not the RAID controller. So it may be more realistic to add about $200. Therefore:

    8 x $190 = $1520 for disks + $200 for controller = $1720
    $1720 / 292GB useful space = 5.9¢ per GB

    Still cheap in comparison.

  12. Carl Johnstone says:

    How do SATA disks compare? I realise they’re going to have a lower performance – but that comes at a much lower price than SAS disks…

  13. SSD Kevin says:

    From my first-hand experience with the iodrive, they are not ready for prime time. The cards are fast… but have a substantial chance of metadata corruption, as well as the worries of card failure (see post from Vadim). There is so much redundancy in a SAN that it is tough to overlook as you put valuable data on these cards. I am going to hold on and leave these in the lab until the SSD market matures.

  14. james braselton says:

    HI THERE CLICK FREE HAS A NEW CLICK FREE TRAVLER THAT IS A SSD BACK UP 16 32 OR 664 GB

  15. Pounce says:

    Ben, I think you meant $5.90 per GB, not 5.9¢ per GB.

  16. DangerD says:

    Seems like some folks are on to this SSD optimisation already.

    http://rethinkdb.com

  17. jftan says:

    If you test with 300G of data, the result will be very different. I’m sure
    random read will be the bottleneck.
