April 25, 2014

Testing the Micron P320h

The Micron P320h SSD is an SLC-based PCIe solid-state storage device that claims the highest read throughput of any server-grade SSD. At Micron’s request, I recently took some time to put the card through its paces, and the numbers are indeed quite impressive.

For reference, the benchmarks for this device were performed primarily on a Dell R720 with 192GB of RAM and two Xeon E5-2660 processors, yielding a total of 32 virtual cores. This is the same machine that was used in my previous benchmark run. A handful of additional tests were also performed on a Cisco UCS C250. The operating system was CentOS 6.3, and the sysbench fileIO tests used the EXT4 filesystem. The card itself is the 700GB model.

So let’s take a look at the data.

With the sysbench fileIO test in asynchronous mode, read performance is an extremely steady 3202MiB/sec with almost no deviation. Write performance is also both very strong and very steady, coming in at a bit over 1730MiB/sec with a standard deviation of a bit less than 13MiB/sec.

[Figure: sysbench fileIO asynchronous read/write throughput (realssd-asyncIO)]

Factoring in the 16KiB block size used here, these numbers work out to over 110,000 write IOPS and almost 205,000 read IOPS.
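The conversion from throughput to IOPS is simple arithmetic (IOPS = throughput / block size); a quick sanity check of the figures above:

```python
# Convert sequential throughput (MiB/s) to IOPS for a given block size.
# The throughput figures are taken from the sysbench fileIO results above.
BLOCK_KIB = 16  # sysbench fileIO block size used in these tests

def to_iops(mib_per_sec: float, block_kib: int = BLOCK_KIB) -> int:
    """IOPS = (MiB/s * 1024 KiB/MiB) / block size in KiB."""
    return int(mib_per_sec * 1024 / block_kib)

read_iops = to_iops(3202)   # 204,928 -- "almost 205,000"
write_iops = to_iops(1730)  # 110,720 -- "over 110,000"
print(read_iops, write_iops)
```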

When we switch over to synchronous IO, we find that the card is quite capable of matching the asynchronous performance:

[Figure: synchronous IO throughput vs. thread count (syncIO-throughput)]

Synchronous read reaches peak capacity somewhere between 32 and 64 threads, and synchronous write tops out somewhere between 64 and 128 threads. The latency numbers are equally impressive; the next two graphs show 95th and 99th-percentile response time, but there really isn’t much difference between the two.

[Figure: synchronous IO 95th/99th-percentile response time (syncIO-latency)]

At 64 read threads, we reach peak performance with latency of roughly 0.5 milliseconds, and at 128 write threads we reach maximum throughput with latency just over 3ms.
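These thread counts and latencies are roughly consistent with Little’s law (concurrency ≈ throughput × latency). A back-of-the-envelope check, keeping in mind that the reported latencies are 95th/99th-percentile rather than mean values, so the estimates below are only lower bounds:

```python
# Little's law: concurrency = throughput * latency, so
# throughput ~= threads / latency. Percentile latency overstates the
# mean, which is why these estimates undershoot the measured numbers.
def est_mib_per_sec(threads: int, latency_ms: float, block_kib: int = 16) -> float:
    iops = threads / (latency_ms / 1000)   # IO operations per second
    return iops * block_kib / 1024         # MiB per second

read_est = est_mib_per_sec(64, 0.5)    # 2000 MiB/s lower bound (measured ~3202)
write_est = est_mib_per_sec(128, 3.0)  # ~667 MiB/s lower bound (measured ~1730)
```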

How well does it perform with MySQL? Exact results vary depending on the usual factors (read/write ratio, working set size, buffer pool size, etc.), but overall the card is extremely quick and handily outperforms the other cards it was tested against. For example, the graph below compares the performance of the P320h on a standard tpcc-mysql test against the original FusionIO and the Intel SSD 910 at assorted buffer pool sizes:

[Figure: tpcc-mysql throughput, P320h vs. FusionIO vs. Intel SSD 910 (tpcc-mysql-devicecompare)]

And in this graph we look at the card’s performance on sysbench OLTP:

[Figure: sysbench OLTP throughput, EXT4 vs. XFS (sysbench-oltp-ext4xfs)]

It is worth noting here that EXT4 outperforms XFS by a fairly significant margin. The approximate raw numbers, in tabular format, are:

Buffer pool    EXT4     XFS
13 GiB       22,000   7,500
25 GiB       17,000   9,000
50 GiB       21,000  11,000
75 GiB       25,000  15,000
100 GiB      31,000  19,000
125 GiB      36,000  25,000
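To put the EXT4 advantage in perspective, the ratios work out as follows (a quick sketch using the approximate numbers from the table):

```python
# Approximate sysbench OLTP throughput from the table above, keyed by
# buffer pool size in GiB. Values are (EXT4, XFS).
results = {
    13: (22000, 7500),
    25: (17000, 9000),
    50: (21000, 11000),
    75: (25000, 15000),
    100: (31000, 19000),
    125: (36000, 25000),
}

for bp, (ext4, xfs) in results.items():
    print(f"{bp:>3} GiB BP: EXT4 is {ext4 / xfs:.2f}x faster than XFS")
```

The EXT4 advantage ranges from roughly 1.4x at the largest buffer pool to almost 3x at the smallest, shrinking as more of the working set fits in memory.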

In the final analysis, there may or may not be faster cards out there, but the Micron P320h is the fastest one that I have personally seen to date.

About Vadim Tkachenko

Vadim leads Percona's development group, which produces the Percona Server and Percona XtraBackup. He is an expert in solid-state storage, and has helped many hardware and software providers succeed in the MySQL market.

Comments

  1. BenBradley says:

    This looks awesome. Would you still want to RAID these for reliability though? Be interesting to see if there’s any drop-off with a pair of these in a RAID 1.

  2. Andy says:

    1) Would you recommend EXT4 over XFS for MySQL on SSD? What about Btrfs or ZFS? Seems like with SSD the newer file systems could be more appropriate.

    2) How does P320h compare to Virident? You benchmarked Virident a while ago (http://www.mysqlperformanceblog.com/2010/06/15/virident-tachion-new-player-on-flash-pci-e-cards-market/) and found it to be significantly faster than FusionIO as well.

  3. Has anyone explained why performance can be a lot better with EXT4?

  4. Mark,

    Thanks. This is what I’m wondering too… The difference is very significant, especially considering that with O_DIRECT, the majority of what we expect the filesystem to do is map logical addresses to physical ones. I also wonder whether it could be caused by different alignment defaults.

    Vadim – when you say “original FusionIO”, do you mean the first-generation ioDrive ever released? I think a specific model number would really help to understand it.

    What I also find interesting is why, at a 25G buffer pool, it shows the highest variance among the different cards, while at other sizes it seems to show the lowest.

    Anyway, the results are truly amazing. It is great to live in times when the amount of IO you can push to storage on a single node is measured in gigabytes/sec :)

  5. Dimitri says:

    Vadim,

    indeed, the difference between XFS and EXT4 is pretty surprising.. – did you by chance monitor memory usage during your tests?.. – I’m asking because in the past I’ve observed that EXT4 still cached data even under O_DIRECT, while XFS did not at all. This may also explain why there is nearly no difference in the OLTP @EXT4 results when the BP size is between 13GB and 50GB..

    it would also be interesting to know whether, at a 13GB BP, there really were 3x more I/O ops on EXT4 than on XFS, as it should be fully IO-read-bound here (I’m supposing your db size is big, as it was not mentioned, and the OLTP_RW scenario is “standard”?)..

    Rgds,
    -Dimitri

  6. I can answer a few of these….

    @Andy–
    On point 1, I suspect that it depends a lot on your kernel. There was a performance regression in XFS in the pre-CentOS 6.4 kernels which made EXT4 faster, but this was supposedly fixed in the latest 2.6.32-358 (I think) kernel. However, I’ve run into a couple of situations where the latest version of XFS (on CentOS 6.4) crashes the box with a kernel panic, so the developers may have traded one issue for another. DISCLAIMER: I have not attempted to reproduce said issue; it may have been coincidental. As for the others, I don’t think Btrfs is production-ready yet, and I don’t know enough about ZFS on Linux to comment on it.

    On point 2, the P320h is faster than Virident. You can compare numbers here:
    http://www.mysqlperformanceblog.com/2013/03/21/testing-the-virident-flashmax-ii/

    @Dimitri–
    The data size for these tests was roughly 250GiB, standard oltp_rw scenario.

  7. Golan Zakai says:

    Would you like to benchmark the OCZ ZD4RM88-FH-1.6T card?

    I have a machine ready for testing if you wish.

  8. Tim Linerud says:

    But at only 700 GB, there are other PCIe Flash Storage devices with much larger capacities, and similar performance. What is the endurance of this Micron card?

  9. Thanks for sharing this interesting perspective. In our lab results as well, we found the Micron P320h to be a much better-performing SSD than several others we have been using, including FusionIO and Virident. We have been able to speed up MySQL databases in an application-transparent and cost-effective manner using SSD caching (as opposed to an all-SSD solution, which requires data/table migration or relocation onto the SSD).

    Vadim, can you please share some insights on the read/write ratio, working set size, buffer pool size, etc. for the experiments/graphs shared above?
