August 27, 2014

Benchmarking single-row insert performance on Amazon EC2

I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, but still the tool serves well for benchmark purposes.

OK, let’s start off with the configuration details.

Configuration

First of all let me describe the EC2 instance type that I used.

EC2 Configuration

I chose m2.4xlarge instance as that’s the instance type with highest memory available, and memory is what really really matters.

As for the IO configuration I chose 8 x 200G EBS volumes in software RAID 10.

Now let’s come to the MySQL configuration.

MySQL Configuration

I used Percona Server 5.5.22-55 for the tests. Following is the configuration that I used:

You can see that the buffer pool is sized at 55G and I am using 4 buffer pool instances to reduce the contention caused by buffer pool mutexes. Another important configuration that I am using is that I am using “estimate” flushing method available only on Percona Server. The “estimate” method reduces the impact of traditional InnoDB log flushing, which can cause downward spikes in performance. Other then that, I have also disabled query cache to avoid contention caused by query cache on write heavy workload.

OK, so that was all about the configuration of the EC2 instance and MySQL.

Now as far as the benchmark itself is concerned, I made no code changes to iiBench, and used the version available here. But I changed the table to use range partitioning. I defined a partitioning scheme such that every partition would hold 100 million rows.

Table Structure

The table structure of the table with no secondary indexes is as follows:

While the structure of the table with secondary indexes is as follows:

Also, I ran 5 instances of iiBench simultaneously to simulate 5 concurrent connections writing to the table, with each instance of iiBench writing 200 million single row inserts, for a total of 1 billion rows. I ran the test both with the table purchases_noindex which has no secondary index and only a primary index, and against the table purchases_index which has 3 secondary indexes. Another thing I would like to share is that, the size of the table without secondary indexes is 56G while the size of the table with secondary indexes is 181G.

Now let’s come down to the interesting part.

Results

With the table purchases_noindex, that has no secondary indexes, I was able to achieve an avg. insert rate of ~25k INSERTs Per Second, while with the table purchases_index, the avg. insert rate reduced to ~9k INSERTs Per Second. Let’s take a look at the graphs have a better view of the whole picture.

Note, in the above graph, we have “millions of rows” on the x-axis and “INSERTs Per Second” on the y-axis.
The reason why I have chosen to show “millions of rows” on the x-axis so that we can see the impact of growth in data-set on the insert rate.

We can see that adding the secondary indexes to the table has decreased the insert rate by 3x, and its not even consistent. While with the table having no secondary indexes, you can see that the insert rate is pretty much constant remaining between ~25k to ~26k INSERTs Per Second. But on the other hand, with the table having secondary indexes, we can see that there are regular spikes in the insert rate, and the variation in the rate can be classified as large, because it varies between ~6.5k to ~12.5k INSERTs per second, with noticeable spikes after every 100 million rows inserted.

I noticed that the insert rate drop was mainly caused by IO pressure caused by increase in flushing and checkpointing activity. This caused spikes in write activity to the point that the insert rate was decreased.

Conclusion

As we all now there are pros and cons to using secondary indexes. While secondary indexes cause read performance to improve, but they have an impact on the write performance. Well most of the apps rely on read performance and hence having secondary indexes is an obvious choice. But for those applications that are write mostly or that rely a lot on write performance, reducing the no. of secondary indexes or even going away with secondary indexes causes a write throughput increase of 2x to 3x. In this particular case, since I was mostly concerned with write performance, so I went ahead to choose a table structure with no secondary indexes. Other important things to consider when you are concerned with write performance is using partitioning to reduce the size of the B+tree, having multiple buffer pool instances to reduce contention problems caused by buffer pool mutexes, using “estimate” checkpoint method to reduce chances of log flush storms and disabling the query cache.

About Ovais Tariq

Ovais Tariq works as a Support Engineer for Percona. Before joining Percona, he worked for different types of companies ranging from a startup to a big media organization. He has worked in different roles, including that of a software engineer, development manager, DBA and consultant.

Comments

  1. Examining the insert buffer during the insertions on purchases_index should be interesting. You are seeing periodic drops in performance which might be the result of the insert buffer filling up and then decreasing performance until some space is free’ed again. XtraDB lets you accelerate the rate at which insert buffer pages are flushed to disk (innodb_ibuf_accel_rate) which might be interesting to tweak during your test. Mark Callaghan over at Facebook has seen big performance difference on insertion rate when increased insert buffer flushing was enabled (using the FB patch, not XtraDB but difference should be similar)

  2. Andy says:

    This benchmark doesn’t test the IO performance of EBS though as you’re not flushing to disks on each commit.

    What numbers do you get when you set innodb_flush_log_at_trx_commit to 1?

  3. Partitioning probably reduces the stress on the insert buffer. Does XtraDB/Percona Server have an option to use more IO for insert buffer merges when the ibuf gets too big?

  4. Interesting results, and well presented. Would you mind sharing the command line you used for your iiBench clients? Also, did you run a test with innodb_flush_log_at_trx_commit = 1?

  5. Bradley C Kuszmaul says:

    I think I can explain the performance you are seeing.

    But first I’d like to note that this experiment isn’t really running the iiBench problem. The point of iiBench problem is to measure the cost of index maintenance. This experiment isn’t measuring the cost of index maintenance. Transaction-id is auto-incremented, and so partitioning on transaction-id makes the insertions easy. Basically you fill in 100 million rows, then start a new partition and fill that in. During the time that you are modifying a partition it fits in main memory (and would have fit even with only a few GB of main memory). The cost of querying this database will be 10 times higher with this partitioning, however since you’ll have to look in each of the ten partitions to query one of the indexes. If you keep going, making the problem be 10B rows, the cost of querying will go up by another factor of ten (it will be 100 disk seeks to answer a random query, assuming you can get each index query down to one disk seek.) I claim that this schema is not properly indexed.

    Here’s a theory that seems to explain the performance you are seeing. The periodic performance variation corresponds to filling up a partition and starting a new one. (If I read your graph correctly, you didn’t graph the performance on the first 100M rows). The minimum performance troughs occur just as the partition switches. So performance gets worse and worse as a partition gets bigger and bigger, then we start a new partition, and Inno starts catching up and the performance starts getting better, and then the new partition starts getting too big, so Inno starts slowing down again.

    The factor of 4 speedup that occurs when you get rid of the secondary keys is simply because you are writing 4 times fewer B-tree values. The primary key and each secondary key each incur nearly the same cost, and the performance difference is basically that factor of 4.

    The variance is lower with only the primary key because the entire table fits in main memory. Each row is on the order of 30 bytes in the primary table, and so a billion rows is about 30GB. Since we are inserting on an auto-increment key, I would expect Inno to fill the B-tree pretty efficiently. With a 55GB buffer pool, we would expect the primary table to simply fit. But with all 4 indexes the data itself is 2 or 3 times bigger, and the B-tree nodes are filled perhaps only 3/4 full, making the database a total of, say, 4 times bigger. So around 300M or 400M rows, I would expect Inno starts hitting disk. Perhaps you could verify that by watching IOstat. Tim’s measurements (http://www.tokutek.com/2012/01/1-billion-insertions-%E2%80%93-the-wait-is-over/) on a newer version of iiBench showed that with 16GB of RAM, Inno hits the memory wall at about 100M rows, which matches my prediction that at 400M rows and 55GB RAM, Inno would hit a memory wall.

    Given that Memory sizes are perhaps 10 times bigger than they were 4 years ago when the iiBench project started, today the challenge would be to index 10B rows, not 1B rows. And it’s still a good challenge.

    The URL for iibench is slightly incorrect. The correct URL for version you used is http://tokutek.com/downloads/iiBench-1.0.3.1.tar.gz
    One of the advantages of the newer version of iiBench (the python version that Mark wrote http://bazaar.launchpad.net/~mdcallag/mysql-patch/mytools/annotate/head%3A/bench/ibench/iibench.py) is that it measures query performance too, so if you fail to actually index the data, the benchmark will notice.

  6. @Andy,

    I am not testing IO performance and instead trying to optimize writes, also the durability requirements are not that stringent and I can afford to loose 1 second worth of data, hence I did not test with innodb_flush_log_at_trx_commit=1.

    I would also not recommend running on EBS with innodb_flush_log_at_trx_commit=1, because IO times on EBS are not very consistent.

  7. @Time Callaghan,

    I did not test with innodb_flush_log_at_trx_commit=1 because the durability requirements are not that stringent. And following is how I invoked iiBench:

    iibench -T $tbl_name -D iiBench -r 200000000 -M 1 -s 1000000 -t 100000 -I 1 -a -S

  8. Mark,

    Probably what you are referring to is provided by the configuration option “innodb_ibuf_accel_rate” available in Percona (as mentioned by Justin). Increasing/decreasing the value of innodb_ibuf_accel_rate, you will increase/decrease the insert buffer activity.

  9. Justin,

    I would be doing another benchmark and use ‘innodb_ibuf_accel_rate’ together with tuning other IO related parameters that I can tune to good affect, and I will post my results.

  10. Any particular reason for choosing 4 1GB log files instead of 2 2GB?

  11. Aaron,

    No particular reasons, it doesn’t matter if you have 4 1G or 2 2GB log files, as they are all concatenated together by InnoDB and written to in a circular fashion.

  12. Steph says:

    Hi there,

    Completely tangential to your test but I think worth saying anyway, so here goes!

    With all AWS instances EBS throughput is ultimately governed by the HBA connection to the SAN. Even on an m2.4xlarge (where I believe you are the only guest) running any redundant RAID scheme will impact your throughput as the redundant writes still have to pass over the same wire. So if you have a 1Gbps backplane RAID 0 will write at a maximum throughput of 125MBps, RAID1 (2 mirrors) 64MBps, RAID1 (3 mirrors) 48MBps and so on, irrespective of how ‘fast’ the SAN can actually go.

  13. Hi Steph,

    RAID 10 gives me a good compromise between throughput and redundancy, and RAID 10 also allows me to replace the volumes that are not performing well and are seeing higher then usual latency (this is something that you have to be aware of because I have seen cases where the array performance degrades because of a EBS volume that is seeing high latency and hence an increase in service time). However, in the case of this benchmark, we weren’t really hitting any IO related bottlenecks, so it wouldn’t really have mattered if I chose RAID 0 or RAID 10.

  14. How do you think the new SSD instance will perform? Given that the 2x1GB of SSD disk is local to the VM perhaps this will improve?

    Also note that there is 10GbE too – which may improve EBS throughput too.

    Look forward to some real performance figures.

  15. Justin Gronfur says:

    Is there a distinct reason that you used innodb_file_per_table as opposed to the default single file? Just curious as we are rebuilding some of our database servers and are debating between those options. We have approximately 350,000 tables with a very high insertion rate… would innodb_file_per_table give us better performance?

  16. Justin,

    Whether innodb_file_per_table is going to help you or not depends on a couple of factors. If you are using ext3 then a single big tablespace can cause problems because of inode serialisation issue in ext3. If you are using xfs, you do not have to worry about that. Do you drop tables often? If you do, with innodb_file_per_table you can quickly reclaim space however, with a single big tablespace space is not given up by InnoDB. Do you plan on moving some of your databases to a different storage, if yes then you need innodb_file_per_table. Now 350,000 tables is a large number of tables, having innodb_file_per_table=1 could mean a lot of .ibd files, if they all belong to the same database you could see some issues with many files in a single directory, however, if you have the tables divided among multiple databases then you do not need to be concerned about that.

    So in the end it really depends on your usage and other factors some which I have listed in my response.

  17. Richard,

    SSD backed instances will mean a big improvement with respect to IO capacity available to the instance, so it does offer improvements when talking about the insert benchmark that I performed. In many parts of this benchmark, insert speed is limited by how quick the system can do checkpoints and how quickly it can flush pages in the background. So definitely if you have fast storage the stalls because of checkpointing activity should be smaller, and distant as compared to being more frequent with slow storage. But of course there are other internal limitations inside InnoDB that affect write performance, for example index level locks, etc, SSD backed storage is not going to help with that. But overall I think, insert performance should see any improvement when done using SSDs.

    I will see if I can find out time to run the similar benchmark on SSD backed EBS volumes.

  18. DanialR says:

    Hi
    i wished you compare read or SELECT query

Speak Your Mind

*