Amazon’s Relational Database Service (RDS) is a cloud-hosted MySQL solution. I’ve had some clients hitting performance limitations on standard EC2 servers with EBS volumes (see SSD versus EBS death match), and one of them wanted to evaluate RDS as a replacement. It is built on the same technologies, but the hardware and networking are supposed to be dedicated to RDS, not shared with general AWS usage as they are on normal EC2 servers with EBS.

I benchmarked the largest available RDS instance, which is listed as “High-Memory Quadruple Extra Large DB Instance: 68 GB of memory, 26 ECUs (8 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity.” I used sysbench’s oltp benchmark, with 400,000,000 rows. This creates a table+index data size approximately twice as big as memory, so the workload should be somewhat IO-bound.
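
For reference, the prepare phase was driven by a sysbench command along these lines (sysbench 0.4 OLTP syntax; the endpoint and credentials here are placeholders, not the real ones):

    # Fill the sbtest table with 400M rows on the RDS instance.
    sysbench --test=oltp \
        --oltp-table-size=400000000 \
        --mysql-host=myinstance.abc123.us-east-1.rds.amazonaws.com \
        --mysql-db=sbtest \
        --mysql-user=benchuser \
        --mysql-password=... \
        prepare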

My goal for this benchmark is long-term performance, but as a side project, I thought it would be interesting to measure the single-threaded insert throughput as sysbench ran the “prepare” phase and filled the table with 400 million rows. Here is the chart of rows inserted per minute:
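
For the curious, per-minute numbers like these can be collected by diffing InnoDB’s global insert counter once a minute; a minimal sketch, assuming shell access to a machine that can reach the database (host and credentials are placeholders):

    # Print rows inserted during each elapsed minute by diffing the
    # Innodb_rows_inserted counter; $HOST/$USER/$PASS are placeholders.
    prev=0
    while true; do
        cur=$(mysql -h "$HOST" -u "$USER" -p"$PASS" -N -B \
              -e "SHOW GLOBAL STATUS LIKE 'Innodb_rows_inserted'" | awk '{print $2}')
        [ "$prev" -gt 0 ] && echo "$(date +%H:%M) $((cur - prev))"
        prev=$cur
        sleep 60
    done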

We can deduce a few things from this.

  1. The overall downward slope of the line is steady enough to show that we did not cross a dramatic memory-to-disk threshold, as famously happens in B-Tree inserts (see InnoDB vs TokuDB for example). This doesn’t mean that we weren’t IO-bound; it might only mean that we were IO-bound the whole time waiting on fsync operations. But we didn’t go from a solely in-memory bottleneck to solely on-disk.
  2. The insert performance is quite variable, more so than I would like to see. My intuition is that there are some severe I/O slowdowns.
  3. I should have gathered more statistics and finer-grained samples (say, every 5 seconds instead of every minute), plus samples of other data such as SHOW INNODB STATUS output. But I was on the client’s time, and I wasn’t going to spend time redoing it; I did not see that it would really benefit them.
  4. Finally, a single-threaded insert workload is not very revealing. To understand the sustained write performance of an RDS instance, we need a multi-threaded, long-term insert benchmark such as IIBench; a crude sketch of the idea follows this list.
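
IIBench itself is the right tool for that. Purely to illustrate the shape of such a test, here is a toy sketch that keeps several insert streams running in parallel; the table, host, and credentials are all made up for the example:

    # NOT IIBench -- just a toy illustration of N concurrent insert
    # streams, each holding one connection open and inserting forever.
    mysql -h "$HOST" -u "$USER" -p"$PASS" sbtest -e \
        "CREATE TABLE IF NOT EXISTS itest (
             id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
             k   INT UNSIGNED NOT NULL,
             pad VARCHAR(100) NOT NULL,
             KEY idx_k (k)
         ) ENGINE=InnoDB"
    THREADS=8
    for i in $(seq 1 "$THREADS"); do
        ( while true; do
              echo "INSERT INTO itest (k, pad)
                    VALUES (FLOOR(RAND()*1000000000), REPEAT('x',100));"
          done | mysql -h "$HOST" -u "$USER" -p"$PASS" sbtest ) &
    done
    wait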

In the next post in this series, we will see how the Amazon RDS instance performed at various thread counts on the OLTP benchmark.

Update: Vadim and Peter have rightly pointed out that I shouldn’t have published this result without being able to explain exactly what was happening on the server. I will reproduce this test, capture more measurements about what was going on, and post a follow-up before I continue with the actual sysbench benchmark results.

37 Comments
Jim B.

50K rows per second? Isn’t that kind of amazingly awesome? Does sysbench ‘prepare’ bulk insert or something? Even so.

Jamie McCarthy

The key figure I’d be interested in is write ops/sec. With innodb_flush_log_at_trx_commit = 1, how many update transactions can you commit per second, and what does iostat’s “w/s” say while you’re doing them? Can you even do those tests using RDS?
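
On a self-managed server those two numbers can be lined up directly; a rough sketch (the interval is arbitrary, and since RDS gives no shell access, iostat is indeed not an option there):

    # Commits per second vs. physical writes per second.
    # Requires innodb_flush_log_at_trx_commit = 1 on the server.
    # Note: Com_commit counts explicit COMMITs; for autocommit
    # workloads, look at Handler_commit instead.
    c0=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Com_commit'" | awk '{print $2}')
    iostat -x 10 2      # second report = 10-second average; read off w/s
    c1=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Com_commit'" | awk '{print $2}')
    echo "commits/sec: $(( (c1 - c0) / 10 ))"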

Jim B.

3M rows/minute = 50K/second
400M rows in ~135 minutes = ~50K/second

Peter Zaitsev

Baron,

I should note that with inserts, the “working set” is not going to be the whole table. The auto-increment key inserts go at the “end” of the primary key, and the other keys in sysbench are also relatively clustered. So this should not be a random-IO-bound workload; it should be mainly CPU bound. If you want an IO-bound workload, you have to have some keys which are random; for example, if you add SHA1(id) as a key, it would become IO bound on insert at some point in the future.
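
A minimal sketch of that suggestion, assuming the standard sysbench table layout (the column and index names added here are made up):

    # Give the table a pseudo-random secondary key so inserts touch
    # pages all over the index instead of only the right-hand edge.
    mysql sbtest -e "ALTER TABLE sbtest
                       ADD COLUMN h CHAR(40) NOT NULL DEFAULT '',
                       ADD KEY idx_h (h)"
    # New rows would then populate it with something like SHA1(id):
    #   INSERT INTO sbtest (id, k, c, pad, h)
    #   VALUES (400000001, 0, '', '', SHA1(400000001));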

Jonathan Marston

Have you looked at benchmarking the EC2 High Performance Cluster Compute instance type?

http://aws.amazon.com/ec2/hpc-applications/

Not available for RDS (yet), but this looks like a good path for optimizing MySQL performance in the Amazon environment. App servers and replication slaves could all run on HPC instances in the same placement group.

Patrick Casey

Does anybody have any (recent) experience with highly parallel inserts into InnoDB?

I’m working on a project now where I have to write several hundred million rows into an InnoDB database in the shortest possible time frame (long story as to why).

On my current hardware/software configuration (MySQL 5.0.51a/InnoDB), I seem to hit a cap somewhere in the InnoDB kernel around 8 parallel inserting threads. Any more than that and throughput actually goes (slightly) down.

I checked the obvious stuff:

innodb_thread_concurrency is set to 32
vmstat shows 6-7 cores spinning and about 1 core worth of IO wait
I have 8 idle CPU cores.
iostat shows I’m not capped (busy, but not pegged)
Log files are quite large (2G)

I’ve read some posts (on this blog, I think) that older versions of InnoDB have concurrency issues at 8 cores and up, but frankly I’ve not seen that issue before with my typical read-heavy workload. I was wondering if I’ve finally run up against some of the internal database limits that things like XtraDB aim to correct?
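
One way to check whether the InnoDB kernel is the wall is to sample the SEMAPHORES section of SHOW ENGINE INNODB STATUS while the load runs; heavy spin/wait counts alongside idle CPUs tend to point at internal mutex contention. A rough sketch:

    # Dump InnoDB's semaphore activity every 5 seconds; high spin/OS
    # wait counts with idle CPU suggest internal mutex contention.
    while true; do
        date
        mysql -e "SHOW ENGINE INNODB STATUS\G" \
            | sed -n '/SEMAPHORES/,/TRANSACTIONS/p'
        sleep 5
    done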

Patrick Casey

Thanks Baron,

I need to look into whether or not it’s practical to upgrade. It would be easier, of course, if I could guarantee beforehand “yeah, verily, if we upgrade it will be 3.14159 times faster”, but, of course, I can’t :). I’ll have to lab it out and see what I find.

It’s one of those cases, though, where I’m honestly curious. Usually I’m pretty good at figuring out where the bottleneck is, and in this case I’m stumped: not IO saturation, not CPU saturation, not an obvious db concurrency limit, all of which leaves me waving my hands and rambling on about OS wait states and various other speculative things.

Basically, I just like to understand how this stuff works, and I’m off my mental map on this one, so I was hoping somebody here had a hint or two :).

Peter Boros

Patrick: do you really have 16 cores, or do you have 8 cores with hyperthreading? Also, partitioning would help you if the table is large; index maintenance will be cheaper.

Patrick Casey

8 physical cores with hyperthreading; it’s a pair of Intel L5520s. I’m not seeing CPU saturation, though: 35-40% user CPU, maybe 5-10% system, and then 50% or so idle. On other workloads (16 threads doing queries with large in-memory sorts), I can drive CPU saturation up to 100%, so I know it’s possible to use ’em all.

We’re actually already partitioning the table into 10 chunks, which seems to be close to optimal. More chunks actually slows us down in this use case.
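
For reference, and purely as a sketch: on MySQL 5.1+ a native ten-way split looks like the statement below (table name illustrative, and the partitioning key must be part of every unique key on the table); on 5.0 the usual equivalent is ten separate tables with application-side routing.

    # Native hash partitioning (MySQL 5.1+ only).
    mysql sbtest -e "ALTER TABLE mytable PARTITION BY HASH(id) PARTITIONS 10"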

Patrick Casey

Not sure if I should be embarrassed or not, but I’d never heard of that tool before. Think I’m going to take a crack at it; I’ll let folks know if I find anything interesting.

Christian

Just another thing… I just searched for similar benchmarks, and with the same “quadruple super cool server” they only handle up to 7,000 tpmC.

And you got it 1,000 times faster? Or did I miss something?

Christian

Hello Baron,

Pretty amazing benchmark results.

Can you estimate when part 2 of this article (multithreaded benchmarking) will be available?

Would it even be possible to build a cluster of cloud instances? I’m looking for a solution which can handle approx. 800 million inserts per hour, so I would need 5 instances for that task.

Pedro Werneck

I see no improvement on RDS with innodb_flush_log_at_trx_commit set to 2 or 0. My benchmarks show the same results regardless of the setting. Does RDS fool the flush in some way, or is the setting disabled somehow?
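
One sanity check, assuming the mysql client can connect (on RDS the value is normally managed through a DB parameter group rather than SET GLOBAL): see what the server actually reports.

    # Confirm the value the instance reports; the endpoint and
    # user here are placeholders.
    mysql -h myinstance.abc123.us-east-1.rds.amazonaws.com \
          -u benchuser -p \
          -e "SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit'"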