Amazon’s Relational Database Service (RDS) is a cloud-hosted MySQL solution. I’ve had some clients hitting performance limitations on standard EC2 servers with EBS volumes (see SSD versus EBS death match), and one of them wanted to evaluate RDS as a replacement. It is built on the same technologies, but the hardware and networking are supposed to be dedicated to RDS, not shared with the general usage of AWS as you get on normal EC2 servers with EBS.
I benchmarked the largest available RDS instance, which is listed as “High-Memory Quadruple Extra Large DB Instance: 68 GB of memory, 26 ECUs (8 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity.” I used sysbench’s oltp benchmark, with 400,000,000 rows. This creates a table+index data size approximately twice as big as memory, so the workload should be somewhat IO-bound.
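For reference, the prepare phase amounts to a sysbench invocation along these lines (a sketch in the sysbench 0.4 style; the connection options shown are illustrative, not the actual ones used):

```shell
# "prepare" creates the default sbtest table and fills it with
# --oltp-table-size rows, single-threaded.
sysbench --test=oltp \
         --oltp-table-size=400000000 \
         --mysql-host=myinstance.rds.amazonaws.com \
         --mysql-user=benchuser \
         --mysql-password=... \
         prepare
```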
My goal for this benchmark is long-term performance, but as a side project, I thought it would be interesting to measure the single-threaded insert throughput as sysbench ran the “prepare” phase and filled the table with 400 million rows. Here is the chart of rows inserted per minute:
We can deduce a few things from this.
- The overall downward slope of the line is steady enough to show that we did not cross a dramatic memory-to-disk threshold, as famously happens in B-Tree inserts (see InnoDB vs TokuDB for example). This doesn’t mean that we weren’t IO-bound; it might only mean that we were IO-bound the whole time waiting on fsync operations. But we didn’t go from a solely in-memory bottleneck to solely on-disk.
- The insert performance is quite variable, more so than I would like to see. My intuition is that there are some severe I/O slowdowns.
- I should have gathered more statistics and finer-grained samples, say, every 5 seconds instead of every minute, and captured more data such as SHOW INNODB STATUS output. But I was on the client’s time and I wasn’t going to spend time redoing it; I did not see that it would really benefit them.
- Finally, a single-threaded insert workload is not very revealing. To understand the sustained write performance of an RDS instance, we need a multi-threaded long-term insert benchmark such as IIBench.
In the next post in this series, we will see how the Amazon RDS instance performed at various thread counts on the OLTP benchmark.
Update Vadim and Peter have rightly pointed out that I shouldn’t have published this result without being able to explain exactly what was happening on the server. I will reproduce this test, capture more measurements about what was going on, and post a follow-up before I continue with the actual sysbench benchmark results.
50K rows per second? Isn’t that kind of amazingly awesome? Does sysbench ‘prepare’ bulk insert or something? Even so.
The key figure I’d be interested in is write ops/sec. With innodb_flush_log_at_trx_commit = 1, how many update transactions can you commit per second, and what does iostat’s “w/s” say while you’re doing them? Can you even do those tests using RDS?
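On a self-managed server, both numbers could be sampled roughly like this (a sketch; as noted in the replies, RDS doesn’t give you OS access for the iostat half):

```shell
# In one terminal: per-second delta of committed transactions.
# -r reports relative (delta) values, -i 1 repeats every second.
mysqladmin -r -i 1 extended-status | grep -w Com_commit

# In another terminal: per-device stats every second;
# "w/s" is the write-operations-per-second column to watch.
iostat -x 1
```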
Jim, that is inserts per minute, not per second.
Jamie, RDS doesn’t provide access to iostat. You can only connect to the machine via mysql. I should have monitored the machine with CloudWatch to get better metrics, although it still isn’t as good as I’d like.
3M rows/minute = 50K/second
400M rows in ~135 minutes = ~50K/second
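The arithmetic in shell, for the record:

```shell
# 3M rows/minute expressed per second
echo $((3000000 / 60))           # prints 50000
# 400M rows over ~135 minutes, per second (integer arithmetic)
echo $((400000000 / 135 / 60))   # prints 49382, i.e. ~50K/s
```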
Jim — sorry, I replied too hastily and didn’t pay attention to your comment. You’re right.
Baron,
I should note that with inserts, the “working set” is not going to be the whole table. The auto-increment key inserts go at the “end” of the primary key, and the other keys in sysbench are also relatively clustered, so this should not be a random-I/O-bound workload; it should be mainly CPU-bound. If you want an I/O-bound workload, you have to have some keys which are random. For example, if you add SHA1(id) as a key, inserts would become I/O-bound at some point in the future.
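Peter’s suggestion could be sketched like this (hypothetical; assumes the default sysbench table name sbtest and a reachable mysql client):

```shell
mysql sbtest <<'SQL'
-- A SHA1-derived secondary key is effectively random with respect to
-- insert order, so its index pages are touched all over the tree;
-- once the index outgrows the buffer pool, inserts become I/O-bound.
ALTER TABLE sbtest ADD COLUMN id_hash CHAR(40);
UPDATE sbtest SET id_hash = SHA1(id);
ALTER TABLE sbtest ADD KEY (id_hash);
SQL
```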
Peter, yes, absolutely.
Have you looked at benchmarking the EC2 High Performance Cluster Compute instance type?
http://aws.amazon.com/ec2/hpc-applications/
Not available for RDS (yet), but this looks to be a good path for optimizing MySQL performance in the Amazon environment. App servers and replication slaves could all run on HPC instances in the same placement group.
Does anybody have any (recent) experience with highly parallel inserts into InnoDB?
I’m working on a project now where I have to write out several hundred million rows into an InnoDB database in the shortest possible time frame (long story as to why).
On my current hardware/software configuration (MySQL 5.0.51a/InnoDB), I seem to cap out something in the InnoDB kernel around 8 parallel inserting threads. Any more than that and I actually see (slightly) lower throughput.
I checked the obvious stuff:
- innodb_thread_concurrency is set to 32
- vmstat shows 6-7 cores spinning and about 1 core worth of I/O wait
- I have 8 idle CPU cores
- iostat shows I’m not capped (busy, but not pegged)
- Log files are quite large (2 GB)
I’ve read some posts (on this blog, I think) that older versions of InnoDB have concurrency issues around 8 cores and up, but frankly I’ve not seen that issue before with my typical, read-heavy workload. I was wondering if I’ve finally run up against some of the internal database limits that things like XtraDB are aiming to correct?
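One quick way to look for that kind of internal contention (a sketch; assumes the mysql client can reach the server):

```shell
# Print the SEMAPHORES section of the InnoDB monitor output, which
# reports mutex and rw-lock waits inside the InnoDB kernel; heavy
# spin/wait counts there point at internal concurrency limits
# rather than I/O or CPU saturation.
mysql -e "SHOW ENGINE INNODB STATUS\G" \
  | sed -n '/^SEMAPHORES/,/^TRANSACTIONS/p'
```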
Jonathan, I have not benchmarked on the HPC clusters, and have no immediate plans to do so. Part of the reason is the cost. You can rack up a steep bill pretty quickly doing these things. We usually wait until a client wants to know the answer and is willing to pay for it — that’s our business, after all. I don’t have budget allocated for doing it pro bono. That said, I’ve put my foot in my mouth with this half-done blog post on insert throughput, without measuring half the things I need to be able to explain the behavior. As a result, I’m going to have to eat a little humble pie and spin up another RDS instance on Percona’s dime to repeat it, and do it right this time.
Patrick, I can’t tell from your description what the problem you’re seeing is, but it sounds typical of a number of possibilities, and as a general rule, 5.0.51a is abysmally bad at those types of things compared to late 5.1 releases with the InnoDB plugin or 5.5 (or Percona Server). I would honestly not suggest trying to optimize 5.0.anything if there is any way at all to upgrade to something newer. Or at least use Percona Server 5.0 — we were able to backport a lot of the improvements (though not all of them).
Thanks Baron,
I need to look into whether or not it’s practical to upgrade. It would be easier, of course, if I could guarantee beforehand “yeah, verily, if we upgrade it will be 3.14159 times faster”, but, of course, I can’t :). I’ll have to lab it out and see what I find.
It’s one of those cases, though, where I’m honestly curious. Usually I’m pretty good at figuring out where the bottleneck is, and in this case I’m stumped: not I/O saturation, not CPU saturation, not an obvious database concurrency limit, all of which leaves me waving my hands and rambling on about OS wait states and various other speculative things.
Basically, I just like to understand how this stuff works, and I’m off my mental map on this one, so I was hoping somebody here had a hint or two :).
Patrick: do you really have 16 cores, or you have 8 cores with hyperthreading? Also, partitioning would help you if the table is large, index maintenance will be cheaper.
8 physical cores with hyperthreading; it’s a pair of Intel L5520s. I’m not showing CPU saturation, though: 35-40% user CPU, maybe 5-10% system, and then 50% or so idle. On other workloads (16 threads doing queries with large in-memory sorts), I can drive CPU saturation up to 100%, so I know it’s possible to use ’em all.
We’re actually already partitioning the table into 10 chunks, which seems to be close to optimal. More chunks actually slows us down in this use case.
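On MySQL 5.1 and later, a 10-way split like this can be done with native hash partitioning (hypothetical table and column names; on 5.0 the split would have to be done with separate tables):

```shell
mysql mydb <<'SQL'
-- Hash-partition the insert target so each partition maintains a
-- smaller, more cache-resident set of index pages, and concurrent
-- inserting threads contend less on any single index.
ALTER TABLE big_inserts
  PARTITION BY HASH(id)
  PARTITIONS 10;
SQL
```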
Patrick, I think “poor man’s profiler” is probably the tool for this situation.
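For reference, the “poor man’s profiler” is essentially this gdb one-liner (a sketch of the well-known script; run as root on the database host):

```shell
# Sample every thread's stack once via gdb, collapse each stack to a
# comma-separated list of function names, then count identical stacks.
# The most frequent stacks show where threads spend their time.
pid=$(pidof mysqld)
gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p "$pid" \
  | awk '
      BEGIN { s = "" }
      /^Thread/ { if (s != "") print s; s = "" }
      /^\#/     { s = (s == "") ? $4 : s "," $4 }
      END       { if (s != "") print s }
    ' \
  | sort | uniq -c | sort -rn | head
```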
Not sure if I should be embarrassed or not, but I’d never heard of that tool before. Think I’m going to take a crack at it; I’ll let folks know if I find anything interesting.
Patrick, there’s also a shiny version of it in Aspersa, along with nice docs on how to use.
Christian, I found some results that I didn’t understand, and delayed publishing more results until I did further benchmarking and validation. Unfortunately this ran me into the conference season, which is still ongoing, and will be followed immediately by a vacation. So it’s going to be a while, but I will finish and publish the results. Mind that this is an insert-only benchmark, and the other benchmarks you refer to sound like some other type of benchmark.
Just another thing… I just searched for similar benchmarks, but with the same “quadruple super cool server” they just handle up to 7,000 tpmC.
And you got it 1,000 times faster? Or did I miss something?
Hello Baron,
Pretty amazing benchmark results.
Can you estimate when part2 of this article (multithreaded benchmarking) will be available?
Would it even be possible to build a cluster of cloud instances? I’m looking for a solution which can handle approx. 800 million inserts per hour, so I would need 5 instances for that task.
I see no improvement on RDS with innodb_flush_log_at_trx_commit set to 2 or 0; my benchmarks have the same results regardless of the setting. Does RDS fool the flush in some way, or is the setting disabled somehow?