A while ago I started a series of posts showing benchmark results on Amazon EC2 servers with RAID’ed EBS volumes and MySQL, versus RDS machines. For reasons that won’t add anything to this discussion, I got sidetracked, and then time passed, and I no longer think it’s a good idea to publish those blog posts in the format I was planning. Instead, I want to write an overview of these two approaches to hosting MySQL in the Amazon cloud.

In general, MySQL performance overall on EC2 and EBS isn’t always great in comparison to what you can get on physical hardware, even low-to-medium sized servers. It’s not that it’s terrible (in most cases), but it’s not always great. There are specific use cases in which it’s perfectly acceptable and even good, but the range of cases isn’t as broad as what you can push your own servers to deliver.

Here’s why: you’re limited in the number and speed of CPU cores you can get, and I/O performance can be highly variable. You can mask I/O problems on some workloads by making I/O not matter, but that doesn’t always work, and it relies on memory, which only goes so big in the Amazon cloud. And you’re on a platform where some resources are shared, and that makes the performance of those resources tend to vary a lot.

None of these characteristics is a bad thing in itself. It’s just that MySQL can’t tolerate these weaknesses very well in some cases. I’m not trying to say you shouldn’t use the Amazon cloud to host your databases. You just need to know how the circumstances differ from physical hardware, and whether that is important.

Here are some observations you can consider. They don’t cover all cases, but they are things to consider.

  • The Amazon cloud platform can work very well when you don’t need high concurrency from your MySQL database. When you do, the limitations on the core counts and effective CPU speeds can be a bottleneck. My experience and intuition are that an actual concurrency of 8 to 12 is your effective ceiling in today’s largest EC2 instances. That is concurrency inside the database, not at the application level. You can consider this as equivalent to the number of threads inside InnoDB, or to the Threads_running SHOW STATUS counter, without much loss of generality.
  • The Amazon cloud platform with a RAID of EBS volumes can work well when you do not demand much I/O, either in operations per second or in throughput. If you require many operations per second, then it is likely that some of them will be affected by the higher variability of response times in EBS as compared to directly attached disks or SSD devices. InnoDB doesn’t handle this very well, because I/O tends to be a blocking operation in InnoDB, and this can cause stalls ranging from micro-stalls to quite severe stalls as I/O requests stack up and other things block against them. If you create a RAID volume over many EBS volumes, you are more likely to be affected by this, because the more EBS volumes you have, the higher probability that one of them will perform unpredictably, and you are essentially bound by the weakest link in the chain, due to the way InnoDB handles I/O. If you demand a lot of throughput from your storage, then you are likely to be affected by the shared nature of EBS storage, which can provide variable bandwidth at different times.
  • It is widely speculated on the Internet, and seems reasonable from my own observations, that Amazon RDS is built on EC2 machines and EBS RAID volumes under the hood. I have no inside information to share on that speculation, but for all intents and purposes, it seems valid. However, the versions of MySQL that Amazon runs on RDS may not be the most advanced available. My benchmarks generally showed that I was not able to outstrip RDS 5.1’s performance with the combination of stock MySQL 5.1, EC2, and a 10-volume EBS RAID. However, I was able to get better performance with Percona Server 5.1, probably due to improvements such as our I/O algorithms and reduced mutex contention. The performance improvements I got with Percona Server were similar to the performance improvements I saw versus stock MySQL 5.1. This leads me to a second unverifiable speculation, that Amazon RDS’s version of MySQL is probably closer to stock MySQL than it is to Percona Server. I haven’t run extensive benchmarks against RDS 5.5, Percona Server 5.5, and MySQL 5.5.

These observations lead to the following guidelines.

  • Whether your database is read-heavy or write-heavy, you generally need the actively accessed set of data (what we call the working set) to fit in the InnoDB buffer pool, which means you need it to fit in memory. The largest instances currently have a bit less than 70GB of memory, so you can consider something like 60 or so GB to be the biggest you can push your working set size. If it spills over that size, you might cause a lot of I/O, and there is a chance that will cause MySQL to perform slowly or unpredictably.
  • For write-heavy databases, you are generally bound by either the working set size versus the buffer pool size, or the sheer volume of I/O activity you’re generating, whichever comes first. And you could find that you generate more I/O than you can push to EBS, even if the working set size is not larger than the buffer pool size.

As a result, if you’re going to build a big database in the Amazon cloud platform, bigger than a single instance can hold or with more write activity, you need to plan to shard from the beginning. That’s just what you have to do, at least at this point in time. (Nothing remarkable about this — the same is true for databases that are bigger than a single physical server.) Tangent: at this point I expect a certain cloud database provider beginning with an X to insert a plug into this post’s comments. I haven’t evaluated their technology, so I can’t comment on it. I’m sure their funders would like us to evaluate them on a paid basis and report to our readers on the result. We do that for many companies 🙂 OK, end of tangent.

However, if you are careful with your physical and logical design, you can make certain workloads, such as insert-heavy, work better in the Amazon cloud platform. But this is a delicate balance. It requires tricks, such as partitioning so all the inserts go into one partition, whose indexes fit in memory. The more elaborate you get with this — such as putting your transaction log files onto the local disks, for example — the more operational complexity and cost you have, so that’s something to think about.

The physical and logical database design influences greatly how much memory and disk resources are required. The application’s access patterns can be just as powerful a lever. Thus, careful design can be extremely beneficial in getting a lot more from your database server.

On a business level, consider the benefits and drawbacks of RDS versus building the equivalent system yourself. RDS is nice in that it’s managed for you. You don’t have to do a lot of system administration work with RDS; you outsource that to Amazon, and you just do the database administration work. This can be a big relief, and it’s not a bad value for the money compared to building servers with EC2 and EBS. However, sometimes you might like more control over it yourself, such as the ability to customize your server version, or to manipulate the database files directly. The cost, of course, is that the sysadmin work is now your job.

Finally, there are a number of advantages to working in the AWS cloud. Others have pointed these out much more thoroughly than the drawbacks, in my opinion. But I need to at least mention the existence of key advantages at the technical level. These include EBS volume snapshots, for example. They work much better than LVM snapshots, in terms of impact to the system’s performance and ability to mount them on other machines. This is really nice for making replicas and backups, for example. I could name a bunch of other nice properties, but I think that’s not directly on-topic for this post.

The bottom line is that there is not a huge performance difference between EC2+EBS+MySQL versus RDS, in most cases, unless you use Percona Server; but it’s still not orders-of-magnitude different. So my experience is that you can decide between build-your-own and database-as-a-service based on your business needs, considering factors such as the availability of staff to manage the machines. On the technical side, don’t expect either architecture to knock your socks off with its performance, but if you can fit your working set of data into the buffer pool (with careful physical, logical, and application design) and you’re not so write-heavy that you’re doing a lot of I/O, performance can be quite acceptable or even very good.

Do you have experience running MySQL in the Amazon cloud to share with other readers? I welcome your comments, as always.

14 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Liran Zelkha

Not just the X friends 😉
I totally agree with your results – at ScaleBase we’ve seen the same – if you database is over 50-60GB in size – you should probably shard it – or move to a physical environment.

Wasif

Nice article Baron, covering almost every performance aspect of EC2 and RDS. I think one can reduce the risk of unpredictable performance of EC2/EBS by using the maximum available instance (assuming the biggest instance will run on one physical machine). Am I right?

RDS has another good feature of multi-availability especially for those who doesn’t have enough technical staff to keep things online but this increases the latency.

Peter Zaitsev

Liran,

I think it is also good question of the cost and inconvenience of dealing with many servers/instances. With good IO subsystem and decent amount of memory I often see physical hardware handling 2TB+ databases with MySQL. That sharded by even 200GB mark would require 10 instances which is less than optimal.

Avi Kapuya

I think you may find this benchmark on RDS and Xeround DB’s very interesting, and with Xeround, there is no need to do sharding, it is distributed by nature. That’s how high concurrency with low latency is achieved.
http://xeround.com/cloud-database-comparison/xeround-vs-amazon-rds-benchmark/

Lon

Did you RAID your EBS volumes using 5 or 0 or something else? I’ve seen some benchmarks that offer vast improvements to IO using RAID 0 EBS volumes. Just curious if that was part of your testing.

Thanks!

Daniel Kadosh

,
Why not just RAID 0 (striping), since EBS volumes are already mirrored?
Just curious if you’ve heard from any of your customers on Amazon if they’ve had an EBS volume actually fail.

Here’s a quote from http://aws.amazon.com/ebs/
“Because Amazon EBS servers are replicated within a single Availability Zone, mirroring data across multiple Amazon EBS volumes in the same Availability Zone will not significantly improve volume durability.”

Wasif

Baron,

Did you get a chance to play with Provisioned IOPS for EBS? How did you find this new offering?

Nuodb

NuoDB provides standards-based drivers for JDBC, ODBC and ADO.NET. Additionally, there are drivers developed by the community which are open source and available via GitHub

zhujzhuo

The highest iops for RDS is 4000,and EC2 EBS is 30000, Did you get a chance to play with Provisioned IOPS for EBS ,and RDS ?
Range of Provisioned IOPS Range of Storage Range of IOPS to Storage (GB) Ratio
MySQL 1000 – 30,000 IOPS 100 GB – 3 TB 3:1 – 10:1

zhujzhuo

sorry ,the Previous information is wrong,

The highest iops for RDS is 30000,and EC2 EBS is 4000, Did you get a chance to play with Provisioned IOPS for EBS ,and RDS ?
Range of Provisioned IOPS Range of Storage Range of IOPS to Storage (GB) Ratio
MySQL 1000 – 30,000 IOPS 100 GB – 3 TB 3:1 – 10:1