So, while preparing the XtraDB template for EC2, I wanted to understand what IO characteristics we can expect from an EBS volume (I am speaking about a single volume, not RAID as in my previous post). Yasufumi did some benchmarks and pointed out some interesting behavior: there seem to be several levels of caching on an EBS volume.

Let me show you. I ran the sysbench random read IO benchmark on files with sizes from 256M to 5GB, in steps of 256M. And, as Morgan pointed out to me, I wrote the files first, to avoid the first-write penalty:

For reference, the script is:
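The script itself is not shown here, so what follows is only a minimal sketch of how such a run could be driven, not the script actually used. It assumes sysbench 0.4-style `fileio` options; the 180-second run time, the use of direct IO, and the helper names `file_sizes_mb` and `sysbench_commands` are all my assumptions for illustration. The `prepare` stage writes the test files, which stands in for the initial write mentioned above.

```python
import shlex

STEP_MB = 256        # step between file sizes, from the post
MAX_MB = 5 * 1024    # 5GB upper bound, from the post


def file_sizes_mb(step=STEP_MB, max_mb=MAX_MB):
    """Return the tested file sizes in MB: 256, 512, ..., 5120."""
    return list(range(step, max_mb + 1, step))


def sysbench_commands(size_mb, max_time=180):
    """Build prepare/run/cleanup command lines for one file size.

    Assumptions (not from the post): 180 s per run and O_DIRECT
    to keep the OS page cache out of the measurement.
    """
    base = [
        "sysbench",
        "--test=fileio",
        f"--file-total-size={size_mb}M",
        "--file-test-mode=rndrd",      # random read
        "--file-extra-flags=direct",   # assumption: bypass page cache
        f"--max-time={max_time}",
        "--max-requests=0",            # run until max-time expires
    ]
    return [base + [stage] for stage in ("prepare", "run", "cleanup")]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the plan can be reviewed.
    for size in file_sizes_mb():
        for cmd in sysbench_commands(size):
            print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try anywhere; piping the output to a shell on the instance would execute the actual benchmark.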

The raw results (for an m.large instance, though results for m.xlarge were similar) are available at
https://spreadsheets.google.com/ccc?key=0AjsVX7AnrCYwdFlBVW9KWVJGUGFqeVdpUHY0Y0VXYXc&hl=en, see Sheet “256_5GB filesize”.

The results, shown as a graph:

[Figure: randrd_sizes]
So you can see several levels of results:
256M-1.25G, 1.5G-2.25G, and 2.5G+.

At 1.5G-2.25G we see performance comparable to RAID10 on 4 disks, and at
2.5G+ the results are similar to the performance of a single HDD.
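To make the levels easier to apply to your own data size, here is a small sketch that maps a file size to the level observed above. The boundaries come straight from the measured points (256M steps); the function name `expected_level` and the "not measured" result for sizes that fall between measured points are my own choices for illustration.

```python
# (low_mb, high_mb, description); high_mb=None means "and above".
LEVELS = [
    (256, 1280, "first level (256M-1.25G)"),
    (1536, 2304, "comparable to RAID10 on 4 disks (1.5G-2.25G)"),
    (2560, None, "similar to a single HDD (2.5G+)"),
]


def expected_level(size_mb):
    """Return the observed performance level for a file size in MB."""
    for low, high, label in LEVELS:
        if size_mb >= low and (high is None or size_mb <= high):
            return label
    # Sizes between measured points (e.g. 1400M) were not benchmarked.
    return "not measured"


if __name__ == "__main__":
    for size in (1024, 2048, 4096, 1400):
        print(size, "->", expected_level(size))
```

These boundaries describe one set of runs on one instance type, so treat them as a rough guide rather than fixed limits.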

So we may guess that the storage scheme is:
[Figure: schema]

So when running InnoDB on a database bigger than 2.5G, you may expect performance like that of a single HDD, and you may want to consider a RAID setup; see my previous post,
EC2/EBS single and RAID volumes IO benchmark

5 Comments
wizardofcrowds

This is very similar to what I found, though I did it in a much more primitive way. Here is a thread on the EC2 developer forum.

http://developer.amazonwebservices.com/connect/thread.jspa?messageID=110072

AndrewC@AWS said “There is no throttling per se on EBS; however, some of the system components are shared resources. You may experience contention, which can reduce your performance from the theoretical maximum. In this particular case, your first set of writes are serviced by writing into a cache. Eventually, the cache is full and then you are bound by the throughput of the underlying disk arrays.”

Peter Zaitsev

Vadim,

Indeed. I have also seen a cache on EC2 EBS. However, I do not think it is something like the 4-disk RAID you mention; it is all shared infrastructure to start with. I’d expect there is some cache which has a certain performance. Note that the response times you’re seeing are well below the 5ms you would see from a physical spinning drive.

This is indeed a challenge for a “cloud” environment, which is both shared and loosely specified: because you do not know how much cache you’re dealing with, you do not know whether the workload you’re running is “cached” or not, which means you can’t predict how performance will drop with data growth.

Ijonas Kisselbach

Your link to your previous post on “EC2/EBS single and RAID volumes IO benchmark” doesn’t seem to work.
