April 20, 2014

Are you designing IO bound or CPU bound application ?

This topic may look boring and obvious but it is extremely important for MySQL Performance Optimization. In fact I probably have to touch it in every second MySQL Consulting work or even more frequently.

IO Bound workload is quite different from CPU bound one, which happens when your working set (normally only fraction of your database) fits in memory. What is fast when data is in memory can be extremely slow if it does not. For example if you have the query which analyzes 10000 rows it often would take fraction of the second with fully in memory workload, however if you would need to go to the disk, lets say even only in 10% of the cases and so perform 1000 possibly random reads you will have query taking at least 5-10 second, or more under the load which is already way more than you should target for web applications.

So designing your application think what kind of application are you designing ? Can you make it CPU bound/In memory ? If yes the whole class of the problems may not exist and you might be able to use solutions which are easier to implement. But beware if you design your application as CPU bound and when it scales so much you can’t afford to get enough memory any more, you might have very sharp performance drop and complex changes may be required to get your application back to speed.

The cases when going from CPU bound to IO bound hits the most is there a lot of rows are analyzed – count queries, group by, order by without indexes, search queries etc. Basically as soon as you have more than 100 rows analyzed by the query and these rows are “random access” to large tables (so chances they would require physical IO are high) I would highlight this query as possibly having performance problems.

Also do not look just at “typical” case – in many cases worst 5% would be responsible for majority of performance problems.

Let me illustrate it on simple case. Assuming you have Web application which has some form of messaging between users. You may want to display to the user number of unread messages as well as use total number of messages in the mailbox at least to draw “pager”. Easy solution is to do select count(*) from messages where user_id=134 or use SQL_CALC_FOUND_ROWS flag for your main select query. If you’re having CPU bound application it is as much as you might need to go especially if you have some form of caching on top of that. For IO bound application you however will run in trouble pretty soon even with 1000 messages in mailbox may start slowing things down.

Now you may get few percent of very active users who will get both extreme number of messages in their mailbox and will spend a lot of time on the site – generating much more load than average user, so their contribution to the load my be significant plus you do not want to piss of your most loyal users with slow page load times.

So for IO bound applications you will need to add counts for all messages, read messages etc, make sure they are updated (ie use triggers) and make sure all selects use index for ORDER BY .. LIMIT.

For IO bound applications Clustering (data locality) also becomes very important – if Innodb tables are used having simply auto_increment id on messages would likely be much slower than (user_id,sub_id) combined primary key as this one will cluster messages for same user_id and normally allow to fetch them all using only few physical IOs.

You may argue you still will have the problems with such design in CPU bound case – yes you will but it will happen with 100.000 messages not 100 messages as in IO bound case which is large enough for many application classes not to think about it.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. p says:

    hi,

    i enjoy reading your site. i read all these articles talking about mysql replication and how you can send writes to the master and reads to the slave. but i never see anything related on how to actually do this. is this something that is handeled by a mysql configuration or does this have to be coded in the application? please let me know i’ve been trying to find the answer to this for weeks.

    thanks

  2. James Day says:

    A beautiful example of this happened at Wikipedia. The article version history was originally stored in date of edit order. Showing 50 rows of history starting 1000 versions ago would take 1050 seeks. Histories can be 50,000 or more versions long, so it could be 50,000 seeks for one page view. It was changed to have article ID as the first part of the primary key, so it became many revisions per page and adjacent pages for nearby revisions. It became an insignificant load issue and also very easy to cache effectively, significantly cutting the working set size.

    Then it was changed to do something better than LIMIT for the selection of older revisions.

    This sort of thing is one reason why even a simple message board could benefit from InnoDB with good primary key choice, so all posts in a topic are automatically physically adjacent.

  3. Filip says:

    I have one question about data clustering. I tought that in innodb records are clustered when query ALTER TABLE tab1 ENGINE=innodb is executed (in case when multiple primary key is defined)

    Do I need execute sometimes this query or records are clustered automaticaly on-the-fly with insert/update statements?

  4. Alexey says:

    Peter,
    Don’t you think that InnoDB is not suitable for IO bound load by design?

  5. peter says:

    Alexey, Why is that ?

    On the contrary I think Innodb is the storage engine to use for IO bound workload. IO Bound means large database size and large database size means very long repair time for MyISAM tables in case of crash. This if of course if you do not have read only load or you do not care about your data and do not repair tables after crash.

  6. peter says:

    James yes Wikipedia is great example and any forum, blog messaging can benefit from data clustering which Innodb has to offer.

  7. peter says:

    Regarding Master/Slave question – this is something you have to do manually and it is application dependent. I should write some more about possible techniques though.

  8. peter says:

    Filip,

    Yes Innodb always clusters data by primary key so once you change storage engine to Innodb you data will become clustered by primary key for good or bad.

    You may also run OPTIMZE TABLE on it after conversion if original data was not in PK order – this will give better layout.

  9. Breezes says:

    Hi, peter. Suppose an application is IO bound, how can I find out which table/index is the main cause?

  10. peter says:

    Breezes,

    Unfortunately there is no stats in MySQL at this point which would show amount/time of IO waits for particular objects, it may come though in some later MySQL version.

    I usually use our slow query profiling tool and patch to log all queries with their execution times to summ it up and find queries taking most of resources. Optimizing these usually gives most impact.

Speak Your Mind

*