January 30, 2007

Linux IO Schedulers and MySQL

Posted by peter |

Found a great article about Linux IO Schedulers today which is quite interesting. It goes in details about schedulers and explains in which of workloads which of schedulers is best.

The interesting thing this article points out is – there are multiple versions of each of the schedulers, while name remains the same. This means unless you really know mapping between kernel versions and scheduler versions it is very hard to evaluate benchmark results.

This could be noticed by benchmarks we’ve done over years. Long time ago “AS” scheduler could be several times slower than deadline for MySQL workloads such as SysBench or DBT2 when it went down to 30% difference and in the last runs we’ve done difference was not really significant.

This article also points out benchmarking IO schedulers you should look at more numbers than aggregate bandwidth – you also better to measure per client bandwidth as well as max latency as this is what can be the problem. Take a look at these old results for example. It also means you’d better to perform IO scheduler benchmarks on mixed load with different of task, for example mixing OLTP with some reporting queries if you really want to see the difference.

From the article it looks like CFQ should be good choice for databases and it is also found to work pretty well by some benchmarks we’ve done. The only question if it is doing as good as it could – In the docs it is mentioned it uses “per process” scheduling while MySQL is single process but single thread – does each thread gets its own queue in reality or is it shared ?

We should look into this when we’ll run more benchmarks for IO Schedulers.

Making MySQL Replication Parallel

Posted by peter |

Kevin Burton writes about making MySQL Replication Parallel. Many of us have been beaten by the fact MySQL Replication is single threaded so in reality it is only able to use only single CPU and single disk effectively which is getting worse and worse as computers are getting “wider” these days with multi-core CPUs.

Kevin proposes to execute queries in parallel and it is generally good idea, the problem is however implementing it right without changing MySQL Replication semantics – which is – Slave database state corresponds to master database state at certain point in time. It is delayed but threads reading from the slave never will see state of the database which never existed on master.

As I commented in Kevins blog the problem is very simple to illustrate – assume you have 2 queries modifying 2 different tables, query A and query B. On the Master query A completed first and B followed it. On the slave we execute them in parallel so query B may complete before query A causing database stage which never existed on the master. Of course the idea could be to wait on final commit stage and commit queries A and B in order defined by Master but it brings to the plate other problems such as possible deadlocks between queries if they are complex transactions.

It should be however not as bad if we only look at single queries or transactions which do not have any overlap in terms of tables.

For some users commit order for independent queries may be unimportant so this restriction could be weakened to only make sure there is a “barrier” between queries which are possibly dependent on each other, such as reading or writing to the same tables.

There is other possible solution it is to allow multiple threads inside the server to share same transactional/lock context. In this case replication could accumulate number of queries execute them in parallel and then commit all at once.

None of these however are easy trick which I would expect to come quite soon.

On other hand if support for Multi-Master is implemented for many applications Parallel Replication could be implemented simply by filtering transactions and writing to number of binary logs.

If you’re “Scaling Out” you may just treat single server as it is few servers, so place several independent pieces on it, for example if different databases. Now if you could setup filtering so updates for each of them is written to its own binary log file and setup multi-master replication so slave can read all of them in parallel you can get replication parallel enough for many application without serious code complications.

If MySQL would not implement it it might be nice feature to hack into community tree.