Kevin Burton writes about making MySQL Replication Parallel. Many of us have been beaten by the fact MySQL Replication is single threaded so in reality it is only able to use only single CPU and single disk effectively which is getting worse and worse as computers are getting “wider” these days with multi-core CPUs.

Kevin proposes to execute queries in parallel and it is generally good idea, the problem is however implementing it right without changing MySQL Replication semantics – which is – Slave database state corresponds to master database state at certain point in time. It is delayed but threads reading from the slave never will see state of the database which never existed on master.

As I commented in Kevins blog the problem is very simple to illustrate – assume you have 2 queries modifying 2 different tables, query A and query B. On the Master query A completed first and B followed it. On the slave we execute them in parallel so query B may complete before query A causing database stage which never existed on the master. Of course the idea could be to wait on final commit stage and commit queries A and B in order defined by Master but it brings to the plate other problems such as possible deadlocks between queries if they are complex transactions.

It should be however not as bad if we only look at single queries or transactions which do not have any overlap in terms of tables.

For some users commit order for independent queries may be unimportant so this restriction could be weakened to only make sure there is a “barrier” between queries which are possibly dependent on each other, such as reading or writing to the same tables.

There is other possible solution it is to allow multiple threads inside the server to share same transactional/lock context. In this case replication could accumulate number of queries execute them in parallel and then commit all at once.

None of these however are easy trick which I would expect to come quite soon.

On other hand if support for Multi-Master is implemented for many applications Parallel Replication could be implemented simply by filtering transactions and writing to number of binary logs.

If you’re “Scaling Out” you may just treat single server as it is few servers, so place several independent pieces on it, for example if different databases. Now if you could setup filtering so updates for each of them is written to its own binary log file and setup multi-master replication so slave can read all of them in parallel you can get replication parallel enough for many application without serious code complications.

If MySQL would not implement it it might be nice feature to hack into community tree.

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Jon

This is a narrow use case, but I could see it being useful. Let replication be single threaded per schema. Add the ability to configure MySQL to not permit multi schema transactions and then allow multiple replication threads, each configured to deal with transactions for a single schema.