Brian Aker recently published a good write-up about using MySQL replication. The piece I find missing, however, is a good description of the limits of this approach as well as the things you need to watch out for. You can call me a person with a negative mind, but I tend to think about limits as well. So what would be my list of limits and watch-outs in this case?

Limited write scaling. As Brian writes, this works best for read-intensive environments. Write scaling, however, can be a real problem. Not only can a moderate increase of the write ratio dramatically reduce performance, but you also need to watch out for the slave being able to keep up at all. Remember, writes on the slave are serialized, so if your master happens to be an 8-core system with 20 hard drives, you may be in trouble even if writes consume only 10% of its resources, because on the slave side essentially only one core and one disk will be doing that work.
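As a first line of defense it is worth watching the lag directly. A minimal check, assuming the mysql-connector-python package and illustrative host and credentials, might look like this:

```python
# Minimal lag check: SHOW SLAVE STATUS reports Seconds_Behind_Master,
# which is NULL whenever the slave SQL thread is not running.
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(host="slave1", user="monitor", password="secret")
cur = conn.cursor(dictionary=True)
cur.execute("SHOW SLAVE STATUS")
status = cur.fetchone()

if status is None:
    print("This server is not configured as a slave")
elif status["Seconds_Behind_Master"] is None:
    print("Replication stopped (SQL thread not running)")
else:
    print("Slave lag: %d seconds" % status["Seconds_Behind_Master"])
```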

Now let me clarify what I mean by a moderate increase of write load causing trouble. Assume your write load uses 60% of the master's resources, and roughly the same 60% of resources on each slave to apply those writes. You might have 10 slaves, so read load might not be the issue. Now if your write load grows by 50%, the master and slaves will start using 90% of their resources for writes (forget for a second that replication is single-threaded, which just makes things worse), which means only 10% will remain for anything else. So your "read" capacity drops 4 times, from 40% of resources to 10%, even though write load grew by only 50%.
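To make the arithmetic explicit, here is the same calculation as a few lines of Python:

```python
# Back-of-the-envelope capacity math from the paragraph above.
write_share = 0.60                         # fraction of each box busy applying writes
read_share = 1 - write_share               # 0.40 left over for reads

write_share_after = write_share * 1.5      # writes grow 50% -> 0.90
read_share_after = 1 - write_share_after   # only 0.10 left for reads

print(read_share / read_share_after)       # 4.0 -> read capacity drops 4x
```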

Delays in replication. MySQL replication is fast, which means the average delay is small, typically microseconds. However, if there happens to be some bulk query running on the master, even on a completely unrelated table, replication lag may increase dramatically. Besides the monitoring Brian mentioned, three other approaches can be used. First, use the master for real-time reads when you want a 100% guarantee that you're reading the latest data. Second, use MASTER_POS_WAIT() on the slave(s) to make sure an update has made it to them; this has a large overhead, so it should be used only for absolutely critical updates. Third, use a cache such as memcached: besides pure caching, it can be used to store object version information, so you can quickly check whether the version you read from the slave is current or whether you need to do a master read.
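For the second approach, running something like SELECT MASTER_POS_WAIT('mysql-bin.000012', 4567) on a slave blocks until the slave has applied events up to that (illustrative) master binlog position. The third approach might look like the following sketch, assuming a users table with a version column and the python-memcached client; all table, column, and key names here are illustrative:

```python
# Sketch of the "object version in cache" trick described above.
import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def write_user(master, user_id, name):
    # Every write bumps the row's version on the master and records
    # the new version in memcached.
    cur = master.cursor()
    cur.execute("UPDATE users SET name=%s, version=version+1 WHERE id=%s",
                (name, user_id))
    master.commit()
    cur.execute("SELECT version FROM users WHERE id=%s", (user_id,))
    mc.set("user_ver:%d" % user_id, cur.fetchone()[0])

def read_user(slave, master, user_id):
    cur = slave.cursor()
    cur.execute("SELECT name, version FROM users WHERE id=%s", (user_id,))
    row = cur.fetchone()
    latest = mc.get("user_ver:%d" % user_id)
    # Accept the slave copy if it is current (or if no version is cached).
    if row is not None and (latest is None or row[1] >= latest):
        return row
    # Stale read: fall back to the master.
    cur = master.cursor()
    cur.execute("SELECT name, version FROM users WHERE id=%s", (user_id,))
    return cur.fetchone()
```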

Waste of space. Disk is cheap… unless it is a lot of high-performance storage. If you were replicating 1TB of data in 20 copies, I guess you would understand me 🙂 For the sake of high availability you probably need 2-3 copies, but if you need significantly more slaves and they need a lot of storage, you might need a different solution.

Waste of cache. Memory is much faster than disk, and you generally want the data you're accessing to be cached in memory, by MySQL caches or the OS cache. As slaves replicate the same data and are often queried in random or round-robin fashion, their caches will have about the same content. So if you have 10 boxes with 16GB each, your "effective cache" will still be about 14GB, not anywhere close to the 140GB of total cache memory you have. Using memcached or similar solutions on top for caching, as well as mapping queries to different slaves, helps, but only up to a certain point (a small sketch of such mapping follows). Different solutions are needed if this becomes the problem.
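A minimal sketch of such mapping, with illustrative host names: route each read by key instead of round-robin, so every slave serves and caches a distinct slice of the working set.

```python
# Key-based read routing instead of round-robin: the same user always
# hits the same slave, so its rows stay hot in that one box's cache
# rather than being duplicated across all slaves.
SLAVES = ["slave1", "slave2", "slave3"]  # illustrative host names

def slave_for(user_id):
    return SLAVES[user_id % len(SLAVES)]

print(slave_for(42))   # -> "slave1" (42 % 3 == 0)
```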

So am I against simple replication? Not really: it is a great solution which offers great MySQL performance in many cases. It is good, however, to understand whether this solution is right for you, so you do not find yourself jumping to other schemes without having planned for it. The simplicity this solution delivers outweighs its problems for many small to medium sized applications.

So in which cases would I go with simple master-slave replication?

Small database size. Small meaning the working set fits in memory. This reduces the waste of storage, removes the cache duplication problem (as each box can cache everything), and normally keeps the write load consuming a small portion of resources.

Light write load. This is needed to avoid the scalability problems of the single replication thread and the resource waste. I would call the load light if it takes no more than 30% of resources when you leave only a single disk and a single core on the master. That way it can grow quite a bit before giving you trouble.

Few replicas. If you have 3-5 replicas, it is fine. Up to 10 is probably OK, but more than that may mean other solutions would work better. Sometimes even 100 replicas can be OK if the database is small and the load is virtually read-only; you may, however, want to structure replication with intermediate relay servers to avoid master overload in such a case (the second configuration sketch after this list shows a relay setup).

No long queries. As I already wrote, these cause delays in replication even if they work on completely different (but replicated) data sets. Sometimes you do not care about stale data, so you're fine; sometimes you can work around it by skipping replication of the tables affected by long-running queries (the first configuration sketch below shows how).
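A minimal my.cnf sketch for the "skip the affected tables" workaround; the table names are made up. Note that queries touching these tables must then never be sent to this slave:

```ini
# Illustrative slave my.cnf fragment: do not replicate the tables
# that long-running queries touch.
[mysqld]
replicate-ignore-table      = mydb.report_summary
replicate-wild-ignore-table = mydb.tmp_%
```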
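And a sketch of an intermediate relay server for the many-replica case: the relay slave applies the master's updates and writes them to its own binary log, so leaf slaves can replicate from it instead of overloading the master.

```ini
# Illustrative my.cnf fragment for an intermediate "relay" slave.
# log-slave-updates makes it record replicated statements in its own
# binary log, so further slaves can replicate from this box.
[mysqld]
server-id         = 10
log-bin           = mysql-bin
log-slave-updates
```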

That is all for today, but it is far from all that can be said about MySQL replication.

15 Comments
Lukas

Shouldn't large transactions (that is, those that make a lot of changes) cause increasing delay as well?

Apachez

An alternative to MySQL replication might be to use filesystem replication instead, such as http://www.drbd.org

Kevin Burton

“As slaves replicate the same data and are often queried in random or round-robin fashion, their caches will have about the same content. So if you have 10 boxes with 16GB each, your ‘effective cache’ will still be about 14GB, not anywhere close to the 140GB of total cache memory you have.”

YES. Thank you! I was waiting for SOMEONE to bring up the problem of MySQL not using a larger effective cache.

If MySQL implemented a MyISAM row cache and direct support for memcached, you could literally use all this memory and you’d see a HUGE boost in performance.

Weren’t you JUST saying the other day that the Linux page cache wasn’t very high performance?

Kevin Burton

I know of one major site which uses DRBD (with Postgres) and they seem to like it. It seems that it can only be used for failover, though. Note that the IO for DRBD happens BEFORE the transaction completes (within the kernel), so if the master crashes you can promote your backup master (or at least should be able to).

Another point… yes, large transactions (on MyISAM) will hold locks longer, but this can be faster than doing smaller transactions. Bulk inserting records with multiple values is about 11x faster in my app…

Kevin

Kevin Burton

Another thing I just realized… Rumor has it that MySQL 5.1 will support NDB backed by disk. This is essentially what I’m waiting for because not everyone can afford to throw EVERYTHING in memory. In this situation there would be no cache sharing between MySQL instances so maybe my memcached row cache idea is a good one.

James Day

Kevin, LiveJournal uses DRBD with MySQL for failover master server pairs. They run InnoDB, and if one server dies, the other is started up, does file system recovery, then does InnoDB recovery.

Sometimes the cache efficiency issue can be addressed the way Wikipedia does it: split groups of languages across sets of servers. Within each set, all servers get all writes via replication, but each language in the group is assigned to one or more slaves for the read part of the load. Because each slave is serving and caching only a portion of the work, the effective cache size ends up closer to the sum of the cache sizes of all of the slaves. It made a major performance difference.

James Day, Support Engineer, MySQL AB and first DBA, Wikipedia.

Ira

I’m trying to sort out the different options, but I’m no DBA, and I am not sure I got them all on my list and where to start comparing…

There’s regular replication, built into the server, which can be aided by Maatkit or Google MMM. Is that what you call “simple replication”? What’s the “non-simple” way?
There’s the problematic DRBD way of doing it; it seems cumbersome and limited but is still chosen by some, and I can’t find any docs about the pros of this approach.
Maybe other systems I’m not aware of?

Are there any others to consider? Is there a definitive best howto for building a master-slave duo? I sort of understand how the failover works; I’m just not sure how the fallen master becomes a slave and resyncs and all… Any chance you would post a lower-level introduction to this? I can’t find one site that lists them all, with either a list of the differences or the pros and cons one needs to know about.

Thanks!
Ira

Brian Wright

Because replication is a serial playback of SQL statements, one query can delay replication until it completes. The main issue with this is that the master allows updating distinct databases simultaneously (or as concurrently as the operating system will allow), while replication is serial playback that runs only one query at a time. This can lead to delays and backlogs, and it can also prevent a slave from catching up in a timely fashion.

The MySQL team needs to redesign replication to scan ahead in the binlog and group the queries into per-database queues, then allow up to X threads, all pulling from the queues simultaneously, to update each unique database (with only one thread pulling from any specific database’s queue). This technique effectively reproduces the concurrent nature of the master’s original queries and allows for much more timely replication on the slave.
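A toy sketch of the queue-per-database idea (all names here are hypothetical, execute_on_slave is a stand-in, and a real implementation would also need the failure and dependency handling discussed below):

```python
# One queue and one worker thread per database: a slow statement only
# stalls its own database's queue, not the whole replica.
import queue
import threading

def execute_on_slave(db_name, statement):
    # Stand-in for actually executing the statement on the slave.
    print("replay on %s: %s" % (db_name, statement))

def replay(db_name, q):
    while True:
        statement = q.get()
        if statement is None:   # shutdown marker
            return
        execute_on_slave(db_name, statement)

queues = {}

def dispatch(event):
    # event = (database, SQL text), as scanned ahead from the binlog.
    db, statement = event
    if db not in queues:
        queues[db] = queue.Queue()
        threading.Thread(target=replay, args=(db, queues[db])).start()
    queues[db].put(statement)
```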

This approach prevents a single query from holding back the entire replica. The open issue would be how to handle a query that fails (crashed table, aborted, or other issue). My suggestion would be to write the query to a log file (or a failed queue) and state why the query failed. Should an error stop replication? Perhaps that’s a user setting, and it also depends on the failure type (i.e., crashed table). I’ve never liked that replication stops on queries aborted on the master. I understand the reasoning behind it, but an aborted query should warn the DBA and allow replication to continue… or at least there should be a user configuration setting to let the DBA choose which errors fail replication and which are skipped.

The other issue is how to resolve dependencies between tables and data (table A requires data from table B, but only after table B has been updated, in the proper order). On the master, though, there’s really no guarantee you’re going to get the most up-to-date data from a SELECT, as an UPDATE query could follow right behind your SELECT. So this may not be a major issue. As I said, I’d prefer to have multiple replication queries running on the slave simultaneously rather than serially as today. Multithreaded replication would ensure the slave is always up-to-date in a timely fashion.

Brian Wright

Peter,

I think we need to define what ‘guaranteed consistency’ really means in terms of a slave. Does it mean making sure everything comes into the slave in the same exact order it was placed into the master’s binlog? Or, does it mean that each query gets properly inserted into the database successfully and timely? Or does it mean both?

Clearly, a query ends up in the master’s binlog based on, hopefully, when the query succeeds. However, that said, there appears to be no dependency tracking on the order in which the query was placed into the binlog. The order, then, appears to be based strictly on order of arrival plus success. Clearly, doing a serial playback on the slave preserves some semblance of order, but this is not optimized for speed or performance. It only takes one very long, slow query to throw replication behind. The question, then, is whether it is more important to let replication fall behind for the sake of alleged consistency, or to keep replication up-to-date in spite of slow queries.

Granted, there might be other data behind a slow query that is dependent on the completion of that slow query. Again, this can be solved by grouping those dependent queries together behind the slow query and executing them in order. It’s fairly easy to recognize a dependent query simplistically: 1) the query arrived after another query, and 2) it references the same table as that previous query. Unrelated and non-dependent queries shouldn’t have to wait behind an extremely slow, unrelated query. Note that there may be system processes outside of the database that are dependent on that unrelated data that has been delayed. Is it right, then, for alleged consistency purposes, to delay those unrelated queries?

In other words, is the slave’s database considered consistent by being outdated compared to the master because of a slow query (or a series of them)? For this reason, I believe a redefinition of consistent is also in order when discussing a slave’s database.

Thanks.