Comments on: Thoughts on MySQL Replication http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/ Everything about MySQL Performance Sat, 21 Nov 2009 05:23:57 -0800

By: Brian Wright http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-362715 Fri, 17 Oct 2008 17:04:30 +0000

Peter,

I think we need to define what ‘guaranteed consistency’ really means in terms of a slave. Does it mean making sure everything is applied on the slave in exactly the same order it was written to the master’s binlog? Or does it mean that each query is applied to the database successfully and in a timely manner? Or does it mean both?

Clearly, a query ends up in the master’s binlog based on, hopefully, when the query succeeds. That said, there appears to be no dependency tracking in the order in which queries are written to the binlog; the order appears to be based strictly on arrival plus success. Serial playback on the slave preserves that order, but it is not optimized for speed or performance. It only takes one very long, slow query to throw replication behind. The question, then: is it more important to let replication fall behind for the sake of alleged consistency, or to keep replication up to date in spite of slow queries?

Granted, there might be other data behind a slow query that depends on the completion of that slow query. This can be solved by grouping the dependent queries together behind the slow query and executing them in order. Recognizing a dependent query, simplistically, is fairly easy: 1) the query arrived after another query, and 2) it references the same table as that previous query. Unrelated, non-dependent queries shouldn’t have to wait behind an extremely slow, unrelated query. Note that there may be system processes outside of the database that depend on the unrelated data that has been delayed. Is it right, then, for alleged consistency purposes, to delay those unrelated queries?
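The two-part test above (arrived later, touches the same table) can be sketched roughly as follows. This is only an illustration of the heuristic as described, not anything MySQL does: the `(sequence, tables, sql)` tuple layout and the function name `group_dependent` are assumptions made up for the example, and it ignores dependencies that are not visible from table names (triggers, application logic, and so on).

```python
def group_dependent(events):
    """Group binlog-like events into chains that must run in order.

    `events` is a list of (sequence, tables, sql) tuples; this layout is
    hypothetical. Two events land in the same chain when they share a
    table, directly or through a later event that touches both.
    """
    groups = []  # list of (tables_in_chain, [(sequence, sql), ...])
    for seq, tables, sql in events:
        tables = set(tables)
        merged_tables = set(tables)
        merged_events = [(seq, sql)]
        remaining = []
        for group_tables, group_events in groups:
            if group_tables & tables:
                # Shares a table with this event: fold the chain in.
                merged_tables |= group_tables
                merged_events.extend(group_events)
            else:
                remaining.append((group_tables, group_events))
        merged_events.sort()  # keep arrival order inside the chain
        remaining.append((merged_tables, merged_events))
        groups = remaining
    return [[sql for _, sql in group_events] for _, group_events in groups]


events = [
    (1, ["orders"], "UPDATE orders (slow)"),
    (2, ["users"], "INSERT users"),
    (3, ["orders"], "DELETE orders"),
    (4, ["users", "logs"], "INSERT logs SELECT users"),
]
chains = group_dependent(events)
```

In this example the slow `orders` update only holds back the later `orders` delete; the `users` and `logs` statements form their own chain and could, in principle, run without waiting.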

In other words, is the slave’s database considered consistent when it is outdated compared to the master because of a slow query (or a series of them)? For this reason, I believe a redefinition of ‘consistent’ is also in order when discussing a slave’s database.

Thanks.

By: peter http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-362549 Fri, 17 Oct 2008 07:48:03 +0000

Brian,

I guess there has been a lot of talk about multi-threaded replication, but it is far from trivial if you want it to work for the general case.

First, failed queries: you can already set replication to skip errors, but this is a very dangerous thing to do unless you know exactly what you’re doing, as it is very easy to end up with a copy that is completely out of sync with no error messages.

Grouping queries by table requires dependency tracking (though it is much easier with row-level replication), and it also changes what the data on the slave can look like.

Right now there is a consistency guarantee: the data on the slave always looks like the data on the master at some point in the past. If you have multiple threads (per table, per database, etc.), it is possible to see a data state that never existed on the master (when one thread runs ahead of another).
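A toy model may make the point concrete. The sketch below is an assumption-laden illustration, not MySQL code: each event just sets a per-table value, `master_states` records every state the master passed through, and `apply_in_order` replays a chosen subset of events the way a slave thread schedule might.

```python
def master_states(events):
    """Return every state the master passed through, oldest first."""
    state = {}
    history = [dict(state)]
    for table, value in events:
        state[table] = value
        history.append(dict(state))
    return history


def apply_in_order(events, order):
    """Replay the events at the given indexes and return the slave state."""
    state = {}
    for index in order:
        table, value = events[index]
        state[table] = value
    return state


# Hypothetical event stream: (table, new value), in binlog order.
events = [("a", 1), ("b", 1), ("a", 2)]
history = master_states(events)

# Serial replay that has only reached the second event: a past master state.
serial_lagging = apply_in_order(events, [0, 1])

# Per-table threads where the thread for "a" ran ahead of the one for "b".
parallel_skewed = apply_in_order(events, [0, 2])
```

Here `serial_lagging` is a state the master really had, just an older one, while `parallel_skewed` (`a` already at 2, `b` not yet written) never existed on the master at any time.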

There is a solution to this too: synchronized commits that make data visible in batches to keep things consistent. But this requires multi-threaded transactions and will not work with all storage engines.

By: Brian Wright http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-362481 Fri, 17 Oct 2008 00:17:27 +0000

Because replication is a serial playback of SQL statements, one query can delay replication until it completes. The main issue with this is that the master allows updating separate databases simultaneously (or as concurrently as the operating system will allow). Because replication is serial playback only and runs only one query at a time, this can lead to delays and backlogs, and it can also prevent a slave from catching up in a timely fashion.

The MySQL team needs to redesign replication to scan ahead in the binlog and group the queries into queues by database. Then allow up to X threads to pull from the queues simultaneously, each updating one database (only one thread pulling from each specific database queue). This technique effectively reproduces the concurrent nature of the master’s original queries and allows for much more timely replication on the slave.
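As a rough sketch of the proposal (not of how MySQL replication is implemented), the idea could look like the following. The `(database, statement)` event shape, the `apply` callback, and the name `replay_by_database` are all assumptions for the example, and statements that touch more than one database are simply not handled.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def replay_by_database(events, apply, max_threads=4):
    """Replay events with one in-order queue per database.

    `events` is an iterable of (database, statement) pairs and `apply` is a
    caller-supplied function; both are hypothetical. At most `max_threads`
    queues are drained at once, and each queue is drained by one thread only.
    """
    # "Scan ahead": split the stream into per-database queues.
    queues = defaultdict(list)
    for database, statement in events:
        queues[database].append(statement)

    def drain(database):
        # Within one database, statements keep their original order.
        for statement in queues[database]:
            apply(database, statement)

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # list() waits for every queue and re-raises any worker exception.
        list(pool.map(drain, list(queues)))
```

Order is preserved inside each database, but nothing orders one database relative to another, which is exactly the trade-off being debated in this thread.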

This approach prevents a single query from holding back the entire replica. The open issue is what to do when a query fails (crashed table, aborted query, or some other problem). My suggestion would be to write the query to a log file (or a failed queue) and state why it failed. Should an error stop replication? Perhaps that’s a user setting, and it also depends on the failure type (e.g., a crashed table). I’ve never liked that replication stops on queries that were aborted on the master. I understand the reasoning behind it, but an aborted query should warn the DBA and allow replication to continue… or at least there should be a configuration setting that lets the DBA choose which errors to fail on and which to skip.
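The "failed queue plus DBA-chosen policy" suggestion could be sketched like this. Everything here is hypothetical: `replay_with_policy`, `ReplicationStopped`, and classifying failures by Python exception name are stand-ins for illustration and do not correspond to any MySQL setting.

```python
class ReplicationStopped(Exception):
    """Raised when a failure is not on the skip list; carries the failed queue."""

    def __init__(self, failed):
        super().__init__("replication stopped: {}".format(failed[-1][2]))
        self.failed = failed


def replay_with_policy(statements, apply, skip_kinds=frozenset()):
    """Apply statements in order, recording each failure and its reason.

    `skip_kinds` is the (hypothetical) DBA setting: failure kinds listed
    there are logged and skipped, anything else stops the replay.
    """
    failed = []  # the "failed queue": (statement, kind, reason)
    for statement in statements:
        try:
            apply(statement)
        except Exception as exc:
            kind = type(exc).__name__
            failed.append((statement, kind, str(exc)))
            if kind not in skip_kinds:
                raise ReplicationStopped(failed) from exc
    return failed
```

Any statement skipped this way leaves the copy different from the master, so the failed queue would need to be reviewed rather than ignored.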

The other issue is how to resolve dependencies between tables and data (table A requires data from table B, but only after table B has been updated in the proper order). On the master, though, there’s really no guarantee you’re going to get the most up-to-date data from a SELECT, as an UPDATE query could follow right behind your SELECT. So this may not be a major issue. As I said, I’d prefer to have multiple replication queries running on the slave simultaneously rather than serially, as they do today. Multithreaded replication would keep the slave up to date in a more timely fashion.

By: Ira http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-355862 Wed, 17 Sep 2008 12:09:54 +0000

I’m trying to sort out the different options, but I’m no DBA, and I’m not sure I have them all on my list or where to start comparing…

There’s regular replication, built into the system, which can be aided by Maatkit or Google MMM. Is that what you call “simple replication”? What’s the “non-simple” way?
There’s the DRBD way of doing it, which seems problematic, cumbersome and limited, yet is still chosen by some; I can’t find any docs about the pros of this approach.
Maybe there are other systems I’m not aware of?

Are there any others to consider? Is there a definitive how-to for building a master-slave pair? I sort of understand how failover works; I’m just not sure how the fallen master becomes a slave and resyncs… Any chance you would post a lower-level introduction to this? I can’t find a single site that lists them all, with either the differences or the pros and cons one needs to know about.

Thanks!
Ira

By: Thoughts on MySQL Replication http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-289642 Tue, 29 Apr 2008 06:03:53 +0000

[...] 2008-04-29 Author: hywl51. Posted: 2008-04-29 14:02:42. Last updated: 2008-04-29 14:02:42. Copyright notice: may be freely reproduced, but please be sure to credit the original source and author with a hyperlink. http://www.sitearchitect.cn/?p=691 Thoughts on MySQL Replication [...]

By: peter http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-1080 Mon, 17 Jul 2006 06:10:59 +0000

James,

You’re right. DRBD makes a great addition to MySQL replication; however, it is not an alternative if you’re also looking for increased performance, not only high availability.

Partitioning the read workload is a good way to increase cache efficiency. In Wikipedia’s case it is not hard, as languages are pretty much independent. However, as you already mention, you assign a group of slaves to a single language… and this is where you run into the same problem again. The reason it worked well for Wikipedia is likely that a single language is small enough to fit well in memory on a single server, in which case it does not matter much: if your working set is 10G, it does not matter whether your effective cache size is 12G or 120G.

I’m quite curious how further growth would be achieved. I guess you will still end up partitioning and putting different languages in different replication groups; otherwise, at some point replication will not be able to keep up. Of course, it would take a very large number of contributors to generate that much write traffic.

The same goes for a single language: if the working set of a single language grew to 100G or more before that became a typical amount of memory on a server, there may be trouble.

In general, even though Wikipedia is a very popular web site, it is conceptually very well suited to scaling. The traffic must be reads in astonishing proportion, with much of it coming from anonymous users who can be served by caching proxies without even hitting the web servers. Languages provide very good partitioning, as there is very limited dependence between them. The working set for each language is small.

By: James Day http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-1055 Fri, 14 Jul 2006 19:22:32 +0000

Kevin, LiveJournal uses DRBD with MySQL, for failover master server pairs. They run InnoDB, and if one server dies the other is started up, does file system recovery, then does InnoDB recovery.

Sometimes the cache efficiency can be addressed the way Wikipedia does it: it splits groups of languages across sets of servers. Within each set, all servers get all writes via replication. Each language in the group is assigned to one or more slaves for the read part of the load. Because each slave serves and caches only a portion of the work, the effective cache size ends up closer to the sum of the cache sizes of all the slaves. It made a major performance difference.
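A minimal sketch of that read partitioning, under stated assumptions: the slave names, the round-robin assignment, and the helper names below are invented for illustration and are not Wikipedia's actual setup. `effective_cache_gb` is an idealized bound (no overlap between partitions), which is why the text above says "closer to" the sum rather than equal to it.

```python
def assign_languages(languages, slaves):
    """Assign each language's reads to one slave, round-robin (hypothetical)."""
    ordered = sorted(languages)
    return {lang: slaves[i % len(slaves)] for i, lang in enumerate(ordered)}


def route_read(language, assignment):
    """Pick the slave that serves, and therefore caches, this language."""
    return assignment[language]


def effective_cache_gb(per_slave_cache_gb, slave_count, partitioned):
    """Idealized effective cache size across a replication set.

    Unpartitioned, every slave tends to cache the same hot data, so the
    set behaves like one cache. Partitioned, each slave caches a different
    slice, so the upper bound is the sum.
    """
    if partitioned:
        return per_slave_cache_gb * slave_count
    return per_slave_cache_gb


assignment = assign_languages(["en", "de", "fr", "ja"], ["db1", "db2"])
target = route_read("fr", assignment)
```

All slaves still receive every write through replication; only the reads, and so the cached pages, are split by language.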

James Day, Support Engineer, MySQL AB and first DBA, Wikipedia.

By: peter http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-1008 Mon, 10 Jul 2006 06:47:23 +0000

Kevin,

Oh, I forgot to comment on your bulk inserts point. Yes, that is a typical trade-off between throughput and latency. There are many cases where improving one hurts the other, and you need to find the proper balance.

Especially with MyISAM and a good-sized bulk_insert_buffer_size, the performance improvement can be dramatic. Sometimes I’ve seen a 100-fold increase (using 100,000 values per insert statement or so). The tricks, however, are to increase max_allowed_packet as well and to keep key_buffer_size larger than bulk_insert_buffer_size; otherwise performance could go down.
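For readers who have not built multi-row inserts before, here is a small sketch of batching rows into fewer, larger INSERT statements. The helper name `batched_inserts` and the table and column names are made up for the example; the `%s` placeholders assume a Python MySQL driver that uses that parameter style, and the table and column names are interpolated directly, so they must come from trusted code.

```python
def batched_inserts(table, columns, rows, rows_per_statement=1000):
    """Yield (sql, params) pairs, each inserting up to rows_per_statement rows.

    Larger batches mean fewer round trips, but each statement must still
    fit within the server's max_allowed_packet.
    """
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    column_list = ", ".join(columns)
    for start in range(0, len(rows), rows_per_statement):
        chunk = rows[start:start + rows_per_statement]
        sql = "INSERT INTO {} ({}) VALUES {}".format(
            table, column_list, ", ".join([row_placeholder] * len(chunk))
        )
        # Flatten the rows so params line up with the placeholders.
        params = [value for row in chunk for value in row]
        yield sql, params


rows = [(1, "a"), (2, "b"), (3, "c")]
statements = list(batched_inserts("t", ["id", "name"], rows, rows_per_statement=2))
```

Each pair would then be handed to something like a cursor's `execute(sql, params)`; how large `rows_per_statement` can usefully get depends on row size and on the buffer settings mentioned above.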

By: peter http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-1007 Mon, 10 Jul 2006 06:42:40 +0000

Kevin,

Yes, this problem is often overlooked for some reason. A MyISAM row cache in memcached could be interesting and actually not that hard to implement. We’ve recently implemented IO via mmap instead of read/write, so a write-through cache in memcached should not be that hard. I would not make it write-back, at least initially, as memcached is not designed to be highly available. It is possible to do some clustering by updating multiple nodes, but that would reduce performance.

Another thing: technically, distributed file systems should do well for this purpose, provided they optimize caching to avoid caching the same data on several nodes. I have not benchmarked this, however; I should once I have a chance.

Speaking of NDB Cluster: yes, it has data on disk in MySQL 5.1, but I have not benchmarked it seriously. I would also guess it will take at least a year before it is ready for serious production use.

Speaking of Linux page cache efficiency: yes, thanks for catching me. However, it is all relative. A single disk can do about 100-150 random IOs per second. The page cache can do some 1,000 times more, which is great; however, when the data is in process memory you can do some 10 times more operations still.
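Spelled out as arithmetic, those rough ratios give the following orders of magnitude. The constants are just the ballpark figures quoted above, not measurements, and the names are invented for the sketch.

```python
# Ballpark figures from the comment above; not benchmark results.
DISK_RANDOM_IOPS = (100, 150)   # single disk, random IOs per second
PAGE_CACHE_FACTOR = 1000        # page cache vs. disk
PROCESS_MEMORY_FACTOR = 10      # process memory vs. page cache


def estimate_ops_per_second():
    """Return (low, high) operations per second for each storage tier."""
    page_cache = tuple(x * PAGE_CACHE_FACTOR for x in DISK_RANDOM_IOPS)
    process_memory = tuple(x * PROCESS_MEMORY_FACTOR for x in page_cache)
    return {
        "disk": DISK_RANDOM_IOPS,
        "page_cache": page_cache,
        "process_memory": process_memory,
    }


estimate = estimate_ops_per_second()
```

That works out to roughly 100,000 to 150,000 operations per second from the page cache and 1,000,000 to 1,500,000 from process memory, so the jump from disk to any cache dwarfs the jump between the two caches.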

So if your load is IO bound, it is most important to get the data cached, either in OS memory or in MySQL’s caches. If you’re CPU bound, you should look at caching data in process memory.

By: peter http://www.mysqlperformanceblog.com/2006/07/07/thoughts-on-mysql-replication/comment-page-1/#comment-1006 Mon, 10 Jul 2006 06:30:09 +0000

Apachez,

DRBD is a great solution, but for a somewhat different problem. It also does replication, of course, but unlike with MySQL replication only one of the nodes can be used at a time. So it gives you high availability but not much extra performance.

DRBD could, however, be used together with MySQL replication to increase high availability. MySQL replication is asynchronous, so if the master is lost, some transactions could be lost. However, if the master is replicated by DRBD, you can avoid losing any transactions. It also greatly simplifies failover, which can be as simple as moving the master IP to the new box.

The other way you can use DRBD is to create a completely live network backup of your MySQL server.
