I just found a post by Kevin in which he criticizes the Master-Master approach, finding a Master with many slaves preferable. There is surely room for master-N-slaves systems, but I find Master-Master replication a much better approach in many cases.
Kevin writes: “It requires extra hardware that’s sitting in standby and not being used (more money and higher capital expenditures).” I’m surprised; why would it? Typically you can use both nodes for most of the reads, and this is in fact the usage pattern MMM was designed for.
“There’s only one machine to act as the standby master. If you have 10 slaves you should be able to fail five times and still be operational.” This is a valid consideration, but honestly, for most applications two copies are enough unless you’re using unreliable hardware.
Typically other risks of downtime are higher: what about an application error which deletes or trashes all the data and gets replicated to all the slaves?
When you compare master-master to multiple slaves, you should compare them for the same number of servers. For example, if we have 6 boxes we can use 1 master and 5 slaves, or 3 master-master pairs. In the latter case each master-master pair gets 1/3 of the database size and about 1/3 of the traffic.
The benefits of Master-Master replication are the following:
Simplicity It is simpler: especially if you write to only one node, failover and recovery are rather easy. Even if everything is automated, simpler setups mean fewer software bugs.
Handling write load If your application is write-intensive, a master-N-slaves configuration will be saturated much faster, because every server has to handle the full write load. Especially taking into account that MySQL replication is single-threaded, it may not be long before the slaves are unable to keep up.
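A back-of-the-envelope sketch of the single-thread point above; all throughput numbers below are invented purely for illustration:

```python
# Hypothetical rates: the master applies writes on many connections in
# parallel, while the slave replays the binary log in one SQL thread.
concurrent_writers = 8
writes_per_thread_per_sec = 300
master_write_rate = concurrent_writers * writes_per_thread_per_sec  # 2400/s

slave_apply_rate = 900  # serial replay caps the slave at one thread's speed

# Once the master's write rate exceeds what one thread can replay,
# replication lag grows without bound.
backlog_growth_per_sec = master_write_rate - slave_apply_rate
print(backlog_growth_per_sec)  # 1500 statements of lag added every second
```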
Waste of cache memory If you have the same data on all the slaves, you will likely have the same data cached in their database caches. You can partially improve this with load partitioning, but it will not be perfect: for example, all of the write load has to go to all nodes, pulling the corresponding data into every cache. In our example, if you have 16GB boxes with, say, 12GB allocated to MySQL database caches, you get 12GB of effective cache in the master-N-slaves configuration, compared to 36GB of effective cache across 3 master-master pairs.
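The cache arithmetic above, as a quick sketch (the 12GB-per-box figure is from the example; the rest follows from the two topologies):

```python
cache_gb_per_box = 12  # buffer pool on each 16GB box, as in the example

# Master + 5 slaves: all six boxes hold the same data, so they tend to
# cache the same working set; distinct cached data is one box's worth.
effective_cache_master_slaves = cache_gb_per_box

# 3 master-master pairs: each pair serves a disjoint 1/3 of the data,
# so the pairs' caches add up.
pairs = 3
effective_cache_master_master = pairs * cache_gb_per_box

print(effective_cache_master_slaves, effective_cache_master_master)  # 12 36
```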
Waste of disk Disk is cheap, but for IO-bound workloads you may need a fast disk array, which is not so cheap, so having less data to deal with becomes important.
More time to clone If replication breaks, you may need more time to re-clone the node (or restore the database from backup) compared to the smaller databases of the master-master pairs.
I agree, however, that if you have a small database (compared to memory size) with insignificant write load, a master-N-slaves configuration may indeed be better, as it allows you to keep the application simple by having all the data on the same server.
The customer for whom the MMM cluster was implemented has about 600GB per node, and some other installations have similar sizes.
MMM Cluster supports the Master with multiple slaves configuration as well; this, however, was not the main focus.
“When you compare master-master to multiple slaves, you should compare them for the same number of servers. For example, if we have 6 boxes we can use 1 master and 5 slaves, or 3 master-master pairs. In the latter case each master-master pair gets 1/3 of the database size and about 1/3 of the traffic.”
I don’t know if I am reading this right, but is 1/3 of the data spread across each master-master pair? If so, wouldn’t the application need to know where the data is in order to access the correct master-master pair? The solution doesn’t look like it abstracts the data layer that much.
Other than the question above:
Overall I would say this is a great solution for many people to get an HA setup without building it directly into their app.
Hi Peter,
I completely agree with you except for one point: If you send reads to the “inactive” master, you are lying to yourself about your true capacity. If one of the masters fails, you don’t have enough read capacity to handle your load.
For this reason, I always suggest that customers use master-master, but scale by adding more partitions, rather than using the “slave” for reads.
Regards,
Jeremy
“‘There’s only one machine to act as the standby master. If you have 10 slaves you should be able to fail five times and still be operational.’ This is a valid consideration, but honestly, for most applications two copies are enough unless you’re using unreliable hardware.”
No, actually, it’s not.
Let’s take Technorati, for example. They have three copies of everything. That way, if a machine fails at night, you still have a redundant copy.
If it’s 1AM I want to be able to go back to sleep. 🙂
“Typically other risks of downtime are higher: what about an application error which deletes or trashes all the data and gets replicated to all the slaves?”
Sure… but this is a non sequitur, because we’re dealing with master promotion design here.
“Handling write load If your application is write-intensive, a master-N-slaves configuration will be saturated much faster, because every server has to handle the full write load. Especially taking into account that MySQL replication is single-threaded, it may not be long before the slaves are unable to keep up.”
This is a tangential discussion, isn’t it? We shard and partition our data, so we spread writes that way. Each shard has a single master and two slaves that can be promoted to master.
We can just create as many shards as we need for writes.
“Waste of cache memory If you have the same data on all the slaves, you will likely have the same data cached in their database caches. You can partially improve this with load partitioning, but it will not be perfect: for example, all of the write load has to go to all nodes, pulling the corresponding data into every cache. In our example, if you have 16GB boxes with, say, 12GB allocated to MySQL database caches, you get 12GB of effective cache in the master-N-slaves configuration, compared to 36GB of effective cache across 3 master-master pairs.”
Again, this seems like a tangential issue, and one your design suffers from as well. Isn’t the data replayed on your slaves in MMM?
It seems like we’re comparing apples to oranges here.
I’m not arguing that partitioning your data isn’t important. It is. I’m just saying that there’s an easy way to use a slave as a master anytime you want.
I didn’t grok from your original post that MMM has sharding (which it seems to now), so I’ll have to play with it. I was just arguing that master-to-master replication isn’t necessary.
Thanks for posting this stuff!
Kevin
Jeremy,
Yes, you’re right. You should plan for halved read capacity if the secondary node goes down. However, avoiding reads from the secondary node is only one of the possible approaches.
First, in many cases I’ve seen, customers would rather have slower response times for a fraction of users in this situation than double their hardware requirements. This especially makes sense if you take into account that you have to be prepared for “spikes”, which in some cases can be as much as 2 times the 95th-percentile (“normal”) load. In that case, if you read from both nodes and one of them fails, you will still have enough capacity to handle your normal load, just not the spikes.
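As a rough sketch of that capacity argument (the load numbers are hypothetical):

```python
normal_load = 1000            # reads/sec at the 95th percentile
spike_load = 2 * normal_load  # spikes of up to 2x the "normal" load

# Size each of the two masters to handle half of the spike load.
node_capacity = spike_load // 2

# With both nodes serving reads, even spikes are covered.
print(2 * node_capacity >= spike_load)   # True

# With one node down, normal load is still covered, but spikes are not.
print(node_capacity >= normal_load)      # True
print(node_capacity >= spike_load)       # False
```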
Second, there are load controls which can be used. For example, many systems get a large portion of their load from search spiders, which are especially hard on the cache. Throttling spiders when you have a master failure is another trick you can use. It is not perfect, but it may be better than doubling the amount of hardware.
Third, there are often actions which can be delayed for a number of hours while the master is down without production impact, such as reporting queries: these are perfect candidates for the standby master. Another example is the backup process (though I prefer to have backups done from the active node, to avoid problems in case there are bugs in MySQL replication).
If properly planned, the period of partial capacity should be no more than a few hours: even if a master dies completely, you should have a standby box which can be scripted to take over by restoring from backup or re-cloning the surviving master. Using a backup may be the better choice, as the surviving master may already be under higher load than usual.
This is not to say your advice is wrong, but rather that using the second master as standby only is just one possible solution, which is, however, good and simple if you have enough money to pay for it 🙂
Kevin,
First, let me clarify my position a bit: for many applications, the Master-Master level of reliability is enough, plus backups of course.
Second, I’m not really arguing against having 3-4 copies. It is a bit of a complication, but it is worth it for some applications because it has other benefits. What is often a misuse is having 10+ slaves replicating a 500GB database.
Regarding a good night’s sleep: well, it depends a lot on the number of boxes you have, your availability requirements, and the extra hardware needed. Assume, for example, you have a choice between paying a $200 bonus to the sysadmin who had to wake up at night and fix things once every three months, or buying 5 more boxes to have more copies. And you have a third choice: assume it is unlikely for the second node to fail as well, and fix it in the morning.
It is interesting that you use Technorati as an example: I’ve seen their search being down so many times that I think it is not MySQL-related at all.
“Sure… but this is a non sequitur, because we’re dealing with master promotion design here.”
Well, it is relevant, because the more slaves you have, the more redundancy you have, but also the more hardware you waste.
“This is a tangential discussion, isn’t it? We shard and partition our data, so we spread writes that way. Each shard has a single master and two slaves that can be promoted to master.
We can just create as many shards as we need for writes.”
Well, that is fine. Too often, however, people use too many slaves, relying on replication instead of partitioning.
By the way, how many shards do you have, and how many times have you had 2 nodes from the same shard fail at the same time?
“Again, this seems like a tangential issue, and one your design suffers from as well. Isn’t the data replayed on your slaves in MMM?”
Right. As I explained above, with Master-Master you have 2 copies of the data, while with Master-20-slaves you would have 21 copies, which is a significant waste.
Speaking about shards: this is not part of MMM. You can, however, set up several “clusters” and have your application partition the data among them in whatever way you desire.
Dathan,
“I don’t know if I am reading this right, but is 1/3 of the data spread across each master-master pair? If so, wouldn’t the application need to know where the data is in order to access the correct master-master pair? The solution doesn’t look like it abstracts the data layer that much.”
That is right; I mean the general design here. Of course, you need to change your application to use data partitioning wisely. You also normally need to change the application to use plain replication correctly: you need to decide which reads you can do from the slave, etc.
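As a sketch of what that partitioning can look like in the application (the host names and the simple modulo scheme are made up for illustration):

```python
# Hypothetical routing table: three master-master pairs, as in the
# 6-box example from the post.
PAIRS = [
    ("db1a.example.com", "db1b.example.com"),
    ("db2a.example.com", "db2b.example.com"),
    ("db3a.example.com", "db3b.example.com"),
]

def pair_for_user(user_id):
    """Pick the master-master pair holding this user's data (simple modulo)."""
    return PAIRS[user_id % len(PAIRS)]

active, standby = pair_for_user(7)  # 7 % 3 == 1, so the second pair
print(active)  # db2a.example.com
```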
Another great guide on this subject I’ve found:
http://www.dancryer.com/2010/01/mysql-circular-replication
This is part 1 of a three-post series:
– MySQL Load-Balanced Cluster Guide – Part 1 – setting up the servers themselves and configuring MySQL replication.
– MySQL Load-Balanced Cluster Guide – Part 2 – setting up a script to monitor the status of your MySQL cluster nodes, which we’ll use in the next guide to set up our proxy.
– MySQL Load-Balanced Cluster Guide – Part 3 – setting up the load balancer with HAProxy, using the monitoring scripts