MySQL Master-Master replication (often in active-passive mode) is a popular pattern used by many companies running MySQL for scale-out. Most of these companies have internal scripts to handle things such as automatic failover and slave cloning, but no Open Source solution has been made available.
A few months ago we were asked to implement such a solution for one of our customers, and they kindly agreed to let us release it under the GPL2 license; in return, we gave them a reduced rate for being Open Source friendly.
So what does this tool do, and how does it work?
Currently it is implemented on top of Linux IP management tools and LVM for snapshot creation, but we hope support for other operating systems will be added in the future.
It can manage a master-master pair as well as other configurations, such as a master-master pair plus a bunch of slaves.
Typically you define “roles”, for example READER and WRITER roles in the simplest case, and assign them to the servers. For a master-master pair you can say both servers may be readers at the same time, but only one of them may be writable at any given moment.
Each role has a pair of IPs associated with it in this case, and the tool makes sure all of these IPs are always handled by some server, so you can use DNS for load balancing without worrying about TTLs and similar issues.
If the active server in the pair fails, both its READER and WRITER roles are taken over by the passive node; depending on the monitoring configuration this can happen within a few seconds.
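The role and IP bookkeeping described above can be sketched roughly as follows. This is a minimal Python simulation; the host names, IPs, and assignment policy are illustrative assumptions, not MMM's actual internals:

```python
# Hypothetical sketch of MMM-style role bookkeeping: each role owns a set
# of virtual IPs, and the monitor keeps every IP bound to exactly one live
# server, moving a failed server's IPs to its peer.

ROLE_IPS = {
    "reader": ["192.0.2.11", "192.0.2.12"],  # both nodes may serve reads
    "writer": ["192.0.2.10"],                # only one node may hold this
}

def assign_ips(live_servers):
    """Spread reader IPs over the live servers; all writer IPs land on the
    first (active) server so only one node ever takes writes."""
    assignment = {}
    for i, ip in enumerate(ROLE_IPS["reader"]):
        assignment[ip] = live_servers[i % len(live_servers)]
    for ip in ROLE_IPS["writer"]:
        assignment[ip] = live_servers[0]
    return assignment

# Normal operation: both nodes share the reader IPs, db1 holds the writer IP.
both = assign_ips(["db1", "db2"])
# db1 fails: every IP, including the writer IP, moves over to db2.
failed_over = assign_ips(["db2"])
```

Real failover additionally involves moving the IPs at the OS level (ARP announcements and the like), which is what the Linux IP management tools mentioned above are used for.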
Such IP-based high availability and load balancing requires no extra hardware or software, and it works well for a bunch of applications implemented in different languages, which would make application-level failover problematic.
The tool also takes extra caution to prevent application mistakes. What happens if you write to the slave because of an application error? You’ll break replication, often without knowing about it. You can partly solve this by using a read-only user for the slave connection, but what if your application is simply misconfigured and thinks the server is a master when it is not? To take care of this, the Master-Master Manager makes sure only one of the nodes is writable at all times and the other is set to read-only (--read-only), so unless you use a user with the SUPER privilege you should be safe.
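The single-writer rule can be illustrated with a tiny sketch (the helper and host names are hypothetical; in MySQL terms each node would then apply SET GLOBAL read_only accordingly):

```python
# Sketch of the single-writer safety rule: the node holding the writer role
# runs with read_only = 0, every other node with read_only = 1, so a
# misdirected write fails unless the user has the SUPER privilege.

def apply_read_only(nodes, writer):
    """Return the read_only flag each node should run with (1 = read-only)."""
    return {node: (0 if node == writer else 1) for node in nodes}

flags = apply_read_only(["db1", "db2"], writer="db1")
# Each node would then execute the MySQL statement:
#   SET GLOBAL read_only = <flag>;
```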
It has some other neat features; for example, you can configure it to remove the READER role from a server if it gets too far behind in replication (or if replication breaks), so you do not have to implement this in each application.
We also took extra care to make sure things can’t run out of sync silently. For example, as you may know, if a slave server reboots (say the power goes down) you can’t be sure about data consistency, because replication may restart from the wrong position. Sometimes this shows up as errors, but for some query patterns it will not. MMM detects this situation and holds the server back for the administrator to decide.
One-command LVM-based sync is also implemented, so restoring broken replication in a safe way becomes very easy.
Besides simple “cloning” of nodes, you can use the same tool to create backups, with a number of methods supported, including compressed backups and incremental backups with rdiff.
Finally, we have implemented safe role switching, meaning you can move the writer role to the other node in a clean way (making sure replication is still in sync). I see this switch done the “dirty” way too often, potentially risking replication inconsistencies. Safe switching is very handy if you want to restart one of the servers in the cluster to upgrade the OS or add more RAM.
This tool works well for a number of customers and users, but it is surely an early software version, so try it at your own risk and make sure to provide your feedback and suggestions.
You can learn some more about MMM here, download the tool here, and have your questions answered and suggestions welcomed here.
Hey.
I’ll have to take a look.
Sounds a little bit like lbpool.
We’ve been working on automatic master promotion and it should be in the next version of lbpool.
We found a cool little hack where you start with a series of slaves and the clients can promote a master. Then that master can fail and the clients can promote a new master.
The cool thing is that since any slave can become a master there’s no need for special hardware on the master like RAID 10….. you can just use RAID 0…. Since you have three slaves (any of which can become master) you’re redundant anyway.
Onward!
Another thing… MyISAM tables support generation of checksums. I was thinking of using this to detect inconsistent states in the cluster. You could lock the master, wait for the slaves to catch up, and then compare their checksums.
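The comparison Kevin suggests could be sketched like this (illustrative Python with made-up checksum values; in MySQL the per-table values would come from CHECKSUM TABLE):

```python
# Sketch of the checksum comparison: lock writes on the master, wait for the
# slaves to catch up, collect per-table checksums (CHECKSUM TABLE in MySQL),
# then diff them against the master. The checksum values below are made up.

def find_inconsistent(master_sums, slave_sums_by_host):
    """Return {host: [tables whose checksum differs from the master's]}."""
    bad = {}
    for host, sums in slave_sums_by_host.items():
        diff = [t for t, s in sums.items() if master_sums.get(t) != s]
        if diff:
            bad[host] = diff
    return bad

master = {"users": 0x1A2B, "orders": 0x3C4D}
slaves = {
    "db2": {"users": 0x1A2B, "orders": 0x3C4D},  # identical to the master
    "db3": {"users": 0x1A2B, "orders": 0xFFFF},  # silently diverged table
}
```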
Onward!
Kevin,
lbpool is a Java thing, while this solution is application independent, so it will work for any language and, what is important, for a mix of languages.
In LAMP environments you frequently find the Web code written in PHP, some scripts written in Perl or Python, and some cron jobs in shell simply calling mysql to run queries.
Regarding master promotion from a slave: yes, you can do that. The problem, however, is doing it in a 100% clean way, if you care. MySQL replication is asynchronous, which is the first part, so you may lose some transactions on promotion; on top of that you need to make sure you’re promoting the slave which is the most up to date, or you’re risking inconsistencies.
I know for many applications you do not care but it is worth noting.
MyISAM checksums can of course be used. The problem is that in the described scenario, with a power failure on the slave, some MyISAM tables would likely be corrupted anyway, not to mention the approach is MyISAM specific.
We’ve solved the problem of missed transactions too.
We emulate synchronous commit on the client and wait for that binary log position to execute on at least ONE of the slaves. That way, if the master fails, the data is already on the slave that would become the new master.
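The client-side barrier Kevin describes might look roughly like this. This is a hedged sketch: positions are modeled as (log_file, offset) tuples, and the actual polling of the slaves is elided:

```python
# Hedged sketch of a client-side commit barrier: after committing at binlog
# position P on the master, the client blocks until at least one slave
# reports it has executed up to P, so the data survives a master crash.

def commit_is_durable(commit_pos, slave_positions):
    """True once at least one slave has replayed past the commit position."""
    return any(pos >= commit_pos for pos in slave_positions)

committed_at = ("binlog.000007", 1500)
observed = [("binlog.000007", 900), ("binlog.000007", 1600)]
```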
Kevin
Kevin,
Sure, you can do that, but there is a performance penalty… In most cases you do not have to wait long, but if a batch update happens to fit in between, the wait may become too long for interactive applications.
I’m not saying you can’t solve it this way, just mentioning it is rather application specific.
I should also mention this approach does not guarantee the master and slave are 100% in sync; it is simply a barrier to guarantee the slave has caught up to a certain point.
For example, if the master crashes, one slave may never be able to reach the specified position while some other slave could be luckier.
How do you make replication synchronous to the standby master?
Wouldn’t it have the same latency that my approach would?
“I should also mention this approach does not guarantee the master and slave are 100% in sync; it is simply a barrier to guarantee the slave has caught up to a certain point.”
Yes it does… because if a master crashes during a commit, the commit would be aborted and run on the new master.
“For example, if the master crashes, one slave may never be able to reach the specified position while some other slave could be luckier.”
No. That isn’t an issue. When doing promotion you find the slave which is the most up to date with the master and that becomes the new master. You don’t just pick a random slave….
Kevin
I do not make replication synchronous to the standby master. If the master crashes, some transactions can be lost (though only a few, as the relay logs usually get most of the updates even if replication itself is delayed).
It is, however, less of a problem if you simply have a master-master pair compared to 5 slaves which can all be at slightly different positions.
Now, I do not exactly understand how you commit on both the master and the slave. What I’m saying is the following: say you’ve committed on the master successfully, but MASTER_POS_WAIT on the slave was not successful. What can you say at this stage? Nothing.
Regarding slave selection: it is an issue you can solve, yes, by finding the most up-to-date slave and re-positioning all the other slaves to appropriately adjusted positions. It is, however, rather complicated.
“I do not make replication synchronous to the standby master. If the master crashes, some transactions can be lost (though only a few, as the relay logs usually get most of the updates even if replication itself is delayed).”
Ah… ok. So in this situation it’s just as good as my async replication feature. So both our solutions are the same here. You could still implement synchronous replication if you wanted. It’s at least nice to have that functionality for some applications.
“It is, however, less of a problem if you simply have a master-master pair compared to 5 slaves which can all be at slightly different positions.”
No… that’s the point I was trying to make. I’ve solved that problem. You just select the slave least behind the master. Since they ALL use the same binary log offsets the slaves can be reparented to the new master.
The new master is selected by looking at all the slaves and finding the slave closest to the master.
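The promotion rule under discussion can be sketched as follows (illustrative Python; positions are the (Master_Log_File, Read_Master_Log_Pos) pairs a tool would read from SHOW SLAVE STATUS on each slave):

```python
# Sketch of the promotion rule: among the surviving slaves, promote the one
# whose replication position on the old master is furthest ahead. Tuple
# comparison works here because binlog file names are zero-padded.

def pick_new_master(slave_positions):
    """Return the slave that has read the most of the old master's binlog."""
    return max(slave_positions, key=slave_positions.get)

candidates = {
    "db2": ("binlog.000012", 4000),
    "db3": ("binlog.000012", 7200),  # least behind: becomes the new master
    "db4": ("binlog.000011", 9999),  # older log file, so further behind
}
```

This only picks the candidate; repointing the remaining slaves to it is the separate, harder step the rest of this thread debates.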
“Now, I do not exactly understand how you commit on both the master and the slave. What I’m saying is the following: say you’ve committed on the master successfully, but MASTER_POS_WAIT on the slave was not successful. What can you say at this stage? Nothing.”
Huh? I don’t understand. We don’t do MASTER_POS_WAIT() but emulate it in the client. You’d only need this when doing synchronous writes. It’s entirely optional.
“Regarding slave selection: it is an issue you can solve, yes, by finding the most up-to-date slave and re-positioning all the other slaves to appropriately adjusted positions. It is, however, rather complicated.”
No… 🙂
that’s the point I was trying to make. I had an epiphany which makes this all possible.
I was able to find a REALLY trivial way to solve this. It’s a bit difficult to explain but it’s one of those “duh.. I should have thought of that” moments.
You basically have all the slaves write at the same binary log positions as the master. Then the slaves already know the position and you just change the server name and start reading there?
Does that make sense? Maybe I’ll have to create a diagram for this.
Kevin
Oh.. and amazingly I already wrote an article about this 🙂
http://www.feedblog.org/2007/02/a_simple_protoc_1.html
It’s pretty much my current thinking ….
Onward!
Kevin,
“Yes it does… because if a master crashes during a commit, the commit would be aborted and run on the new master.”
That is storage engine specific, or perhaps I don’t understand what you really mean.
I wrote MySQL Table Checksum to do fast, parallel table checksums across many servers at a time. As far as I know, this is the only tool in existence that can definitively say “yes, the slave is completely identical to the master.”
One thing this has taught me is how frequently there is silent corruption on the slaves, even in the absence of errors. I think most people assume if there are no errors, the slaves will have the same data. If my experience is similar to others, that is a wrong assumption. We get slaves with different data all the time, when there are no errors.
“assign them to the severs”
servers?
Kevin,
“Ah… ok. So in this situation it’s just as good as my async replication feature. So both our solutions are the same here. You could still implement synchronous replication if you wanted. It’s at least nice to have that functionality for some applications.”
How? You can implement semi-synchronous replication, which is barrier based, as I’ve said, but not fully synchronous replication. Synchronous replication means any read at any time returns the same data on both the master and the slave. You can’t get that with MySQL replication by design, because the slave only fetches a statement after it has been completely executed (and is visible) on the master.
“No… that’s the point I was trying to make. I’ve solved that problem. You just select the slave least behind the master. Since they ALL use the same binary log offsets the slaves can be reparented to the new master.
The new master is selected by looking at all the slaves and finding the slave closest to the master.”
Well, finding the slave closest to the master is not hard, as you have each slave’s position on the master, which is easy to compare. I guess this is what you mean by “they all use the same binary log offsets”. The offsets in the slave’s own binary log can be different if you have slaves restarted, for example.
“Huh? I don’t understand. We don’t do MASTER_POS_WAIT() but emulate it in the client. You’d only need this when doing synchronous writes. It’s entirely optional.”
Fine, you can emulate it, but it does not really change the big picture. I guess a lot of the arguing here is because you do not define synchronous replication the same way I do 🙂
“I was able to find a REALLY trivial way to solve this. It’s a bit difficult to explain but it’s one of those “duh.. I should have thought of that” moments.
You basically have all the slaves write at the same binary log positions as the master. Then the slaves already know the position and you just change the server name and start reading there?”
This one you should explain more clearly. How do you guarantee it? Any server restart will change binary log positions on the slave, not to mention FLUSH LOGS, etc. I guess with a little bit of hacking you can achieve it, but I’m not sure it is worth the trouble compared to the bit of math you have to do to adjust the positions on the slave.
In any case, I believe you have a working solution for your application; I’m just trying to figure out in which scope it works well. There are no perfect solutions out there.
Xaprb,
This is a very good point. There are a bunch of documented things which are not handled by MySQL 5.0 replication, for example UUID(), and I guess there are more lesser-known things, like UPDATE/DELETE statements with LIMIT, which can have different results depending on data layout, and of course bugs.
This is why I recommend people take backups from the master whenever possible.
Regarding your script, I’m curious whether it works while replication is active, as this is when you run into problems: you need to checksum tables in a consistent state, which for MyISAM tables would require a table lock for the duration of the checksum.
I’m a little confused… Does software interacting with a master-master setup have to be specially written? Or can I take an out-of-the-box piece of software like freeradius and expect it to work properly?
Jason,
It depends on what you are looking for. You can take any software and use it with an MMM cluster, using only the active master for both reads and writes and the passive node simply for failover.
If you want to send some queries to the second node, you will need to change the software appropriately.
Could you please explain how the MySQL master-master replication model works, and how it differs from master-slave replication in terms of latency, since both are asynchronous? How is the latency lower in master-master replication?
Apart from the above, I have other questions which will help me decide on a solution for our project:
1. Is a backup server required in a master-master replication model? If I have a slave of each master on a separate machine, I will have a total of 4 machines in each cluster.
2. How should ALTER TABLE be done in a master-master replication model? How do we distinguish between a destructive and a non-destructive schema change?
Srini
Srini,
I’ve answered your question here: http://groups.google.com/group/mmm-devel/browse_thread/thread/2738b71da466a3eb.
Alexey
Thanks for your reply, Alexey. I have a couple of questions, given below.
> With this “Master-Master” in the project name we mean Active+Passive
> master scheme which is the same as Master+Slave, but one of the slaves
> being set up to be spare master and be able to handle writes when
> active master dies.
Does that mean there will be no latency difference between master-master and master-slave (no additional priority for the passive master to get replication updates compared to a pure slave), except for the fact that it is administratively easier to bring back the failed master in master-master?
> Something like this:
> # mmm_control set_offline
> …doing schema change on the passive master…
> # mmm_control set_online
> # mmm_control move_role writer
> # mmm_control set_offline
> …doing schema change on former active master…
> # mmm_control set_online
> So, major idea of this process with mmm is that you could say “some
> server is offline” and it would be offloaded and be available for any
> changes/backups/etc.
In the above scenario, how do we restore the changes that occurred on the active master while the passive master was offline? How will the changes made during that period be applied to the passive master when both are online?
Moreover, on the destructive/non-destructive schema change: I assume data type changes, new column additions, and column modifications to be non-destructive, and dropping a column or renaming a table to be destructive. Would you recommend the above approach for both these cases?
Hi everyone.
OK, here is what I don’t understand.
In a typical MASTER_A ↔ MASTER_B >> MANY_REPLICAS setup, if you set up all the slaves to replicate from MASTER_A and it fails (meaning the server goes offline), how do you get all the replicas to change their master to MASTER_B without losing data in an MMM cluster?
Thanks in advance.
drk
Hi! I am new to MySQL and trying to set up master-master replication.
On the first server I have added these lines to the mysqld section of my.cnf…
[Ed: rest of comment deleted. Please use a forum or mailing list.]
Pradeep, you need to use a mailing list to get help with things like this.
Thank you for describing how to set up MySQL master-master replication. It works pretty well for me.