MySQL master-master replication (often in active-passive mode) is a popular pattern used by many companies to scale out MySQL. Most of these companies have internal scripts to handle tasks such as automatic failover and slave cloning, but no Open Source solution has been made available.

A few months ago we were asked to implement such a solution for one of our customers, and they kindly agreed to let us release it under the GPL2 license; in return, we gave them a reduced rate for being Open Source friendly.

So what does this tool do, and how does it work?

Currently it is implemented on top of the Linux IP management tools and LVM for snapshot creation, but we hope support for other operating systems will be added in the future.

It can manage a master-master pair as well as other configurations, such as a master-master pair plus a bunch of slaves.

Typically you define “roles”, for example READER and WRITER roles in the simplest case, and assign them to the servers. For a master-master pair you can say both servers may act as readers at the same time, but only one of them may be writable at any given moment.

Each role has a pair of IPs associated with it in this case, and the tool makes sure every one of these IPs is always handled by some server, so you can use DNS for load balancing without worrying about TTLs and similar issues.

If the active server in the pair fails, both its READER and WRITER roles are taken over by the passive node; depending on the monitoring configuration this can happen within a few seconds.
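To give a rough idea of the mechanism (this is not MMM’s actual code, and the interface name and addresses are made up), moving a role IP between Linux hosts boils down to something like:

    ip addr add 192.168.0.100/24 dev eth0    # bring the role IP up on the new owner
    arping -c 3 -U -I eth0 192.168.0.100     # gratuitous ARP so the network learns the new location
    # and on the old owner, if it is still reachable:
    ip addr del 192.168.0.100/24 dev eth0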

Such IP-based high availability and load balancing does not require any extra hardware or software, and it works well when a bunch of applications, possibly implemented in different languages, use the database, which makes application-level failover problematic.

The tool also takes extra care to prevent application mistakes. What happens if you write to the slave because of an application error? Well, you break replication, often without knowing about it. You can partly solve this by using a read-only user for the slave connection, but what if your application is simply misconfigured and thinks the server is the master when it is not? To take care of this, the Master-Master Manager makes sure only one of the nodes is writable at any time and the other is set to --read-only, so unless you connect with a user that has the SUPER privilege you should be safe.
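To illustrate the effect (made-up user and table, not MMM’s internal commands), this is what the protection looks like on the passive node:

    mysql -e "SET GLOBAL read_only = 1"      # what the manager effectively ensures on the passive node
    mysql -u app_user -e "INSERT INTO test.t VALUES (1)"
    # ERROR 1290 (HY000): The MySQL server is running with the --read-only option
    # so it cannot execute this statement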

It has some other neat features; for example, you can configure it to remove the READER role from a server if it falls too far behind in replication (or if replication breaks), so you do not have to implement this check in each application.
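The check itself is roughly the familiar one (the threshold below is just an example, not MMM’s default):

    mysql -e "SHOW SLAVE STATUS\G" | grep -E "Slave_SQL_Running|Seconds_Behind_Master"
    # If Slave_SQL_Running is No, or Seconds_Behind_Master exceeds the configured
    # threshold (say 60 seconds), the READER role and its IP move to another node.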

We also took extra care to make sure things cannot silently run out of sync. For example, as you may know, if a slave server reboots (say the power goes down) you cannot be sure about data consistency, because replication may be restarted from the wrong position. Sometimes this shows up as errors, but for some query patterns it will not. MMM detects this situation and holds the server offline for the administrator to decide what to do.

A one-command LVM-based sync is also implemented, so restoring broken replication in a safe way becomes very easy.
Besides simple “cloning” of the nodes, you can use the same tool to create backups, with a number of methods supported, including compressed backups and incremental backups with rdiff.
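The LVM technique itself is the standard one: briefly flush and lock the tables, snapshot the volume holding the datadir, release the lock, and copy from the snapshot while MySQL keeps serving. A rough sketch with made-up volume names and paths (not MMM’s exact script):

    # In one client session, held open while the snapshot is taken:
    #   FLUSH TABLES WITH READ LOCK; SHOW MASTER STATUS;
    lvcreate --snapshot --size 2G --name mysql_snap /dev/vg0/mysql   # instant copy-on-write snapshot
    #   UNLOCK TABLES;  (writes resume almost immediately)
    mount /dev/vg0/mysql_snap /mnt/snap
    rsync -a /mnt/snap/mysql/ standby:/var/lib/mysql/                # clone the datadir to the other node
    umount /mnt/snap && lvremove -f /dev/vg0/mysql_snap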

Finally, we have implemented a safe role switch, meaning you can move the writer role to the other node in a clean way (making sure replication is still in sync). Too often I see this switch done the “dirty” way, potentially risking replication inconsistencies. A clean switch is very handy if you want to restart one of the servers in the cluster to upgrade the OS or add more RAM.
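As one of the replies below shows, the switch is driven with the mmm_control utility; a planned switchover looks roughly like this (host names are made up):

    mmm_control move_role writer db2   # wait for replication to catch up, then move the WRITER role and its IP to db2
    mmm_control set_offline db1        # take the old active master out of rotation for maintenance
    # ...upgrade or reboot db1...
    mmm_control set_online db1         # bring it back as the new passive master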

This tool works well for a number of customers and users, but it is surely an early version, so try it at your own risk and make sure to provide your feedback and suggestions.

You can learn more about MMM here, download the tool here, and have your questions answered and your suggestions welcomed here.

22 Comments
Kevin Burton

Hey.

I’ll have to take a look.

Sounds a little bit like lbpool.

We’ve been working on automatic master promotion and it should be in the next version of lbpool.

We found a cool little hack where you start with a series of slaves and the clients can promote a master. Then that master can fail and the clients can promote a new master.

The cool thing is that since any slave can become a master, there’s no need for special hardware on the master like RAID 10… you can just use RAID 0… Since you have three slaves (any of which can become master) you’re redundant anyway.

Onward!

Kevin Burton

Another thing… MyISAM tables support generation of checksums. I was thinking of using this to detect inconsistent states in the cluster. You could lock the master and then wait for the slaves to catch up and then compare their checksums.

Onward!
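[Ed: for reference, the feature Kevin mentions is the CHECKSUM TABLE statement; a minimal illustration with made-up table names:]

    mysql -e "CHECKSUM TABLE test.users, test.orders"   # run on the master and on each slave
    # Comparing the returned Checksum values across servers, once the slaves have
    # caught up to a known binlog position, reveals tables that silently diverged.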

Kevin Burton

We’ve solved the problem of missed transactions too.

We emulate synchronous commit on the client and wait for that binary log position to execute on at least ONE of the slaves. That way if the master fails that data is on the next slave that would become the master.

Kevin
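[Ed: the server-side equivalent of what Kevin describes emulating is the MASTER_POS_WAIT() function; a sketch with made-up binlog coordinates:]

    mysql -h master1 -e "SHOW MASTER STATUS"     # note File and Position right after COMMIT
    mysql -h slave1 -e "SELECT MASTER_POS_WAIT('mysql-bin.000042', 1307, 5)"
    # Blocks until the slave has executed past that position; returns -1 on the
    # 5-second timeout and NULL if the slave SQL thread is not running.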

Kevin Burton

How do you make replication synchronous to the standby master?

Wouldn’t it have the same latency that my approach would?

“I also should mention this approach does not guaranty you master and slave are 100% in sync, it is simply barrier to guaranty slave had caught up to certain point.”

Yes it does… because if a master crashes during a commit, the commit would be aborted and run on the new master.

“For example if master crashes slave may never be able to go to the specified position while some other slave could be more lucky.”

No. That isn’t an issue. When doing promotion you find the slave which is the most up to date with the master and that becomes the new master. You don’t just pick a random slave….

Kevin

Kevin Burton

“I do not make replication synchronous to standby master. If master crashes some transactions can be lost (though small one as relay logs usually get most of updates even if replication itself is delayed)”

Ah… ok. So in this situation it’s just as good as my async replication feature. So both our solutions are the same here. You could still implement synchronous replication if you wanted. It’s at least nice to have that functionality for some applications.

“It is however lesser problem if you simple have master-master pair compared to 5 slaves which all can be at slightly different positions.”

No… that’s the point I was trying to make. I’ve solved that problem. You just select the slave least behind the master. Since they ALL use the same binary log offsets the slaves can be reparented to the new master.

The new master is selected by looking at all the slaves and finding the slave closest to the master.
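[Ed: for context, reparenting a slave in stock MySQL is done with CHANGE MASTER TO; the host name and coordinates below are placeholders:]

    mysql -h slave2 -e "STOP SLAVE;
      CHANGE MASTER TO MASTER_HOST='new-master',
        MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=1307;
      START SLAVE;"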

“Now I do not exactly understand how do commit on both master and slave ? What I’m saying is following example you’ve committed on master successfully but master_pos_wait on the slave was not successful, what you can say at this stage ? Nothing.”

Huh? I don’t understand. We don’t do MASTER_POS_WAIT() but emulate it in the client. You’d only need this when doing synchronous writes. It’s entirely optional.

“Regarding slave selection – it is issue which you can solve yes. Finding most up to date slave and re-positioning all other slaves to appropriately adjusted positions. It is however rather complicated.”

No… 🙂

that’s the point I was trying to make. I had an epiphany which makes this all possible.

I was able to find a REALLY trivial way to solve this. It’s a bit difficult to explain but it’s one of those “duh.. I should have thought of that” moments.

You basically have all the slaves write to the same binary log positions as the master. Then the slaves already know the position, and you just change the master host name and start reading from there?

Does that make sense? Maybe I’ll have to create a diagram for this.

Kevin

Kevin Burton

Oh.. and amazingly I already wrote an article about this 🙂

http://www.feedblog.org/2007/02/a_simple_protoc_1.html

It’s pretty much my current thinking ….

Onward!

Xaprb

Kevin,

“Yes it does…. because if a master crashes in during a commit the commit would be aborted and run on the new master.”

That is storage engine specific, or perhaps I don’t understand what you really mean.

I wrote MySQL Table Checksum to do fast, parallel table checksums across many servers at a time. As far as I know, this is the only tool in existence that can definitively say “yes, the slave is completely identical to the master.”

One thing this has taught me is how frequently there is silent corruption on the slaves, even in the absence of errors. I think most people assume that if there are no errors, the slaves will have the same data. If my experience is similar to others’, that is a wrong assumption. We get slaves with different data all the time, when there are no errors.

Glader

“assign them to the severs”
servers?

Jason Frisvold

I’m a little confused… Does software interacting with a master-master setup have to be specially written? Or can I take an out of the box piece of software like freeradius and expect it to work properly?

Srini

Could you please explain to me how the MySQL multi-master replication model works, and how it differs from master-slave replication in terms of latency, since both are asynchronous? How is the latency lower in master-master replication?

Apart from the above I have other questions which will help me in deciding on a solution for our project,

1. Is a backup server required in a master-master replication model? If I have a slave for each master on a separate machine, I will have a total of 4 machines in each cluster.
2. How should an ALTER TABLE be done in a master-master replication model? How do we distinguish between a destructive and a non-destructive schema change?

Srini

Alexey Kovyrin
Srini

Thanks for your reply Alexey. I have couple of questions given below.

> With this “Master-Master” in the project name we mean Active+Passive
> master scheme which is the same as Master+Slave, but one of the slaves
> being set up to be spare master and be able to handle writes when
> active master dies.

Does that mean there will be no latency difference between master-master and master-slave (no additional priority for the passive master to receive replication updates compared to a pure slave), except for the fact that it is administratively easier to bring back the failed master in master-master?

> Something like this:
> # mmm_control set_offline
> …doing schema change on the passive master…
> # mmm_control set_online
> # mmm_control move_role writer
> # mmm_control set_offline
> …doing schema change on former active master…
> # mmm_control set_online

> So, major idea of this process with mmm is that you could say “some
> server is offline” and it would be offloaded and be available for any
> changes/backups/etc.

In the above scenario, how do we restore the changes that occurred on the active master while the passive master was offline? How will the changes made during that period be applied to the passive master when both are online again?

Moreover, on the destructive/non-destructive schema change: I assume data type changes, new column additions, and column modifications are non-destructive, while dropping a column or changing a table name is destructive. Would you recommend the above approach for both of these cases?

Istvan Podor

Hi everyone.

OK, here is what I don’t understand.
In a typical MASTER_A / MASTER_B >> MANY_REPLICAS setup, if you set up all the slaves to replicate from MASTER_A and it fails (meaning the server goes offline), how do you make all the replicas change their master to MASTER_B without losing data in an MMM cluster?

Thanks in advance.
drk

pradeep

Hi! I am new to MySQL and trying to set up master-master replication in MySQL.

On the first server I have added these lines to the mysqld section of my.cnf…

[Ed: rest of comment deleted. Please use a forum or mailing list.]

Baron Schwartz

Pradeep, you need to use a mailing list to get help with things like this.

Thank you for describing how to set up MySQL master-master replication. It works pretty well for me.