July 31, 2014

MySQL 5.7 multi-source replication

Recently Oracle announced several new features for the latest available development version of MySQL that is 5.7.2 at the time of writing this article. Most of them are performance and replication related that show us how incredible the new release will be.

In this post I’m going to try to explain in some easy steps how the new multi-source replication works and how we can configure it for our own tests. It is important to mention that this is a development release, so it is not production ready. Therefore this post is intend to people that want to test the new feature and see how it works with their application, always in a staging environment.

What is multi-source replication?

First, we need to have clear that multi-master and multi-source replication are not the same. Multi-Master replication is the usual circular replication where you can write on any server and data gets replicated to all others.

Multi master replication
Multi-source is different. MySQL replication had a limitation, fixed with this new release, that said that one slave could have only one master. That is a limiting factor when we are designing our replication environment. There were some “hacks” to make it work, but now there is an official way. So, in a nutshell, Multi-Source means that a slave can have more than one master. Now, replication environments like these one are possible:

Multi source replication
This will help us create some replication hierarchies that were impossible in the past. For example, you can have a slave in your office replicating the data from all servers that you have in the offices spread around the world.

How does it work?

Now we have the concept of communication channels. Each communication channel is a connection from slave to master to get the binary log events. That means we will have one IO_THREAD for each communication channel. We will need to run different “CHANGE MASTER” commands, one for each master, with the “FOR CHANNEL” argument that we will use to give a name to that channel.

Pretty easy. There is one single pre-requisite. The slave should have been configured first with the crash-safe feature of MySQL 5.6. That means that info usually included in master.info or relay-log.info should be on a table. Let’s start with the configuration.

Show me an example!

First you need to download the lab version of mysql from this link.

We have a sandbox environment with 2 masters and 1 slave servers. I won’t go over the details of how to configure the server_id, binary logs or replication users. I assume they are well configured. If you need a howto, you can follow this one.

First, we have to enable the crash safe feature on the slave:

After a restart of the slave we can start creating the channels with the names “master1″ and “master2″:

To start the slave processes you need to specify what channel are you referring to:

Now, we want to check the status of the slave:

Oh, it is empty. We have to specify again which channel we want to check:

and we can also check that the IO_THREAD and SQL_THREADS are running:

Let’s test it:

It works, that easy!

Conclusion

The new multi-source feature allow us to create new replication environments that were impossible in the past without some complex “hacks”. Of course, your application should be designed and developed with this new architecture in mind. Like with multi-master, multi-source needs special care to not end up with your data messed up.

MySQL replication is getting better on every release giving us more configuration, performance and design possibilities. And all those new features can be combined. Your replication environment can be even better if you mix some of the new (and old) features added recently to the replication. For example, you can configure GTID or enable multi-threaded slave per schema or intra-database.

About Miguel Angel Nieto

Miguel joined Percona in October 2011. He has worked as a System Administrator for a Free Software consultant and in the supporting area of the biggest hosting company in Spain. His current focus is improving MySQL and helping the community of Free Software to grow. Miguel's roles inside Percona are Senior Support Engineer and Manager of EMEA Support Team.

Comments

  1. Marc Castrovinci says:

    Great feature! It looks helpful for when you want to funnel sharded data into a single location so you can run OLAP type operations over all the data.

  2. Jonatas says:

    Very nice feature. Keeping eye on 5.7…

  3. shiv says:

    Hi Miguel,
    You can use the new replication performance schema tables to look at all the channels or choose which channel you want…

  4. shiv, yes, it is possible to get that info from P_S tables:

    replication_connection_configuration
    replication_execute_configuration

    but… in my personal opinion that should be in I_S. That info is not performance related, it just show us configuration information so it should be available via SHOW (for example SHOW SLAVE STATUS CHANNEL ALL) or via I_S tables. Maybe I’m wrong, but I’ve opened a feature request anyway:

    http://bugs.mysql.com/bug.php?id=70434

  5. Tariq says:

    I trying to imagine the practical implementation of this new ‘multi source replication’ and how does it really help having a slave getting replicated from multiple sources when the data_set_from_every_master is the same ?

    Consider the M-M replication master1 and master2 having the dataset X and I could easily run a slave1 either from Master1 or Master2 and possibly have many other slaves from slave1 or master[1-2] and so on, why would one need to replicate from multiple masters if data is all the same unless for some arcane reason i want to replicate a particular db from master 1 and another db2 from another master which is not necessarily in the M-M replication chain .

  6. Tariq, the data from masters don’t need to be the same. The masters on the second graph they are not in M-M replication. There are lot of examples where multi-source can be applied:

    - maybe you have hundred of apache servers, logging accesses to a mysql. in that case, having all those mysql replicating to a single slave could help you to run queries there.

    - or you have radar-controlled speed cameras all around the country and you need to have one single point to query the data.

    - or just the example made by Marc on the first comment.

    So, masters don’t need to have the same data set. Now practical implementation depends on what you need not on MySQL limitations.

  7. I will definitely test and blog this setup. Will make downtime-less migrations easier, without any doubt.

  8. Hugo says:

    Nice Nice.

    This will drastically reduce the complexity on our OLAP servers.

    Does this works nice if the salve uses BlackHole Engine for all the replicated databases?

  9. Hugo, this still works like regular replication, just with more IO and SQL threads. No limitation on what storage engine you use.

  10. Hi Miguel,

    For the multi-source replication, I would like to know if it is sufficient to upgrade the slave to 5.7.2, or do the masters also need to run the same mysql version?

  11. Terence, didn’t have time to check it. But the theory says “it should”. It just opens more “io threads”, so for the master server is like having two different slaves. But as I said, didn’t check it. If I get some spare time I will try it.

  12. Hi Miguel,

    Thanks for your reply. I just got multi-source replication working with MySQL 5.7.2 on my laptop with 2 vms running 5..

  13. I got multi-source replication running with 2 VMs with different MySQL versions (5.1.49 and 5.1.63). Thanks.

  14. Marko says:

    Hi all,
    I’ve just slave it in m-m with two percona’s 5.6 and it works as expected. Everything is replicated everywhere. I noticed. after 5.7.2 is restarted the replications on it can’t be started again and have to be setup again.

  15. rucypli says:

    is percona plans to add feathure Parallel Slave like MariaDB? thanks

  16. Mike Siekkinen says:

    During my (still labs-version 5.7.2) testing of multi source replication, it seems there is a bug where replication does not automatically pick back up after a server restart.

    Information remains in the slave_master_info table, how ever attempt to start slave for channel or show slave status for channel returns an error as if it had never been setup.

    Rerunning a full change master to command from the log positions left in slave_master_info table seemed to allow replication to resume.

    Aside from that one gotcha I ran into was trying to replicate a table with the now-reserved word “channel” for a column name. The application wasn’t encasing the column names in backticks to replication would break on a syntax error for those inserts.

  17. Mike Siekkinen says:

    Further testing off a real replication stream broke with an error about not being able to parse log events. This was trying to read binlogs from a percona 5.5.34 build. Between mixing percona builds with mysql labs builds and a big “TESTING ONLY” warning on the download I’ll wait for it to become more stable before pursuing further.

  18. Mike, thanks for your comments. That is what Oracle wants with Lab version, to have people breaking it like you are doing. It would be good if you could report those bugs on http://bugs.mysql.com/ That will help to make the final release even better :)

  19. Wayne Gaylard says:

    I am trying to do some testing but I keep getting syntax errors when trying to do CHANGE MASTER TO …. FOR CHANNEL = ‘channel_name’. Any ideas. Using 5.7.3, tried in Workbench and on command line – same result.

  20. s capkan says:

    I’m wondering how mysql does 5.7 handle auto_increment feature. What if i have two master without any communication and both of them has auto increment row, they can easily get same ids. Should i also use auto_increment_increment and auto_increment_offset parameters like master-master structure?.

  21. Mike Siekkinen says:

    @wayne, you have to download the laps specific version from http://labs.mysql.com/ Select multi source replication from the drop down.

  22. Mike Siekkinen says:

    *labs; I wish I could edit comments here

  23. Mike Siekkinen says:

    @s capkan, I believe the point of multi source replication is to get the databases from server A and server B into the same instance on the slave C instance. If both A and B hold the same database/table schemas I imagine you’re going to have a bad time.

    The auto_increment_offest is meant for multi master replication so they don’t step on each other since mysql replication is not synchronous.

  24. Alex says:

    We have many MySQL instances that are set up with the same set of databases for each instance. I was planning to use multi-source to create a single backup point and reporting point. The problem is that all the source instances have the same databases. Are there any considerations for a prefix on the database name, or something that creates a unique database name on the replica in such a scenario?

  25. Moody says:

    I have the same concern as the post above me, Alex. If Master1 and Master2 both have a schema called “test”, how does this conflict with multi-source replication? I would imagine there would be 2 problems, the first one being can’t have 2 schemas with the same name. Second, how does the slave know which schema to apply the slave event?

  26. alex says:

    have you checked if mysql 5.7.4 support this feature? when i issue this statement “change master to master_host=”127.0.0.1″, master_port=12047, master_user=”msandbox”,master_password=”msandbox” for channel=”master1″; , mysql reports an syntax error near ‘ for channel’. i havent found any document for ‘for channel’ clause. could you please give me some help ? thanks .

  27. Alex says:

    @alex: It is not in the release builds yet, you have to download the multi-source replication version from http://labs.mysql.com/

Speak Your Mind

*