April 20, 2014

Multicast replication in Percona XtraDB Cluster (PXC) and Galera

Bandwidth multiplication and synchronous clusters

I’ve seen a lot of people setting up clusters with 3-6+ nodes on 1 Gbps networks. 1 Gbps seems like a lot, doesn’t it? Actually, maybe not as much as you think. While the theoretical limit of 1 Gbps is actually only about 125 MB/s (and less once protocol overhead is counted), I start to get nervous around 100 MB/s.

By default Galera uses unicast TCP for replication. Because synchronous replication needs to replicate to all nodes at once, this means 1 copy of your replication message is sent to every other node in the cluster. The more nodes in your cluster, the more the bandwidth required for replication multiplies.

Now, this isn’t really much different from standard MySQL replication. 1 master with 5 async slaves is going to send a separate replication stream to each, so your bandwidth requirements will be similar. However, with async replication you have the luxury of not blocking the master from taking writes if bandwidth is constrained and the slaves lag for a bit. Not so in Galera.

So, let’s see this effect in action. I have a simple script that outputs the network throughput on an interface every second. I’m running a sysbench test on one node and measuring the outbound (UP) bandwidth on that same node as more nodes join the cluster.
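The script itself isn’t reproduced in this post. As a rough stand-in, here is a minimal Python sketch of that kind of monitor. It assumes Linux (it reads the cumulative byte counters in /proc/net/dev), and the function names and the DOWN/UP output format are my own choices, not the original script’s.

```python
#!/usr/bin/env python3
"""Print per-second DOWN/UP throughput for one network interface (Linux only)."""
import sys
import time


def parse_counters(proc_net_dev: str, iface: str) -> tuple[int, int]:
    """Return (rx_bytes, tx_bytes) for iface from the text of /proc/net/dev."""
    for line in proc_net_dev.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Field 0 is received bytes, field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {iface!r} not found")


def read_counters(iface: str) -> tuple[int, int]:
    with open("/proc/net/dev") as fh:
        return parse_counters(fh.read(), iface)


def monitor(iface: str) -> None:
    prev_rx, prev_tx = read_counters(iface)
    while True:
        time.sleep(1)
        rx, tx = read_counters(iface)
        # Counters are cumulative, so a ~1 second delta approximates bytes/sec.
        print(f"{iface} DOWN: {(rx - prev_rx) / 1024:.0f} KB/s  "
              f"UP: {(tx - prev_tx) / 1024:.0f} KB/s")
        prev_rx, prev_tx = rx, tx


# Only runs when an interface name is given, e.g.: python3 netmon.py eth1
if __name__ == "__main__" and len(sys.argv) == 2:
    monitor(sys.argv[1])
```

Run it against the replication interface (eth1 in my setup) on the node taking writes and watch the UP figure while nodes are added.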

This isn’t much traffic in my puny local VMs, but you get the idea. We can clearly see a multiplying factor come into play as the extra nodes are added.
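The factor follows from the unicast design described earlier: the writing node sends one copy of each replication message to every other node. A back-of-the-envelope sketch of that arithmetic is below. It is an idealized model (it ignores protocol overhead and any traffic other than replication), and the 20 MB/s write rate is a made-up figure for illustration.

```python
def unicast_outbound_mb_s(write_mb_s: float, nodes: int) -> float:
    """Idealized outbound replication bandwidth on the node taking writes.

    With unicast, one copy of the replication stream goes to each of the
    other (nodes - 1) members of the cluster.
    """
    if nodes < 1:
        raise ValueError("a cluster has at least one node")
    return write_mb_s * (nodes - 1)


# Hypothetical workload: 20 MB/s of replicated writes on one node.
for n in (2, 3, 6):
    print(f"{n} nodes: {unicast_outbound_mb_s(20, n):.0f} MB/s outbound")
# 2 nodes: 20 MB/s outbound
# 3 nodes: 40 MB/s outbound
# 6 nodes: 100 MB/s outbound
```

Under these assumptions, a 6 node cluster already sits at the 100 MB/s mark where a 1 Gbps link starts to look uncomfortable.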

Multicast to the rescue!

One way to address this bandwidth constraint is to switch to multicast UDP replication in Galera. This is actually really easy to do.

First, we need to make sure our environment will support multicast. This is a question for your network guys and beyond the scope of this post, but in my trivial VM environment, I just need to make sure that the multicast address space (224.0.0.0/4) routes to my Galera replication interface, eth1.

In that space, we pick an unused mcast address (again, talk to your network guys).  I’m using 239.192.0.11, so we’ll add this to our my.cnf:
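The setting that takes this address is Galera’s gmcast.mcast_addr provider option, which goes inside wsrep_provider_options. The Python helper below is only a sketch for producing that my.cnf line: the function names are mine, and gcache.size=1G is just a stand-in for whatever options you might already have set. It rejects addresses that are not multicast and appends to an existing semicolon-separated list rather than overwriting it.

```python
import ipaddress


def with_mcast_addr(existing_options: str, mcast_addr: str) -> str:
    """Return a wsrep_provider_options value with gmcast.mcast_addr set."""
    if not ipaddress.ip_address(mcast_addr).is_multicast:
        raise ValueError(f"{mcast_addr} is not a multicast address")
    # Split the semicolon-separated list and drop empty entries.
    options = [opt.strip() for opt in existing_options.split(";") if opt.strip()]
    # Drop any previous gmcast.mcast_addr so the option appears only once.
    options = [opt for opt in options
               if opt.split("=", 1)[0].strip() != "gmcast.mcast_addr"]
    options.append(f"gmcast.mcast_addr={mcast_addr}")
    return "; ".join(options)


def my_cnf_line(existing_options: str, mcast_addr: str) -> str:
    """Render the full my.cnf line for the [mysqld] section."""
    return f'wsrep_provider_options = "{with_mcast_addr(existing_options, mcast_addr)}"'


print(my_cnf_line("", "239.192.0.11"))
# wsrep_provider_options = "gmcast.mcast_addr=239.192.0.11"
print(my_cnf_line("gcache.size=1G", "239.192.0.11"))
# wsrep_provider_options = "gcache.size=1G; gmcast.mcast_addr=239.192.0.11"
```

The same value has to be used on every node in the cluster.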

If you already have wsrep_provider_options set, add it to the semicolon-separated list instead of a separate line in your config.

If we already have a running cluster, we need to shut it down, configure our mcast address, and re-bootstrap it.

We can see that a multicast node still needs to bind to the Galera replication port, and of course that needs to be bound to the interface that the multicast will be received on.
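The output that showed this isn’t reproduced here. One rough way to check something similar from another host is to try a TCP connection to the replication port on the node’s replication IP. This sketch assumes the node keeps a TCP listener on Galera’s default port 4567 even with multicast enabled, which is my assumption rather than something shown in this post. The function name is also mine.

```python
import socket
import sys


def tcp_port_open(host: str, port: int = 4567, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage: python3 portcheck.py <replication-ip-of-node>
if __name__ == "__main__" and len(sys.argv) == 2:
    host = sys.argv[1]
    state = "open" if tcp_port_open(host) else "not reachable"
    print(f"{host}:4567 is {state}")
```

This only tells you the TCP port is reachable on that address. It says nothing about whether the multicast traffic itself is getting through.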

Now, let’s rerun the earlier test:

So, we can see our outbound bandwidth on our master node doesn’t change as we add more nodes when we are using multicast.

Other multicast tips

We can also bootstrap nodes using the mcast address:

And this works fine. Pretty slick!

Note that IST and SST will still use TCP unicast, so we still want to make sure those are configured to use the regular IP of the node. Typically I just set the wsrep_node_address setting on each node if this IP is not the default IP of the server.

I could not find a way to migrate an existing unicast cluster to multicast with a rolling update. I believe (but could be proven wrong) that you must re-bootstrap your entire cluster to enable multicast.

About Jay Janssen

Jay joined Percona in 2011 after 7 years at Yahoo working in a variety of fields including High Availability architectures, MySQL training, tool building, global server load balancing, multi-datacenter environments, operationalization, and monitoring. He holds a B.S. of Computer Science from Rochester Institute of Technology.