April 20, 2014

Changing an async slave of a PXC cluster to a new Master

Async and PXC

A common question I get about Percona XtraDB Cluster is if you can mix it with asynchronous replication, and the answer is yes!  You can pick any node in your cluster and it can either be either a slave or a master just like any other regular MySQL standalone server (Just be sure to use log-slave-updates in both cases on the node in question!).  Consider this architecture:

Canvas 1

However, there are some caveats to be aware of.  If you slave from a cluster node, there is no built in mechanism to fail that slave over automatically to another master node in your cluster.  You cannot assume that the binary log positions are the same on all nodes in your cluster (even if they start binary logging at the same time), so you can’t issue a CHANGE MASTER without knowing the proper binary log position to start at.

Canvas 2

Until recently, I thought it was not possible to easily do this without scanning both the relay log on the slave and the binary log on the new master and trying to match up the (binary) replication event payloads somehow.  It turns out it’s actually quite easy to do without the old master being available and without pausing writes on your cluster.

Galera Seqnos and Binary log Xids

Given the above scenario, we have a 2 node cluster with node3 async replicating from node1.  Writes are going to node2.  If I stop writes briefly and look at node1′s binary log, I can see something interesting:

Notice the Xid field. Galera is modifying this field that exists in a standard binlog with its last commited seqno:

If I check node2, sure enough its latest binlog has the same Xid, even though it’s in a different file at a different position:

So, with both nodes doing log-bin (and both with log-slave-updates), even though they are on different log files, they still have the same Xid that is very easy to search for.

But, does our slave (node3) know about these Xids? If we check the latest relay log:

Yes, indeed! We’re assuming the SQL thread on this slave has processed everything in the relay log and, if it has, then we know that the Galera transaction seqno 86277 was the last thing written on the slave. If it hasn’t, you’d have to find the last position applied by the SQL thread using SHOW SLAVE STATUS and find the associated Xid with that position.

Testing it out

Now, let’s simulate a failure to see if we can use this information to our advantage. I have write load running against node2 and I kill node1:

The node failure takes a little while. This is because it was a 2 node cluster and we just lost quorum. Note the node goes to the Init state and it is also non-Primary. This also happened to kill our test client. If we check our slave, we can see it is indeed disconnected since node1 went away:

Since our remaining node is non-Primary, we need to bootstrap it first and restart our load. (Note: This is only because it was a 2 node cluster, this is not strictly necessary to do this procedure in general).

Ok, so our cluster of 1 is back up and taking writes. But, how can we CHANGE MASTER so node3 replicates from node1? First we need to find node3′s last received Xid:

We can tell from SHOW SLAVE STATUS that all of the relay log was processed, so we know the last applied Xid was 88085. Checking node2, we can indeed tell the cluster has progressed further:

It’s easy to grep for node3′s last Xid in node2′s binlog (if the slave has been disconnected a while, you may have to search a bit more through the binlogs):

So, on node2, Xid 88085 is in binary log node2.000002, position 2973899.  2973899 is the “end” position, so that should be where we start from on the slave.  Back on node3 we can now issue a correct CHANGE MASTER:

Of course, we are still scanning relay logs and binary logs, but matching Xid numbers is a lot easier than trying to match RBR payloads.

Summary

  • Galera writes the seqno (part of its GTID) to node’s binary logs (if enabled) in the Xid field
  • Xid is consistent across all nodes in a cluster, regardless of individual binary log position
  • This Xid field is propagated to slaves and can be matched up in their relay logs
  • Those slaves that are also binary logging with log-slave-updates will overwrite Xid, so you this isn’t a true GTID for async in 5.5.
About Jay Janssen

Jay joined Percona in 2011 after 7 years at Yahoo working in a variety of fields including High Availability architectures, MySQL training, tool building, global server load balancing, multi-datacenter environments, operationalization, and monitoring. He holds a B.S. of Computer Science from Rochester Institute of Technology.

Comments

  1. It’s even possible to use async replication between galera clusters, but then you need to be aware of a pt-table-checkum/sync bug.

    https://bugs.launchpad.net/percona-toolkit/+bug/1169853

Speak Your Mind

*