August 27, 2014

Using pt-table-checksum with Percona XtraDB Cluster

As of Percona Toolkit v2.1.5, pt-table-checksum works correctly with Percona XtraDB Cluster, but it doesn’t work quite like a traditional replication setup because cluster nodes are not like traditional replicas.  In this post I demonstrate how to use pt-table-checksum with Percona XtraDB Cluster.

First, you’ll need Percona Toolkit v2.1.5 or newer and Percona XtraDB Cluster 5.5.27-23.6 or newer (these are the current versions at the time of writing).  Second, I presume that you already know how to use pt-table-checksum and that your cluster is already set up.  If the latter presumption is false, then read the Percona XtraDB Cluster documentation or watch the Percona XtraDB Cluster – Installation and Setup webinar.  My setup has 3 nodes called “node4000”, “node5000”, and “node6000” on ports 4000, 5000, and 6000 respectively.

The first thing you must do is create a “DSNs table” because pt-table-checksum cannot currently auto-detect cluster nodes.  My DSNs table contains:

There is a row for each node in the cluster, specifying the node’s DSN.  (The id and parent_id columns are not used by pt-table-checksum yet, so their values don’t matter.)  Since all these nodes are in a cluster, this table should exist on all nodes.  This means you can run pt-table-checksum on any node.  I’ll run the tool on node4000.
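As a minimal sketch of how such a table could be created and filled, the Python snippet below prints the SQL. The table layout follows the DSNs table described in the pt-table-checksum documentation. The name percona.dsns and the address 127.0.0.1:4000 come from this post; that all three nodes listen on 127.0.0.1 is an assumption, and node_dsn and dsn_insert_statements are hypothetical helper names.

```python
# Sketch only: prints SQL for a DSNs table; it does not connect to MySQL.
# Assumption: every node is reachable on 127.0.0.1 at its own port.

CREATE_DSNS_TABLE = """\
CREATE DATABASE IF NOT EXISTS percona;
CREATE TABLE IF NOT EXISTS percona.dsns (
  id        INT NOT NULL AUTO_INCREMENT,
  parent_id INT DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);"""


def node_dsn(host, port):
    """Return a Percona Toolkit DSN string such as 'h=127.0.0.1,P=4000'."""
    return f"h={host},P={port}"


def dsn_insert_statements(nodes):
    """Return one INSERT statement per (host, port) pair.

    Values are interpolated directly, so pass only trusted host names.
    """
    return [
        f"INSERT INTO percona.dsns (dsn) VALUES ('{node_dsn(host, port)}');"
        for host, port in nodes
    ]


if __name__ == "__main__":
    nodes = [("127.0.0.1", 4000), ("127.0.0.1", 5000), ("127.0.0.1", 6000)]
    print(CREATE_DSNS_TABLE)
    for statement in dsn_insert_statements(nodes):
        print(statement)
```

Because the table is replicated like any other, the generated statements only need to be run on one node.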

The second thing you must do is specify “--recursion-method dsn=DSN-TABLE” when running pt-table-checksum, where “DSN-TABLE” is a DSN specifying the above DSNs table.  For example:
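As a rough sketch of such an invocation, the snippet below assembles the command line in Python. It assumes the DSNs table is percona.dsns and is reachable through 127.0.0.1:4000, the values used in this post. checksum_command is a hypothetical helper, and you may need to add credentials (for example u= and p= parts) to the DSN.

```python
import shlex


def checksum_command(host, port, dsn_db="percona", dsn_table="dsns"):
    """Build an argv list for pt-table-checksum using --recursion-method dsn=...

    The first DSN is the node the tool connects to; the second DSN points
    at the DSNs table that lists the nodes to compare.
    """
    main_dsn = f"h={host},P={port}"
    dsn_table_dsn = f"{main_dsn},D={dsn_db},t={dsn_table}"
    return [
        "pt-table-checksum",
        main_dsn,
        "--recursion-method",
        f"dsn={dsn_table_dsn}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(checksum_command("127.0.0.1", 4000)))
```

Returning a list rather than a string means the result can be passed directly to subprocess.run once the connection details have been checked.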

That makes pt-table-checksum connect to the given host (127.0.0.1:4000), and select all rows from the given table (percona.dsns), and check those nodes for differences.  Without this option, the tool may throw an error like:

If the tool detects that the master host (the main DSN specified on the command line) is a cluster node but no other cluster nodes were specified and no regular replicas were detected, then it could checksum but it could not detect any differences, so it throws that error rather than doing only half its job.  Furthermore, not being able to detect differences might be misinterpreted as there being no differences, so it’s better to error than potentially mislead users.

The above hints at something else you should know: pt-table-checksum can work with cluster nodes and regular replicas at the same time, but since cluster nodes require --recursion-method=dsn, you must also list any regular replicas in the DSNs table; otherwise pt-table-checksum will not check them for differences.

Here’s the abbreviated output of a real run:

The tool works as usual: checking every database and table.  So far there are no differences, but we’ll run it again with a difference on one node to prove that it works.  Right now, though, notice the three warnings at the beginning of the output: pt-table-checksum cannot and will not check replica lag on cluster nodes, because SHOW SLAVE STATUS doesn’t work on a cluster node (a node isn’t a slave) and cluster nodes should not be out of sync.

Now let’s make a difference on node5000 and verify that pt-table-checksum detects it when running on node4000:

Ok, so row 30 has value 11 on node5000.  Let’s double-check that it’s different (the original value) on the other nodes:

Ok, so we know there’s a difference on node5000; now let’s see if pt-table-checksum detects it:

Success!  The difference was detected.  Since we’re still “PXC-approving” pt-table-sync, it’s best that a DBA manually investigate any differences.
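For manual investigation, the checksum results can also be queried directly on a node. The sketch below wraps the difference query given in the pt-table-checksum documentation. It assumes the tool wrote to its default --replicate table, percona.checksums, and that cursor is an open DB-API cursor from whichever MySQL driver you use. tables_with_differences is a hypothetical helper name.

```python
# Query adapted from the pt-table-checksum documentation.
# Assumption: checksums are stored in the default table, percona.checksums.
DIFF_QUERY = """
SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc)
GROUP BY db, tbl
"""


def tables_with_differences(cursor):
    """Run DIFF_QUERY on an open DB-API cursor.

    Returns a list of (db, tbl, total_rows, chunks) rows, one per table
    that has at least one chunk differing from the node the tool ran on.
    """
    cursor.execute(DIFF_QUERY)
    return list(cursor.fetchall())
```

Each node stores its own checksum results, so the query has to be run against the node you want to inspect (node5000 in this example) rather than the node where pt-table-checksum was started.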

In summary:

Please submit bugs if found.

Comments

  1. You skipped the most interesting part: How does it work?

  2. Scott Haas says:

    Daniel -
    Can you provide a refresh of this article, based on new recursion-method=cluster?

    Thanks,
    Scott

  3. Nick says:

    PXC requires binlog_format=ROW on all nodes, but pt-table-checksum requires binlog_format=STATEMENT, so the node it runs on should not be a cluster node. How does it work?
