Using pt-table-checksum with Percona XtraDB Cluster

As of Percona Toolkit v2.1.5, pt-table-checksum works correctly with Percona XtraDB Cluster, but it doesn’t work quite like a traditional replication setup because cluster nodes are not like traditional replicas. In this post I demonstrate how to use pt-table-checksum with Percona XtraDB Cluster.

First, you’ll need Percona Toolkit v2.1.5 or newer and Percona XtraDB Cluster 5.5.27-23.6 or newer (these are the current versions at the time of writing). Second, I presume that you already know how to use pt-table-checksum and that your cluster is already setup. If the later presumption is false, then read the Percona XtraDB Cluster documentation or watch the Percona XtraDB Cluster – Installation and Setup webinar. My setup has 3 nodes called “node300”, “node4000”, and “node6000” on ports 4000, 5000, and 6000 respectively.

The first thing you must do is create a “DSNs table” because pt-table-checksum cannot currently auto-detect cluster nodes. My DSNs table contains:

node4000> SELECT * FROM percona.dsns;
+----+-----------+-----------------------+
| id | parent_id | dsn                   |
+----+-----------+-----------------------+
| 1  | NULL      | h=127.1,P=4000,u=root |
| 2  | NULL      | h=127.1,P=5000,u=root |
| 3  | NULL      | h=127.1,P=6000,u=root |
+----+-----------+-----------------------+

node4000> SELECT * FROM percona.dsns;

+----+-----------+-----------------------+

| id | parent_id | dsn |

+----+-----------+-----------------------+

| 1 | NULL | h=127.1,P=4000,u=root |

| 2 | NULL | h=127.1,P=5000,u=root |

| 3 | NULL | h=127.1,P=6000,u=root |

+----+-----------+-----------------------+

There is a row for each node in the cluster, specifying the node’s DSN. (The id and parent_id columns are not used by pt-table-chekcusm yet, so their values don’t matter.) Since all these nodes are in a cluster, this table should exist on all nodes. This means you can run pt-table-checksum on any node. I’ll run the tool on node4000.

The second thing you must do is specify “–recursion-method dsn=DSN-TABLE” when running pt-table-checksum, where “DSN-TABLE” is a DSN specifying the above DSNs table. For example:

--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns

1	--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns

That makes pt-table-checksum connect to the given host (127.0.0.1:4000), and select all rows from the given table (percona.dsns), and check those nodes for differences. Without this option, the tool may throw an error like:

node4000 is a cluster node but no other nodes or regular replicas were found.
Use --recursion-method=dsn to specify the other nodes in the cluster.

1 2	node4000 is a cluster node but no other nodes or regular replicas were found. Use --recursion-method=dsn to specify the other nodes in the cluster.

If the tool detects that the master host (the main DSN specified on the command line) is a cluster node but no other cluster nodes were specified and no regular replicas were detected, then it could checksum but it could not detect any differences, so it throws that error rather than doing only half its job. Furthermore, not being able to detect differences might be misinterpreted as there being no differences, so it’s better to error than potentially mislead users.

The aforementioned hints at something else you should know: pt-table-checksum can work with cluster nodes and regular replicas at the same time, but since cluster nodes require –recursion-method=dsn, you must also specify any regular replicas in the DSN table, else pt-table-checksum will not check them for differences.

Here’s the abbreviated output of a real run:

$ ./pt-table-checksum h=127.1,P=4000,u=root \
--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns
Not checking replica lag on node4000 because it is a cluster node.
Not checking replica lag on node5000 because it is a cluster node.
Not checking replica lag on node6000 because it is a cluster node.
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED   TIME TABLE
10-05T14:02:17      0     0    0      1       0  0.025 mysql.columns_priv
10-05T14:02:17      0     0    0      1       0  0.019 mysql.db
...
10-05T14:02:18      0     0    3      1       0  0.023 percona.dsns
10-05T14:02:18      0     0   10      1       0  0.032 test.t

$ ./pt-table-checksum h=127.1,P=4000,u=root \

--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns

Not checking replica lag on node4000 because it is a cluster node.

Not checking replica lag on node5000 because it is a cluster node.

Not checking replica lag on node6000 because it is a cluster node.

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE

10-05T14:02:17 0 0 0 1 0 0.025 mysql.columns_priv

10-05T14:02:17 0 0 0 1 0 0.019 mysql.db

...

10-05T14:02:18 0 0 3 1 0 0.023 percona.dsns

10-05T14:02:18 0 0 10 1 0 0.032 test.t

The tool works as usual: checking every database and table. So far there are no differences, but we’ll run it again with a difference on one node to prove that it works. Right now, though, notice the three warnings at the beginning of the output: pt-table-checksum cannot and will not check the replica lag on cluster nodes because SHOW SLAVE STATUS on a cluster node doesn’t work (because a node isn’t a slave) and cluster nodes should not be out of sync.

Now let’s make a difference on node5000 and verify that pt-table-checksum detects it when running on node4000:

node5000> SET SQL_LOG_BIN=0;
Query OK, 0 rows affected (0.00 sec)

node5000> UPDATE test.t SET val=val+1 WHERE id=30;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

node5000> SELECT * FROM test.t WHERE id=30;
+----+------+
| id | val  |
+----+------+
| 30 | 11   |
+----+------+

node5000> SET SQL_LOG_BIN=0;

Query OK, 0 rows affected (0.00 sec)

node5000> UPDATE test.t SET val=val+1 WHERE id=30;

Query OK, 1 row affected (0.00 sec)

Rows matched: 1 Changed: 1 Warnings: 0

node5000> SELECT * FROM test.t WHERE id=30;

+----+------+

| id | val |

+----+------+

| 30 | 11 |

+----+------+

Ok, so row 30 has value 11 on node5000. Let’s double-check that it’s different (the original value) on the other nodes:

node4000> SELECT * FROM test.t WHERE id=30;
+----+------+
| id | val  |
+----+------+
| 30 | 10   |
+----+------+

node6000> SELECT * FROM test.t WHERE id=30;
+----+------+
| id | val  |
+----+------+
| 30 | 10   |
+----+------+

node4000> SELECT * FROM test.t WHERE id=30;

+----+------+

| id | val |

+----+------+

| 30 | 10 |

+----+------+

node6000> SELECT * FROM test.t WHERE id=30;

+----+------+

| id | val |

+----+------+

| 30 | 10 |

+----+------+

Ok, so we know there’s a difference on node5000, now let’s see if pt-table-checksum detects it:

$ ./pt-table-checksum h=127.1,P=4000,u=root -d test \
--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns
Not checking replica lag on node4000 because it is a cluster node.
Not checking replica lag on node5000 because it is a cluster node.
Not checking replica lag on node6000 because it is a cluster node.
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED   TIME TABLE
10-05T14:47:52      0     1   10      1       0  0.023 test.t

$ ./pt-table-checksum h=127.1,P=4000,u=root -d test \

--recursion-method dsn=h=127.1,P=4000,u=root,D=percona,t=dsns

Not checking replica lag on node4000 because it is a cluster node.

Not checking replica lag on node5000 because it is a cluster node.

Not checking replica lag on node6000 because it is a cluster node.

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE

10-05T14:47:52 0 1 10 1 0 0.023 test.t

Success! The difference was detected. Since we’re still “PXC-approving” pt-table-sync, it’s best that a DBA manually investigate any differences.

In summary:

pt-table-checksum v2.1.5 and newer works with Percona XtraDB Cluster 5.5.27-23.6 and newer
You must run pt-table-checksum with –recrusion-method=dsn
In the DSN table you must specify all the cluster node that you want to check
Any differences should be investigated manually until pt-table-sync is “PXC-approved”
There are other limitations when using pt-table-checksum with Percona XtraDB Cluster

Please submit bugs if found.

6 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Henrik Ingo

11 years ago

You skipped the most interesting part: How does it work?

Scott Haas

11 years ago

Daniel –
Can you provide a refresh of this article, based on new recursion-method=cluster?

Thanks,
Scott

Nick

10 years ago

PXC required binlog_format=ROW on all nodes. But pt-table-checksum required running on binlog_format=STATEMENT so this node should not be a cluster node. How it works?

joel

9 years ago

No longer works properly.

https://bugs.launchpad.net/percona-toolkit/+bug/1316863

Marc Castrovinci

9 years ago

Joel, It’s fixed with PXC 5.6.20. Horray!

K Thomas Kuruvilla

7 years ago

I have two entries in the dsns table but it does only one of the slaves.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Using pt-table-checksum with Percona XtraDB Cluster

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis Replication and Auto-Failover With Sentinel Service

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

Valkey/Redis: Sets and Sorted Sets

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Using pt-table-checksum with Percona XtraDB Cluster

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis Replication and Auto-Failover With Sentinel Service

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

Valkey/Redis: Sets and Sorted Sets

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation