Realtime stats to pay attention to in Percona XtraDB Cluster and Galera

I learn more and more about Galera every day. As I learn more, I try to keep my myq_gadgets toolkit up to date with what I consider is important to keep any eye on on a PXC node. In that spirit, I just today pushed some changes to the ‘wsrep’ report, and I thought I’d go over some of the status variables and metrics being tracked there with the aim to show folks what they should be watching (at least from my opinion, this is subject to change!).

First, let’s take a look at the output:

[root@node3 ~]# myq_status -t 1 wsrep
Wsrep    Cluster        Node           Queue   Ops     Bytes     Flow        Conflct
    time  name P cnf  #  name  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt dst lcf bfa
19:17:01 trime P   3  3 node3 Sync T/T   0   0  35  40  54K  61K 0.0   0  17   0   2
19:17:03 trime P   3  3 node3 Sync T/T   0   0  70  85 107K 124K 0.0   0  13   0   2
19:17:04 trime P   3  3 node3 Sync T/T   0   0  72  81 111K 121K 0.0   0  16   0   3
19:17:05 trime P   3  3 node3 Sync T/T   0   0  70  85 108K 124K 0.0   0  17   0   4
19:17:06 trime P   3  3 node3 Sync T/T   0   0  66  82 100K 124K 0.0   0  17   0   3
19:17:07 trime P   3  3 node3 Sync T/T   0   0  68  78 105K 117K 0.0   0  22   0   0
19:17:08 trime P   3  3 node3 Sync T/T   0   0  65  93 101K 135K 0.0   0  14   1   5
19:17:09 trime P   3  3 node3 Sync T/T   0   0  73  83 111K 125K 0.0   0  19   0   3
19:17:10 trime P   3  3 node3 Sync T/T   0   0  30  46  46K  66K 0.0   0  10   0   2
19:17:12 trime P   3  3 node3 Sync T/T   0   0  64  80  97K 120K 0.0   0  19   0   4
19:17:13 trime P   3  3 node3 Sync T/T   0   0  69  88 106K 131K 0.0   0  28   0   1
19:17:14 trime P   3  3 node3 Sync T/T   0   0  70  83 106K 121K 0.0   0  11   0   3
19:17:15 trime P   3  3 node3 Sync T/T   0   0  72  84 111K 126K 0.0   0  15   0   3

[root@node3 ~]# myq_status -t 1 wsrep

Wsrep Cluster Node Queue Ops Bytes Flow Conflct

time name P cnf # name cmt sta Up Dn Up Dn Up Dn pau snt dst lcf bfa

19:17:01 trime P 3 3 node3 Sync T/T 0 0 35 40 54K 61K 0.0 0 17 0 2

19:17:03 trime P 3 3 node3 Sync T/T 0 0 70 85 107K 124K 0.0 0 13 0 2

19:17:04 trime P 3 3 node3 Sync T/T 0 0 72 81 111K 121K 0.0 0 16 0 3

19:17:05 trime P 3 3 node3 Sync T/T 0 0 70 85 108K 124K 0.0 0 17 0 4

19:17:06 trime P 3 3 node3 Sync T/T 0 0 66 82 100K 124K 0.0 0 17 0 3

19:17:07 trime P 3 3 node3 Sync T/T 0 0 68 78 105K 117K 0.0 0 22 0 0

19:17:08 trime P 3 3 node3 Sync T/T 0 0 65 93 101K 135K 0.0 0 14 1 5

19:17:09 trime P 3 3 node3 Sync T/T 0 0 73 83 111K 125K 0.0 0 19 0 3

19:17:10 trime P 3 3 node3 Sync T/T 0 0 30 46 46K 66K 0.0 0 10 0 2

19:17:12 trime P 3 3 node3 Sync T/T 0 0 64 80 97K 120K 0.0 0 19 0 4

19:17:13 trime P 3 3 node3 Sync T/T 0 0 69 88 106K 131K 0.0 0 28 0 1

19:17:14 trime P 3 3 node3 Sync T/T 0 0 70 83 106K 121K 0.0 0 11 0 3

19:17:15 trime P 3 3 node3 Sync T/T 0 0 72 84 111K 126K 0.0 0 15 0 3

As I’ve mentioned before, myq_status gives an iostat-like output of your server. This tool takes what are usually global counters in SHOW GLOBAL STATUS and calculates the change each second and reports that. There’s lot of other reports it can run, but this one is focused on ‘wsrep%’ status variables.

It’s important to note that this reflects the status of a single PXC node in my cluster, node3 to be precise, so some information is cluster-wide, other information is specific to this particular node. I tend to open a window for each node and run the tool on each so I can see things across the entire cluster at a glance. Sometime in the future, I’d like to build a tool that polls every cluster node, but that’s not available currently.

Let’s go through the columns.

Cluster

There are 4 columns in the cluster section, and it’s important to understand that this tool only currently connections to a single node (by default, localhost). The state of the cluster could be divergent across multiple nodes, so be careful to not assume all nodes have these values!

name

The cluster’s name (first 5 characters). This is wsrep_cluster_name.

P

Either P for Primary or N for Non-primary. This is the state of this partition of the cluster. If a cluster gets split brained, then only a quorum (>=51%) of the remaining nodes will remain Primary. Non-primary clusters are the remaining minority and will not allow database operations.

cnf

This is wserep_cluster_conf_id — the version # of the cluster configuration. This changes every time a node joins or leaves the cluster. Seeing high values here may indicate you have nodes frequently dropping out and rejoining the cluster and you may need some retuning of some node timeouts to prevent this.

#

The number of nodes in the cluster.

Node

This is state data about the local node that the tool happens to be connected to.

name

The name of this local node (first 5 characters). This is handy when you have this tool running in several windows on several nodes.

cmt

This is the wsrep_local_state_comment — basically a plaintext word describing the state of the node in terms of the rest of the cluster. ‘Sync’ (Synced) is what you want to see, but ‘Dono’ (Donor), ‘Join’ (Joiner), and others are possible. This is handy to quickly spot which node was elected to Donate either SST or IST to another node entering the cluster.

sta

Short for state, this is two True/False values (T/F) for wsrep_ready and wsrep_connected. These are somewhat redundant with the local_state value, so I may remove them in the future.

Queue

This is information about the replication queues in both directions.

The ‘Up’ queue is outbound replication. This generally increases when some other node is having difficulty receiving replication events.

The ‘Dn’ (down) queue is inbound replication. Positive values here can be an indicator that this node is slow to apply replication writesets.

Ops

Ops are simply replication transactions or writesets. Up is outbound, i.e., where this node was the originator of the transaction. Dn is download, that is, transactions from other nodes in the cluster.

Bytes

Just like Ops, but in Bytes instead of transaction counts. I have seen production clusters having performance issues where I noticed that the Ops and Bytes went to Zeros on all the nodes for a few seconds, and then a massive 90M+ replication transaction came through. Using the Up and Dn columns, I could easily see which node was the originator of the transaction.

Flow

Flow gives some information about Flow Control events. Galera has some sophisticated ways of metering replication so lag does not become a problem.

pau

wsrep_flow_control_paused — This is the amount of time since the last time SHOW GLOBAL STATUS was run that replication was paused due to flow control. This is a general indicator that flow control is slowing replication (and hence overall cluster writes) down.

snt

wsrep_flow_control_sent — how many flow control events were SENT from this node. Handy to find the node slowing the others down.

dst

This does not go under the Flow group. This is wsrep_cert_deps_distance — This is a general indicator of how many parallel replication threads you could use. In practice I haven’t found this extremely helpful yet and I may remove this in the future. I think being aware of how Flow control works and watching flow control events and queue sizes is a better way to detect replication lag, and this really just tells you if multi-threaded replication could help improve replication speed at all.

Conflct

Replication conflicts, as described in my last post. lcf is local certification failures, and bfa is brute force aborts. This should be helpful to understand that these conflicts are or are not happening.

Interpreting the results

Let’s look at that output again and make some observations about our cluster and this node:

[root@node3 ~]# myq_status -t 1 wsrep
Wsrep    Cluster        Node           Queue   Ops     Bytes     Flow        Conflct
    time  name P cnf  #  name  cmt sta  Up  Dn  Up  Dn   Up   Dn pau snt dst lcf bfa
19:17:01 trime P   3  3 node3 Sync T/T   0   0  35  40  54K  61K 0.0   0  17   0   2
19:17:03 trime P   3  3 node3 Sync T/T   0   0  70  85 107K 124K 0.0   0  13   0   2
19:17:04 trime P   3  3 node3 Sync T/T   0   0  72  81 111K 121K 0.0   0  16   0   3
19:17:05 trime P   3  3 node3 Sync T/T   0   0  70  85 108K 124K 0.0   0  17   0   4
19:17:06 trime P   3  3 node3 Sync T/T   0   0  66  82 100K 124K 0.0   0  17   0   3
19:17:07 trime P   3  3 node3 Sync T/T   0   0  68  78 105K 117K 0.0   0  22   0   0
19:17:08 trime P   3  3 node3 Sync T/T   0   0  65  93 101K 135K 0.0   0  14   1   5
19:17:09 trime P   3  3 node3 Sync T/T   0   0  73  83 111K 125K 0.0   0  19   0   3
19:17:10 trime P   3  3 node3 Sync T/T   0   0  30  46  46K  66K 0.0   0  10   0   2
19:17:12 trime P   3  3 node3 Sync T/T   0   0  64  80  97K 120K 0.0   0  19   0   4
19:17:13 trime P   3  3 node3 Sync T/T   0   0  69  88 106K 131K 0.0   0  28   0   1
19:17:14 trime P   3  3 node3 Sync T/T   0   0  70  83 106K 121K 0.0   0  11   0   3
19:17:15 trime P   3  3 node3 Sync T/T   0   0  72  84 111K 126K 0.0   0  15   0   3