I was curious to check how Percona XtraDB Cluster behaves when it comes to MySQL replication latency, or better yet, call it data propagation latency. I wanted to see whether I can get stale reads from other cluster nodes after a write is performed on one specific node. To test it I wrote quite a simple script (you can find it at the end of the post) which connects to one node in the cluster, performs an update, and then immediately reads from a second node. If the data has already propagated, good; if not, the script retries the read until the data finally propagates, and then measures the latency. This shows whether the application can see any stale reads.
My setup is 3 Percona XtraDB Cluster nodes (DPE1, DPE2, DPE3) talking over a dedicated 1Gbit cluster network, and I'm running the test from a 4th server (SMT2), so it is a pretty realistic setup from a typical data center latency standpoint, though the server hardware is not the most recent.
First, let's look at the baseline: the cluster has no load other than the script doing writes to DPE1 and immediately reading from DPE2.
Summary: 94 out of 10000 rounds (0.94%)
Delay distribution: Min: 0.71 ms; Max: 2.16 ms Avg: 0.89 ms
These results tell me two things. First, replication in Percona XtraDB Cluster is by default asynchronous from a data propagation standpoint: it takes time (though a short one in this case) for changes committed on one node to become visible on another. Second, it is actually doing quite well, with less than 1% of tests able to observe any inconsistency, an average delay below 1 ms, and rather stable results.
But we do not set up clusters to be idle, right? So let's do another test, now running a Sysbench load on DPE1. With a concurrency of 32 this corresponds to a pretty significant load.
sysbench --test=oltp --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=3000 run
The results now become as follows:
Summary: 3901 out of 10000 rounds (39.01%)
Delay distribution: Min: 0.66 ms; Max: 201.36 ms Avg: 3.81 ms

Summary: 3893 out of 10000 rounds (38.93%)
Delay distribution: Min: 0.66 ms; Max: 42.9 ms Avg: 3.76 ms
As expected, we can observe inconsistency much more frequently, in almost 40% of rounds, though the average observed delay remains just a few milliseconds, which is something most applications would not even notice.
Now let's run sysbench on DPE2 (putting the load on the node we're reading from):
Summary: 3747 out of 10000 rounds (37.47%)
Delay distribution: Min: 0.86 ms; Max: 108.15 ms Avg: 8.62 ms

Summary: 3721 out of 10000 rounds (37.21%)
Delay distribution: Min: 0.81 ms; Max: 291.81 ms Avg: 8.54 ms
We can observe the effect in a similar share of cases, but the delay is higher in this case, both on average and at the maximum. This tells me that, from a data propagation standpoint, the cluster is more sensitive to load on the nodes which receive the data than on the ones where the writes are done.
Let's remember, though, that Sysbench OLTP has only a rather small portion of writes. What if we look at a workload which consists of 100% writes? We can do that with Sysbench, for example:
sysbench --test=oltp --oltp-test-mode=nontrx --oltp-nontrx-mode=update_key --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=32 --init-rng=on --max-requests=0 --max-time=3000 run
Running this load on DPE1 I’m getting:
Summary: 1062 out of 10000 rounds (10.62%)
Delay distribution: Min: 0.71 ms; Max: 285.07 ms Avg: 3.21 ms

Summary: 1113 out of 10000 rounds (11.13%)
Delay distribution: Min: 0.81 ms; Max: 275.94 ms Avg: 5.06 ms
Surprise! The results are actually better than with the mixed load, as we can observe any delay in only about 11% of rounds.
However, if we run the same side load on DPE2, we get:
Summary: 5349 out of 10000 rounds (53.49%)
Delay distribution: Min: 0.81 ms; Max: 519.61 ms Avg: 5.02 ms

Summary: 5355 out of 10000 rounds (53.55%)
Delay distribution: Min: 0.81 ms; Max: 526.95 ms Avg: 5.06 ms
This is the worst result so far, with over 50% of samples producing inconsistent data, an average delay for those of over 5 ms, and outliers going to half a second.
From these results I conclude that side load on the node TO which updates are being propagated causes the largest delay.
At this point I remembered there is one more test I can run. What if I put the side load on the DPE3 server, which the test does not touch at all?
Summary: 833 out of 10000 rounds (8.33%)
Delay distribution: Min: 0.66 ms; Max: 353.61 ms Avg: 2.76 ms
No surprise here: as DPE3 is not being directly read from or written to, the load on it should cause minimal delays to data propagation from DPE1 to DPE2.
The propagation latency we've observed in the tests so far is quite good, but it is not synchronous replication behavior; we still can't treat the cluster as if it were a single server from a generic application's standpoint. Right. The default configuration for Percona XtraDB Cluster at this point is to replicate data asynchronously, while still guaranteeing there are no conflicts or data inconsistencies when updates are done on multiple nodes. There is an option you can enable to get fully synchronous replication behavior:
mysql> set global wsrep_causal_reads=1;
Query OK, 0 rows affected (0.00 sec)
When this option is enabled the cluster will wait for the data to be actually replicated (committed) before serving the read. The great thing is that wsrep_causal_reads is a session variable, so you can mix different applications on the same cluster: some requiring stronger data consistency guarantees, others being OK with slightly stale data but looking for the best performance possible.
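For example, an application that needs consistent reads can enable the option only for its own connection, leaving the global default untouched. A minimal sketch (the table and id below are just the ones from this test setup, used for illustration):

```sql
-- Enable causal reads only for the current connection;
-- other sessions keep the faster default behavior.
mysql> SET SESSION wsrep_causal_reads=1;

-- This SELECT now waits until all write sets committed elsewhere
-- in the cluster before it was issued have been applied locally.
mysql> SELECT k FROM test.sbtest WHERE id=42;

-- Switch back to the default for latency-sensitive queries.
mysql> SET SESSION wsrep_causal_reads=0;
```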
So far so good. We can have the cluster handle significant load with small transactions and still see a very respectable data propagation delay, or we can enable the wsrep_causal_reads=1 option and get full data consistency. But what happens if we have some larger transactions? To test this I created a copy of the sbtest table and ran a long update while running my test, to see how the latency is impacted:
mysql> update sbtest2 set k=k+1;
Query OK, 1000000 rows affected (1 min 14.12 sec)
Rows matched: 1000000 Changed: 1000000 Warnings: 0
Running this query on the DPE1 box I'm getting the following result:
...
Result Mismatch for Value 48; Retries: 1 Delay: 0.76 ms
Result Mismatch for Value 173; Retries: 1 Delay: 1.21 ms
Result Mismatch for Value 409; Retries: 1 Delay: 0.86 ms
Result Mismatch for Value 460; Retries: 142459 Delay: 46526.7 ms
Result Mismatch for Value 461; Retries: 65 Delay: 22.92 ms
Result Mismatch for Value 464; Retries: 1 Delay: 0.71 ms
Result Mismatch for Value 465; Retries: 1 Delay: 0.76 ms
...
Summary: 452 out of 10000 rounds (4.52%)
Delay distribution: Min: 0.66 ms; Max: 46526.7 ms Avg: 104.28 ms
So the propagation delay was pretty good until this particular query had to be replicated, at which point we could observe a replication delay of over 45 seconds, which is quite nasty.
Note, though, that the delay lasted less time than it takes to execute the query on the master. This is because changes can be applied in parallel: updates to the sbtest table and the sbtest2 table can proceed concurrently (even changes to the same table can), but the certification process is serial, as is sending the write set to the other nodes, and it must be taking some 45 seconds to send the write set and perform certification.
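If you want to watch what the cluster is doing during such a stall, Galera exposes a number of wsrep status counters. A quick way to peek at a few of the relevant ones (a sketch; exact counter availability depends on your Galera version):

```sql
-- Write sets queued locally, waiting to be applied:
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';

-- Fraction of time replication was paused by flow control
-- since the counter was last reset:
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';

-- Average distance between transactions that can be applied
-- in parallel (an indicator of apply parallelism):
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';
```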
If we run the same query on DPE2, an interesting thing happens. The script does not show any data propagation delays, but it visibly stalls, I guess because the UPDATE statement issued to DPE1 is blocked for some time. To check this idea I decided to use the sysbench script with very simple point update queries to see if we get any significant stalls. My base run on DPE1 is as follows:
root@dpe01:/etc/mysql# sysbench --test=oltp --oltp-auto-inc=off --oltp-test-mode=nontrx --oltp-nontrx-mode=update_key --mysql-user=root --mysql-password="" --oltp-table-size=1000000 --num-threads=1 --init-rng=on --max-requests=0 --max-time=300 run
....
per-request statistics:
     min:                                  0.68ms
     avg:                                  0.88ms
     max:                                306.80ms
     approx. 95 percentile:                0.94ms
....
We can see quite respectable performance, with the longest request taking some 300 ms, so no stalls. Let's do the run again, now running the same update statement on a different cluster node:
per-request statistics:
     min:                                  0.69ms
     avg:                                  1.12ms
     max:                              52334.76ms
     approx. 95 percentile:                0.97ms
As we see, there is a stall in updates of 50+ seconds, again while certification is happening. So certification not only delays data propagation but can also stall updates done to different tables on different nodes.
Summary:
Percona XtraDB Cluster performs very well when it comes to small transactions, offering very small propagation delay along with an option of synchronous replication. However, when it comes to large transactions you can get into a lot of trouble, with major stalls both in terms of data propagation and in terms of writes. The system I tested on is pretty old, and I would expect modern systems to run certification several times faster, but even tens of seconds for what I would consider a medium-size transaction modifying 1 million rows is a rather long time. So make sure you have a good understanding of how large the transactions in your application are and how long a stall it can handle.
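One common mitigation (not tested above) is to break a large transaction into smaller batches, so that each replicated write set stays small and certification stays fast. A hedged sketch against the sbtest2 table used earlier; the batch size of 10000 is illustrative, and in practice you would drive the loop from application code or a stored procedure:

```sql
-- Instead of one million-row write set:
--   UPDATE sbtest2 SET k=k+1;
-- update in bounded id ranges, committing after each chunk so every
-- write set sent to the other nodes is small.
UPDATE sbtest2 SET k=k+1 WHERE id BETWEEN 1 AND 10000;
UPDATE sbtest2 SET k=k+1 WHERE id BETWEEN 10001 AND 20000;
-- ... continue until all 1000000 rows are covered.
```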
Appendix:
As promised, the script I was using for testing.
<?php
# The idea of this script is as follows. We have 2 nodes. We write to one node and then read from the second node
# to see whether we get the same data or different data.
$writer_host="dpe01";
$reader_host="dpe02";
$user="test";
$password="test";
$table="test.sbtest";
$increment=2;
$offset=1;
$max_id=1000;
$rounds=10000;
$writer=new mysqli($writer_host,$user,$password);
$reader=new mysqli($reader_host,$user,$password);
$min_delay=100000000;
$max_delay=0;
$delays=0;
$sum_delay=0;
for($val=0; $val<$rounds; $val++)
{
    $id=rand(1,$max_id);
    $id=floor($id/$increment)*$increment+$offset;
    $writer->query("UPDATE $table set k=$val where id=$id");
    $tw=microtime(true);
    /* Loop until we get the right result */
    $retries=0;
    while(true)
    {
        $result=$reader->query("SELECT k from $table where id=$id");
        $row=$result->fetch_row();
        $result->close();
        if ($row[0]!=$val)
            $retries++;
        else
        {
            $tr=microtime(true);
            break;
        }
    }
    if ($retries!=0) /* If we had to retry, compute stats */
    {
        $delay=round(($tr-$tw)*1000,2);
        $delays++;
        $sum_delay+=$delay;
        $min_delay=min($min_delay,$delay);
        $max_delay=max($max_delay,$delay);
        echo("Result Mismatch for Value $val; Retries: $retries Delay: $delay ms\n");
    }
}
if ($delays>0)
    $avg_delay=round($sum_delay/$delays,2);
else
    $avg_delay=0;
$delay_pct=round($delays/$val*100,3);
echo("Summary: $delays out of $val rounds ($delay_pct%)\n");
echo("Delay distribution: Min: $min_delay ms; Max: $max_delay ms Avg: $avg_delay ms\n");
Appendix2: Percona XtraDB Cluster related configuration
# PXC Settings for Percona XtraDB Cluster (GPL), wsrep_23.7.2.r3843
# Version: '5.5.29-55-log'; socket: '/var/run/mysqld/mysqld.sock'; port: 3306
wsrep_node_address=10.9.9.1
wsrep_provider=/usr/lib/libgalera_smm.so
wsrep_cluster_address=gcomm://10.9.9.1,10.9.9.2,10.9.9.3
#wsrep_cluster_address=gcomm://
wsrep_slave_threads=8
wsrep_sst_method=xtrabackup
wsrep_cluster_name=DPE
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
wsrep_sst_auth=root:
Are there any benchmarks showing the difference in TPS or commit time between setting wsrep_causal_reads to 0 or 1? I did not see any benchmark stating which value of wsrep_causal_reads was used, including the benchmarks done by Percona.
Tobias,
I have not seen much. This is one thing I was going to take a look at 🙂
Hi Peter, thanks so much for this article. I was planning to do exactly the same sort of test as part of preparing for my multi-master talk at the Percona Live conference. This behavior, especially for long transactions, is very much expected so it’s great to see some proof from such a simple benchmark. Another interesting thing to try is generating distributed deadlock through a table hotspot that creates a rollback when you update simultaneously across multiple nodes. I would expect this to become a bigger problem with workloads that (a) have a lot of updates on a small set of rows and (b) as the overall load increases, since this makes the deadlock window bigger when update propagation is delayed in any way. I’ll post my results unless you beat me to it.
Hi Peter,
Great you’re doing tests on the intricacies of Galera, we use it in production, and overall very satisfied with it, but more use cases and advanced testing is needed.
Could you clarify your meaning in your statement,
“First Replication by default in Percona XtraDB Cluster is Asynchronous from Data Propagation Standpoint – it takes time (though short one in this case) for changes committed on the one node to become visible to the other.”
I want to confirm: are you saying your test shows a delay simply between the local commit (not visibility) of the update on one node vs. its visibility on the second node (all nodes, however, would see exactly the same lag from commit to visibility when certification and then the cluster-wide commit is issued)?
Or are you saying it shows data inconsistency between the two nodes in terms of the actual visibility (inconsistency in the cluster wide commit)?
@Mike, what do you mean by a cluster-wide commit? Galera just ensures the transactions get to the DBMS nodes, which certify and commit independently. In this case you can get some differences between when transactions show up on different nodes. In fact, it should be possible for transactions to commit and become visible on replicas before they are visible on the originating master if the master commits more slowly. This could happen if there are differences in the speed of file systems across hosts. (Another interesting case to test.)
Mike – what I mean is that there is a delay between data being committed at DPE1 and it becoming visible at DPE2. I have not tested it, but I assume from the architecture design there is no guarantee that it will be visible at DPE2 and DPE3 at the same time.
If you need to ensure you never read stale data you should be using wsrep_causal_reads=1
Thanks Peter, so indeed you were referring to actual visibility differences on the nodes. When you said "..Asynchronous from Data Propagation Standpoint…" I thought perhaps you meant the time from when the update was first issued (including the time to replicate over and certify).
Robert – when I said cluster-wide commit, I meant at the point at which a global transaction ID is issued, then certification takes place, when I said local commit, I meant when the statement was initiated (meant to write ‘local commit issuance’) just before replication and certification. No argument the commits happen independently (good clarification to bring up) and it’s possible for them to happen at different times, just trying to figure out what was being measured here.
@Mike, thanks for the clarification. One of the issues with Galera overall is that the terminology is sometimes not very clear from the user docs, though of course it’s pretty clear in the heads of the Codership developers. Cheers, Robert
Peter,
The script got totally broken,
so there is no good way to copy-paste it.
Vadim,
Ah… still the highlighting plugin problems. I have changed it to a blockquote; hope it is easier now.
If you’re interested in using Percona xtraDB Cluster for production, I would highly recommend you use something else instead. Trust me on this, it will cost you dearly.
Hi Peter,
Nice article. Recently we shifted to the following new dedicated server. From the first day onwards we were facing a performance issue: MySQL load was very high, though the site was not going down.
Then we did some R&D on the Percona MySQL server and found that Percona needs some configuration settings regarding memory allocation. So we made some memory allocation settings in the my.cnf file. On the first day after making those settings, site performance was good.
But from the second day performance degradation started again, and now it is slow again. MySQL load is not going very high. We have also noticed that many processes stay in a sleeping state for a long time, like 200-300 processes.
Current Server Details (Dedicated Server):
Software:
Apache Version – 2.2.15 (CentOS)
PHP Version – 5.3.3
MySQL – 5.1.66 [Percona XtraDB Cluster (GPL) (5.5.30)] (InnoDB Engine)
PHP My Admin – 5.3.3
Hardware:
Node:2
CPU = Quard Core
RAM = 12 GB
HDD = 1 TB
IP Addresses = 2
Please help me if there is something we missed.
Manish,
Blog post comment is not the best place for asking questions specific to your case. I’d encourage you to ask the question on our forums instead https://www.percona.com/forums/questions-discussions/percona-xtradb-cluster
Jason – Could you clarify what you mean? Why will it cost dearly and what would you propose instead? Percona Server?
In my experience all Percona products are quite stable and are pretty well supported (even if you don’t have the premium consulting services)
Doing a Sysbench test with a standalone MySQL server we got 3000 transactions/sec, and with the Percona cluster we got 400/sec. Is that expected?