Seconds_Behind_Master fluctuating wildly? Check for looped events

Recently I was working with a customer where we noticed Seconds_Behind_Master fluctuating between the expected value of 0 and a fairly high six-figure value. The servers were configured in a master-master relationship and used five-digit server_id values, and we had just migrated this cluster from one data centre to another by re-pointing masters. Large fluctuations in Seconds_Behind_Master can often be explained by long-running queries being processed by the SQL_THREAD. However, SHOW PROCESSLIST showed no long-running replication events, and nothing else indicated that the server was lagging due to resource constraints: CPU, disk, and memory were all under-utilized.
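
The checks described above can be sketched roughly as follows. This assumes the mysql command-line client (the \G terminator is a client feature) and a server version where the SHOW SLAVE STATUS naming still applies; the fields called out in the comments are simply the ones relevant to this story.

```sql
-- Replication health on the lagging server. Fields of interest:
--   Slave_IO_Running / Slave_SQL_Running : both should be Yes
--   Seconds_Behind_Master                : the value that was fluctuating
SHOW SLAVE STATUS\G

-- Look for a long-running statement owned by the replication SQL thread
-- (replication threads are listed under the user "system user").
SHOW FULL PROCESSLIST;
```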

We then moved on to a manual review of the binary log, where events appeared normal (five-digit server_id values) until, every once in a while, we would see a rash of server_id 21 events. Wait, what? I asked the customer about this server_id and was informed that it was in fact the old master from the original data centre. We looked again at the dates on the server_id 21 events and, sure enough, they were from the time period when the cut-over between data centres took place. Conclusion: we had a series of old events caught in a loop between the masters. Because log_slave_updates was enabled and no host with server_id 21 remained to terminate them, these events would circulate indefinitely. Since Seconds_Behind_Master is derived from the timestamp of the event the SQL thread is executing, each pass of these old events made the server appear to be as far behind as the events were old.
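
One way to spot such foreign events without leaving the SQL prompt is SHOW BINLOG EVENTS, whose output includes a Server_id column. The following is only a sketch: the binary log file name is a made-up placeholder, and this statement does not show event timestamps, so dating the events as we did still requires the mysqlbinlog utility.

```sql
-- This server's own ID, and whether it re-logs replicated events.
SELECT @@server_id, @@log_slave_updates;

-- List the binary logs available on this server.
SHOW BINARY LOGS;

-- Sample events from one log and check the Server_id column for IDs
-- that belong to no current host. 'mysql-bin.000123' is hypothetical;
-- substitute a name from the SHOW BINARY LOGS output.
SHOW BINLOG EVENTS IN 'mysql-bin.000123' LIMIT 200;
```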

So what was the fix? We used the IGNORE_SERVER_IDS option of the CHANGE MASTER TO statement:

CHANGE MASTER TO IGNORE_SERVER_IDS = (21);

Almost immediately, the server_id 21 events were dropped and Seconds_Behind_Master stopped fluctuating.
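
For completeness, here is a hedged sketch of how such a change might be applied and verified. It assumes replication is stopped around the CHANGE MASTER TO statement and restarted afterwards; the verification step relies on the Replicate_Ignore_Server_Ids field of SHOW SLAVE STATUS.

```sql
-- Stop replication, register the orphaned server ID, and resume.
STOP SLAVE;
CHANGE MASTER TO IGNORE_SERVER_IDS = (21);
START SLAVE;

-- Replicate_Ignore_Server_Ids should now list 21.
SHOW SLAVE STATUS\G

-- If the list ever needs to be cleared later, pass an empty list:
-- STOP SLAVE; CHANGE MASTER TO IGNORE_SERVER_IDS = (); START SLAVE;
```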

The MySQL manual has a short discussion regarding IGNORE_SERVER_IDS:

IGNORE_SERVER_IDS was added in MySQL 5.5. This option takes a comma-separated list of 0 or more server IDs. Events originating from the corresponding servers are ignored, with the exception of log rotation and deletion events, which are still recorded in the relay log.
In circular replication, the originating server normally acts as the terminator of its own events, so that they are not applied more than once. Thus, this option is useful in circular replication when one of the servers in the circle is removed. Suppose that you have a circular replication setup with 4 servers, having server IDs 1, 2, 3, and 4, and server 3 fails. When bridging the gap by starting replication from server 2 to server 4, you can include IGNORE_SERVER_IDS = (3) in the CHANGE MASTER TO statement that you issue on server 4 to tell it to use server 2 as its master instead of server 3. Doing so causes it to ignore and not to propagate any statements that originated with the server that is no longer in use.
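
The manual's four-server scenario might look like the following when run on server 4. The host name and binary log coordinates are invented placeholders, not values from the manual; real coordinates have to be determined for your own topology.

```sql
-- On server 4: replicate from server 2 and discard events from failed server 3.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'server2.example.com',   -- hypothetical host name
  MASTER_LOG_FILE = 'mysql-bin.000042',  -- hypothetical coordinates
  MASTER_LOG_POS = 4,
  IGNORE_SERVER_IDS = (3);
START SLAVE;
```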

Hopefully this article helps someone else trying to explain Seconds_Behind_Master variations!

2 Comments

Ravi

8 years ago

Hi Michael,

We are using Percona version 5.6.15 in a master-slave configuration.
For the past few weeks we have been noticing an increase in the value of Seconds_Behind_Master.

We changed innodb-flush-log-at-trx-commit to 2
in my.cnf and restarted the MySQL service, but there is still no change.

As of now the value is Seconds_Behind_Master: 42293

I am not sure how to bring it down. Could you please help me with this?

By the way, we are using 1000 IOPS for our DB machines, with 15 GB of memory, on Amazon.

Michael Coburn

Author

8 years ago

Hi Ravi,

The slave can only apply events/statements as quickly as its available resources allow, which means you could be hitting a CPU or disk bottleneck. You should check whether you are exceeding the capacity of either. You can also try setting sync_binlog=0 if that has not already been done. Otherwise, I encourage you to use our Forums for additional support, as this blog isn't set up for specific, ongoing advice: https://www.percona.com/forums/questions-discussions/mysql-and-percona-server . Best of luck!
