How to lag a slave behind to avoid a disaster

MySQL Replication is useful and easy to setup. It is used for very different purposes. For example:

split read and writes
run data mining or reporting processes on them
disaster recovery

Is important to mention that a replication server is not a backup by itself. A mistake on the master, for example a DROP DATABASE or an UPDATE without a WHERE clause will be replicated nearly instantly to all slave servers and just having a slave is not going to be helpful here. How can we avoid that kind of mistakes? Having a slave server lagging behind.

pt-slave-delay

pt-slave-delay is another tool that comes with Percona Toolkit and it is very easy to use. It works stopping and starting the SQL Thread in order to archive the lag that you have specified in command line. Lets see an example:

root@debian:~# date
Mon Sep 10 12:40:28 CEST 2012

root@debian:~# pt-slave-delay --delay=10m u=root,p=msandbox,h=127.0.0.1,P=21983
2012-09-10T12:40:31 Reconnected to slave
2012-09-10T12:40:31 slave running 83 seconds behind
2012-09-10T12:40:31 STOP SLAVE until 2012-09-10T12:49:08 at master position mysql-bin.000001/83915317
2012-09-10T12:41:31 Reconnected to slave
2012-09-10T12:41:31 slave stopped at master position mysql-bin.000001/100072820
[...]
2012-09-10T12:49:31 START SLAVE until master 2012-09-10T12:39:08 mysql-bin.000001/83915317
2012-09-10T12:50:31 Reconnected to slave
2012-09-10T12:50:31 slave running 672 seconds behind
2012-09-10T12:50:31 slave running 672 seconds behind at master position mysql-bin.000001/233713063

slave1 [localhost] {msandbox} ((none)) > show slave status\G
*************************** 1. row ***************************
[...]
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
[...]

root@debian:~# date

Mon Sep 10 12:40:28 CEST 2012

root@debian:~# pt-slave-delay --delay=10m u=root,p=msandbox,h=127.0.0.1,P=21983

2012-09-10T12:40:31 Reconnected to slave

2012-09-10T12:40:31 slave running 83 seconds behind

2012-09-10T12:40:31 STOP SLAVE until 2012-09-10T12:49:08 at master position mysql-bin.000001/83915317

2012-09-10T12:41:31 Reconnected to slave

2012-09-10T12:41:31 slave stopped at master position mysql-bin.000001/100072820

[...]

2012-09-10T12:49:31 START SLAVE until master 2012-09-10T12:39:08 mysql-bin.000001/83915317

2012-09-10T12:50:31 Reconnected to slave

2012-09-10T12:50:31 slave running 672 seconds behind

2012-09-10T12:50:31 slave running 672 seconds behind at master position mysql-bin.000001/233713063

slave1 [localhost] {msandbox} ((none)) > show slave status\G

*************************** 1. row ***************************

[...]

Slave_IO_Running: Yes

Slave_SQL_Running: No

[...]

With “–delay=10m” I’m asking the tool to lag the slave 10 minutes behind. The rest of options are just to specify how to connect to the slave. Piece of cake 🙂 If you run the tool in order to lag a slave, for example 2 hours behind, you could have enough time to stop the replication and skip an offending query executed by mistake on the master.

Conclusion

We can improve the security and availability of our infrastructure using delayed replication slave. Is useful not only for disaster recovery but also to test how our application deal with lagged replication.

With MySQL 5.6 this technique will be included natively:

http://dev.mysql.com/doc/refman/5.6/en/replication-delayed.html

CHANGE MASTER TO MASTER_DELAY = N;

1	CHANGE MASTER TO MASTER_DELAY = N;

That command will do the magic, until we have 5.6 as GA pt-slave-delay can help us to get the same functionality.

5 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Sheeri Cabral

11 years ago

Sadly, you have to know how long you want a delay for. An hour? 4 days? I’ve had developers who dropped tables and databases who came to me a week later (and in one case a month later!) and asked if there was any way to retrieve the missing data.

And there was, due to backups. I always make logical (export with mysqldump) backups and physical backups, so that if I just need to restore one database/table, I can. I tried using pt-slave-delay and did all the work to set it up, and all the work to ensure that monitoring checked to make sure replication wasn’t completely broken, just on a 4-hour delay…..it’s a difficult problem to solve. What if replication breaks entirely? how are you monitoring that?

Twirrim

11 years ago

In a previous job this tool was absolutely essential (under its previous mk- guise). We had a developer who was fond of dropping databases and tables in production without checking with sysadmin first, and unfortunately someone we couldn’t block access to for various reasons. We ran a 24 hour lagged slave as an essential way of staying sane.

mike morse

11 years ago

This is one of my favorite pt/mk tools, probably not unreasonable to say data poisoning, (particularly admin error) is one of the more frequent causes of bad/lost data. I’ve seen it first hand in the form of a senior linux admin mistakenly working in the wrong environment, ‘poof’ half a table disappeared from the master and 3 real-time slaves in a production environment. Quite a headache to recover from backups and the binlog, a delayed slave makes life a lot easier. I’ve found a delay of 4 hours usually covers the most serious cases where either the admin realizes the mistake immediately, and/or it’s readily apparent from the behavior of the environment.

However, as noted above, a slave (even delayed) is not a silver bullet, if it’s past the delay time, you’re out of luck, conversely the longer you set the delay, the more you’ll need to catch up, along with the fact that the delay can be unpredictable (single-threaded execution and I/O lag). Goes without saying multiple logical/physical backups are a must.

Sheeri, are there any issues pt-heartbeat doesn’t solve regarding the monitoring issue?

Sheeri Cabral

11 years ago

Mike – in fact, that’s how we dealt with the monitoring issue. But pt-heartbeat doesn’t stay up, iirc whenever we restarted mysql we had to make sure pt-heartbeat was started. It’s another thing to worry about. But yeah, it worked, and when pt-heartbeat died, we got paged, so at least there are no SILENT failures.

Frankly, I’ve found that one time having a painful recovery leads to it happening a LOT less frequently. Making it easy to recover doesn’t train people nearly as effectively.

Sheyda

11 years ago

Is mysql ‘start slave until master-log-file master-log-pos’ reliable enough to synch up two slaves with their master?

I have a set up now with master and 2 slaves off of it, let’s call it slave1 and slave2. I want to swing slave 2 behind slave1 to make it a chain replication , master => slave 1 = > slave 2. The only option I see to stop slave on slave2 and wait a few seconds to make sure master_log_position on slave 1 is ahead of slave 2 and stop slave 2, take the master positions off of it and start slave until these positions on slave 2 to make them in sync and to further be able to make the chain replication work, without breaking replication.

Please advise.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How to lag a slave behind to avoid a disaster

pt-slave-delay

Conclusion

Related

Related Blog Articles

RECOMMENDED ARTICLES

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

Securing Your MySQL Database: Essential Best Practices

MySQL 8.4 First Peek

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How to lag a slave behind to avoid a disaster

pt-slave-delay

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

Securing Your MySQL Database: Essential Best Practices

MySQL 8.4 First Peek

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation