Troubleshooting Relay Log Corruption in MySQL

Have you ever seen the replication stopped with message like this:

Last_Error: Could not parse relay log event entry. The possible reasons are: the master’s binary log is corrupted (you can check this by running ‘mysqlbinlog’ on the binary log), the slave’s relay log is corrupted (you can check this by running ‘mysqlbinlog’ on the relay log), a network problem, or a bug in the master’s or slave’s MySQL code. If you want to check the master’s binary log or slave’s relay log, you will be able to know their names by issuing ‘SHOW SLAVE STATUS’ on this slave.

This is relay relay log corruption and you can check details in the MySQL Error log file. The error message describes few reasons and indeed because there is little validation (ie no checksums) in the replication there are multiple reasons for bad event to show up in relay logs.

Really this is only one of various error messages you could see if relay log corrupted. You could also see malformed queries (with some junk), complaining about event to big etc if there is a garbage in relay logs.

If relay logs are corrupted it is surely worth to check what could cause it – it could be network (especially if replicating over unreliable long distance networks), MySQL bugs on master or slave, hardware problems and few others. In any case it is worth investigating.

Investigating is what you do later but how do you fix the problem first ? The important question you need to have answered – are logs corrupted on the master ? If logs on the master are OK you can just run SHOW SLAVE STATUS on slave experiencing error and use CHANGE MASTER TO to re-point replication to Relay_Master_Log_File:Exec_Master_Log_Pos:

 localhost:(none)> slave stop;
Query OK, 0 rows affected (0.00 sec)

localhost:(none)> change master to master_log_file=Relay_Master_Log_File,master_log_pos=Exec_Master_Log_Pos
Query OK, 0 rows affected (1.16 sec)

localhost:(none)> slave start;
Query OK, 0 rows affected (0.00 sec)

localhost:(none)> slave stop;

Query OK, 0 rows affected (0.00 sec)

localhost:(none)> change master to master_log_file=Relay_Master_Log_File,master_log_pos=Exec_Master_Log_Pos

Query OK, 0 rows affected (1.16 sec)

localhost:(none)> slave start;

Query OK, 0 rows affected (0.00 sec)

This will purge existing relay logs re-fetch all events which have not been executed yet. Doing this command make sure your master is operational and it still has all the logs needed to re-fetch events.

How would you know if logs are OK on the master ? Well in this case there were probably 5 another slaves which did not have the problem – which means Master is most likely OK. In any case it is little harm to try restarting from the same position – if logs are bad on the master you would get the same error message again and can continue with investigation.

What if logs on the master are corrupted ? In this case you have couple of choices (and you also potentially have multiple slaves to deal with). You can use mysqlbinlog (or you favorite hex editor if mysqlbinlog does not work) to find the next event start and potentially recover “corrupted” event to be manually executed on the slaves.

Skipping around event makes master and slave potentially inconsistent and you should access the risks depending on applications (and on amount of events which were corrupted) you may want to let replication continue from the new position or resync the slaves to the master.

How can you recover the slave ? As all slaves are likely to be affected in this case you can’t clone another slave. You also can’t use classical method of recovery from backup – because you would need relay logs to roll forward, and they are corrupted. You can either re-clone the data from Master. (This is where LVM or similar techniques can help you a lot) or skip bad events as described and when use Maatkit mk-table-checksum to check what tables are out of sync and when use mk-table-sync to resync them.

Last method works in particularly well in case you can afford to run for a while with slaves which are a bit out of sync, which is quite often better than having just master available (also having extra load of data copied from it).

12 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

John Marc

15 years ago

FYI
http://server.dzone.com/news/troubleshooting-relay-log-corr

Josue

14 years ago

Hello Peter,

This apply to situations where you have the following error:
“Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)”

Tnkx for this great post.

Antoine

14 years ago

Dude, you rock. You saved my life. I was planning to spend another night trying to figure out what went wrong and with your clear and pragmatic explanations I solved the problem in no time at all.

Justin

13 years ago

Let me also say thank you for writing this up so thoroughly. I was *mostly* sure it would work, but it’s always good to get confirmation from an expert! This site has saved me so much time and trouble, I really appreciate it.

tony balleras

12 years ago

hi peter,

thanks for this information. saved my life. by the way when i executed your script i was able to get a syntax error.
it seems the Relay_Master_Log_File should be ‘Relay_Master_Log_File’ (notice the single qoute).

anyway thanks peter.

tony

Pan Admin

11 years ago

Петя, ты мой герой! Избавил мой сайт от двухчасовой пересинхронизации слейва.
PS. Гори в аду, мастерхост.

Rob

11 years ago

Great intructions, but missing what would have been useful information in my scenerio.

If you expire your binary logs after X amount of days (expire_logs_days=X) on your Master server and your Slave replication gets hung up on a Relay_Master_Log_File which no longer exists, as it has been expired. You can use:

Change MASTER to RELAY_LOG_FILE = ‘Relay_Log_File’, RELAY_LOG_POS = ‘Relay_Log_Pos’;

to use the Relay logs which have been stacking up on the Slave replication.

Also, note this bug applied to my case as well: http://bugs.mysql.com/bug.php?id=55460

Vamsi

10 years ago

Great. Thanks for the tip. It saved my time a lot.

Suyash Jain

10 years ago

Hi Saved my life too , and it worked so smoothly that i could not believe.

In my case master is not corrupted as , i have more slaves which did not get the similar error.

Is there any way to troubleshoot at slave level.

Thanks

Suyash Jain
http://linuxhacks.in

Mattias

8 years ago

Help me, since relay got corupted

Charitha

8 years ago

Thanks. This helped me a lot to recover from a relay log corruption due to abrupt slave restart.

Ganesh

3 years ago

STOP SLAVE;

RESET SLAVE;

CHANGE MASTER TO master_log_file=’mysql-bin.001273′, master_log_pos=617884110;

START SLAVE;

can above steps get any inconsistency on data with respective to master database.

as i checked master database size was less compared to the slave database size.

whether RESET SLAVE done any thing wrong in my case,can someone help me

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Troubleshooting Relay Log Corruption in MySQL

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Sets and Sorted Sets

Securing Your MySQL Database: Essential Best Practices

Mastering Database Monitoring: Running PMM in High Availability Mode

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Troubleshooting Relay Log Corruption in MySQL

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Sets and Sorted Sets

Securing Your MySQL Database: Essential Best Practices

Mastering Database Monitoring: Running PMM in High Availability Mode

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation