The other day I was working on an issue where one of the slaves was showing unexpected lag. Interestingly with only the IO thread running the slave was doing significantly more IO as compared to the rate at which the IO thread was fetching the binary log events from the master.

I found this out by polling the SLAVE STATUS and monitoring the value of Read_Master_Log_Pos as it changed over time. Then compared it to the actual IO being done by the server using the pt-diskstats tool from the excellent Percona Toolkit. Note that, when doing this analysis, I had already stopped the slave SQL thread and made sure that there were no dirty InnoDB pages, otherwise my analysis would have been skewed.

A closer inspection showed that this had to do with the error log getting filled up with error messages apparently suggesting that the slave IO thread was disconnecting and reconnecting repeatedly.

Below is the type of error messages that were being generated and were filling up the error log:

These error messages are quite vague and do not suggest the actual problem at all. The error messages appear to point out to the disconnection and reconnection of the IO thread. The rate at which these error messages were being generated was close to 20MB/s which was unnecessarily producing IO load.

Importance of a Unique MySQL 5.6 Server UUID

But anyway coming back to the problem. Looking at the output of running SHOW SLAVE HOSTS on the master showed what the problem was:

The slave hosts for the UUID “6c27ed6d-7ee1-11e3-be39-6c626d957cff” were fluctuating between “SBslave1” and “SBslave2”. So the actual problem was that there were two slaves running with the same server UUID and that was confusing the master. This was caused by the new slave “SBslave2” being cloned from the slave “SBslave1”. The cloning process ended up copying the file “auto.cnf” which is present in the MySQL data directory. The auto.cnf file stores the server’s UUID which is used to uniquely identify a server. This is a new feature in MySQL 5.6 and is explained very well in the MySQL manual here.

So essentially when a server starts up, it reads the UUID value from the auto.cnf file. If the auto.cnf file is not present or the UUID value cannot be read, then the server generates a new UUID. When a new UUID is generated you should see a message similar to the following in the MySQL error log:

Conclusion

So the important lesson is to remember to not reuse the auto.cnf file when cloning a server, as the MySQL 5.6 server UUID is an important identification of a particular server. This is used by the master for the purpose of server identification. I also believe that MySQL should show a different error message that is more pertinent to the actual error. The master could in theory send a different type of event to the slave when it notices one with a duplicate UUID.

11 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Peter Zaitsev

Ovais,

So what does this mean in practice ? I take backup of Slave1 with Percona Xtrabackup and restore it on Slave2

Does it mean “auto.cnf” should be removed from restored backup so we can set it up as a slave with no problem or is there different cause of action needed ?

Ovais Tariq

Peter,

It depends on how you are doing the backup. XtraBackup does not backup the auto.cnf file so no action is needed when you are setting up a slave using XtraBackup. However, some people prefer to cone slaves using data directory copy, in which case the auto.cnf file should be removed before starting up MySQL on the copied data directory.

sree

yes i faced this issue before and resolved it by changing the server id to a different one, usually we run an instal script for mysql which modifies the server id as (server’s ipaddress without dots) this make sure no server id is cloned or duplicated.

random-key

Useful information for DBAs. Thanks.

hartmut

As the old bug 16927 about the misleading error messages on duplicate slave IDs has been closed with “won’t happen anymore as server UUIDs fix this” I’ve opened a new bug about this now:

http://bugs.mysql.com/bug.php?id=72578

hartmut

Also weird to see that this still causes fluctuation … it has already been suggested on the previous bug reports that the master should reject the new slave connection in that case instead of disconnecting the existing one …

hartmut
Tapan Kumar Thapa

Many thanks for this article.

It resolved my issue.

Manfred

Sadly I found this article during my research after a similar problem. This killed replication for me for both slaves (the one I cloned from and the new one) after a few minutes by seemingly sending wrong replication data to the hosts. Now I have to restore both from daily backups (700GB DB – bah)

Gautam Somani

Is it not possible to change the location of the auto.cnf file to /etc/mysql rather than keeping it in /var/lib/mysql/?

This will help everyone avoid the issue very easily.

stervogon

can I replicate from 5.7 to 5.6? and from 5.6 to 5.7?