In the previous post of this series we saw how you could use mysqlrpladmin
to perform manual failover/switchover when GTID replication is enabled in MySQL 5.6. Now we will review mysqlfailover
(version 1.4.3), another tool from the MySQL Utilities that can be used for automatic failover.
Summary
- mysqlfailover can perform automatic failover if MySQL 5.6’s GTID replication is enabled.
- All slaves must use --master-info-repository=TABLE.
- The monitoring node is a single point of failure: don’t forget to monitor it!
- Detection of errant transactions works well, but you have to use the --pedantic option to make sure failover will never happen if there is an errant transaction.
- There are a few limitations, such as the inability to fail over only once or excessive CPU utilization, but they are probably not showstoppers for most setups.
Setup
We will use the same setup as last time: one master and two slaves, all using GTID replication. We can see the topology using mysqlfailover
with the health
command:
$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root health
[...]
MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Tue Jul  1 10:01:22 2014

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000003   700

GTID Executed Set
a9a396c6-00f3-11e4-8e66-9cebe8067a3f:1-3

Replication Health Status
+------------+--------+---------+--------+------------+---------+
| host       | port   | role    | state  | gtid_mode  | health  |
+------------+--------+---------+--------+------------+---------+
| localhost  | 13001  | MASTER  | UP     | ON         | OK      |
| localhost  | 13002  | SLAVE   | UP     | ON         | OK      |
| localhost  | 13003  | SLAVE   | UP     | ON         | OK      |
+------------+--------+---------+--------+------------+---------+
Note that --master-info-repository=TABLE
needs to be configured on all slaves or the tool will exit with an error message:
2014-07-01 10:18:55 AM CRITICAL Failover requires --master-info-repository=TABLE for all slaves.
ERROR: Failover requires --master-info-repository=TABLE for all slaves.
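If a slave is currently using the default FILE repository, you can switch it without a restart on MySQL 5.6, as long as its replication threads are stopped. A minimal sketch against one of the sandbox slaves used above (repeat for each slave, and also put master-info-repository=TABLE in its my.cnf so the setting survives a restart):

$ mysql -h 127.0.0.1 -P 13002 -u root -e "STOP SLAVE; SET GLOBAL master_info_repository = 'TABLE'; START SLAVE;"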
Failover
You can use two commands to trigger automatic failover:
- auto: the tool tries to find a candidate in the list of servers specified with --candidates, and if no good server is found in this list, it will look at the other slaves to see if one can be a good candidate. This is the default command.
- elect: same as auto, but if no good candidate is found in the list of candidates, other slaves will not be checked and the tool will exit with an error (see the sketch after this list).
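For instance, here is a sketch of the elect command restricted to a single candidate (hosts and ports are those of the sandbox described above):

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root --candidates=root@localhost:13002 elect

With this invocation, if localhost:13002 cannot be promoted at failover time, the tool exits with an error instead of picking another slave.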
Let’s start the tool with auto
:
$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto
The monitoring console is visible and is refreshed every --interval
seconds (default: 15). Its output is similar to what you get when using the health
command.
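If 15 seconds is too slow for your needs, you can lower the interval and send the console output to a log file. A sketch (the values and path are just examples):

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root --interval=5 --log=/var/log/mysqlfailover.log auto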
Then let’s kill -9 the master to see what happens once the master is detected as down:
Failed to reconnect to the master after 3 attemps.

Failover starting in 'auto' mode...
# Candidate slave localhost:13002 will become the new master.
# Checking slaves status (before failover).
# Preparing candidate for failover.
# Creating replication user if it does not exist.
# Stopping slaves.
# Performing STOP on all slaves.
# Switching slaves to new master.
# Disconnecting new master as slave.
# Starting slaves.
# Performing START on all slaves.
# Checking slaves for errors.
# Failover complete.
# Discovering slaves for master at localhost:13002

Failover console will restart in 5 seconds.

MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Tue Jul  1 10:59:47 2014

Master Information
------------------
Binary Log File    Position  Binlog_Do_DB  Binlog_Ignore_DB
mysql-bin.000005   191

GTID Executed Set
a9a396c6-00f3-11e4-8e66-9cebe8067a3f:1-3

Replication Health Status
+------------+--------+---------+--------+------------+---------+
| host       | port   | role    | state  | gtid_mode  | health  |
+------------+--------+---------+--------+------------+---------+
| localhost  | 13002  | MASTER  | UP     | ON         | OK      |
| localhost  | 13003  | SLAVE   | UP     | ON         | OK      |
+------------+--------+---------+--------+------------+---------+
Looks good! The tool is then ready to fail over to another slave if the new master becomes unavailable.
You can also run custom scripts at several points of execution with the --exec-before
, --exec-after
, --exec-fail-check
, --exec-post-failover
options.
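As an illustration, here is a sketch wiring two of these hooks to external scripts (the script paths are hypothetical, and what the scripts do is entirely up to you):

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root --exec-before=/usr/local/bin/pre_failover_check.sh --exec-post-failover=/usr/local/bin/move_vip.sh auto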
However it would be great to have a --failover-and-exit
option to avoid flapping: the tool would detect master failure, promote one of the slaves, reconfigure replication and then exit (this is what MHA does for instance).
Tool registration
When the tool is started, it registers itself on the master by writing a few things in a dedicated table:
mysql> SELECT * FROM mysql.failover_console;
+-----------+-------+
| host      | port  |
+-----------+-------+
| localhost | 13001 |
+-----------+-------+
This is nice as it prevents you from starting several instances of mysqlfailover
to monitor the same master. If we try, this is what we get:
$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto
[...]
Multiple instances of failover console found for master localhost:13001.
If this is an error, restart the console with --force.
Failover mode changed to 'FAIL' for this instance.
Console will start in 10 seconds..........starting Console.
With the fail
command, mysqlfailover
will monitor replication health and exit in the case of a master failure, without actually performing failover.
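You can also start the console in this mode on purpose, for monitoring without automatic promotion; a sketch with the same sandbox hosts:

$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root fail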
Running in the background
In all previous examples, mysqlfailover
was running in the foreground. This is very good for a demo, but in a production environment you are likely to prefer running it in the background. This can be done with the --daemon
option:
$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root auto --daemon=start --log=/var/log/mysqlfailover.log
and it can be stopped with:
$ mysqlfailover --daemon=stop
Errant transactions
If we create an errant transaction on one of the slaves, it will be detected:
MySQL Replication Failover Utility
Failover Mode = auto     Next Interval = Tue Jul  1 16:29:44 2014
[...]
WARNING: Errant transaction(s) found on slave(s).
Replication Health Status
[...]
However this does not prevent failover from occurring! You have to use --pedantic
:
$ mysqlfailover --master=root@localhost:13001 --discover-slaves-login=root --pedantic auto
[...]
# WARNING: Errant transaction(s) found on slave(s).
# - For slave 'localhost@13003': db906eee-012d-11e4-8fe1-9cebe8067a3f:1
2014-07-01 16:44:49 PM CRITICAL Errant transaction(s) found on slave(s). Note: If you want to ignore this issue, please do not use the --pedantic option.
ERROR: Errant transaction(s) found on slave(s). Note: If you want to ignore this issue, please do not use the --pedantic option.
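If the errant transaction turns out to be harmless, one common way to clear the warning is to inject an empty transaction with the same GTID on the master so that it propagates to all servers. A sketch using the GTID reported above (host and port are those of the sandbox master):

$ mysql -h 127.0.0.1 -P 13001 -u root -e "SET GTID_NEXT='db906eee-012d-11e4-8fe1-9cebe8067a3f:1'; BEGIN; COMMIT; SET GTID_NEXT='AUTOMATIC';"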
Limitations
- Like for mysqlrpladmin, the slave election process is not very sophisticated and it cannot be tuned.
- The server on which mysqlfailover is running is a single point of failure.
- Excessive CPU utilization: once it is running, mysqlfailover hogs one core. This is quite surprising.
Conclusion
mysqlfailover
is a good tool to automate failover in clusters using GTID replication. It is flexible and looks reliable. Its main drawback is that there is no easy way to make it highly available itself: if mysqlfailover
crashes, you will have to manually restart it.
Comments
The mysqlfailover process is not a SPOF. If it fails, the system as a whole continues to run. I think the same is true for MHA. This, and also the manual restart issue, can be fixed by running mysqlfailover with Solaris SMF, systemd, etc., or by running it on a cluster.
Daniël,
I may not have used the right word: what I meant is that setting up mysqlfailover is not enough to guarantee automated failover for your database cluster. Is it acceptable? Sometimes it is, sometimes it’s not.
Of course this can be fixed by adding another layer, but it adds some complexity.
Hi
Thanks for such a lovely explanation.
Really useful!
Can you please elaborate on the additional layer we can add on top of mysqlfailover which, although it adds complexity, gives a stronger guarantee?
Many thanks!
Gurbrinder,
You could use for instance Pacemaker and Corosync to make mysqlfailover highly available. However there is some glue to write for it to work.
Another solution could be to use Pacemaker to detect a master failure and let it trigger mysqlrpladmin to perform failover at the MySQL level.
What are the steps to bring the original master back in service as a slave?
Bhavesh,
You can use the change master to statement:
mysql> change master to master_host='new_master_ip', master_user='your_repl_user', master_password='your_repl_pwd', master_auto_position=1;
mysql> start slave;
As GTIDs are used it’s not necessary to specify binlog coordinates.
Hi
Thanks a ton!
We use a VIP, so is there any coding or mechanism by which the VIP also fails over at the same time when the mysqlrpladmin command does its magic of switching over?
Gurbrinder,
mysqlrpladmin will only take care of reconfiguring MySQL replication, but you can use the --exec-after and --exec-before options to run external scripts that will move the VIP.
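As an illustration only, such a script could look something like the sketch below (the interface name, VIP and host names are assumptions, and you need passwordless SSH and sudo for it to run unattended):

#!/bin/bash
# Hypothetical VIP-move script to be called through --exec-after:
# drop the VIP on the old master, bring it up on the new one.
VIP=192.168.0.100
OLD_MASTER=old-master-host
NEW_MASTER=new-master-host

ssh "$OLD_MASTER" "sudo ip addr del $VIP/24 dev eth0" || true
ssh "$NEW_MASTER" "sudo ip addr add $VIP/24 dev eth0"
# Send gratuitous ARP so clients learn the new location of the VIP.
ssh "$NEW_MASTER" "sudo arping -c 3 -A -I eth0 $VIP"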
Hello
First of all, thank you for your post. It is very useful. I have a MySQL cluster with GTID replication and my problem is that when a slave loses sync, it automatically becomes a master. This is a problem for us because we have a load balancer that redirects queries based on the server’s role, so we can end up with two masters at the same time. Is there any way to make a slave that loses sync keep its role instead of becoming a master?
Thank you in advance
Javier,
I don’t understand what you mean when you say that a slave becomes a master when it is out of sync. Could you clarify?
Thanks for posting this article. Very helpful.
I was wondering what your thoughts are on running mysqlfailover on each of the slave hosts. Because of the conflict resolution built into the tool, it seems as though only one instance can run at a time. Of course, we will try testing it, but I was wondering what your thoughts are on doing so.
Thanks, again.
Joe,
You should only have a single mysqlfailover instance running at any given time. Otherwise all the mysqlfailover instances may want to trigger failover at the same time and you may break your topology.
Hi, please help me with this error.
[root@ip-172-31-6-140 ~]# mysqlfailover --master=slave2 --slaves=slave1 health
# Checking privileges.
2015-01-22 09:35:50 AM CRITICAL Query failed. 1694 (HY000): Cannot modify @@session.sql_log_bin inside a transaction
ERROR: Query failed. 1694 (HY000): Cannot modify @@session.sql_log_bin inside a transaction
Hello, abhishek rai
How do you fix this problem?
I have the same problem as you. Please help me.
Hello,
ERROR: Query failed. 1694 (HY000): Cannot modify @@session.sql_log_bin inside a transaction
How do you fix this problem?
Please help me.
Hi,
I have set up replication with one master and one slave. Replication is working perfectly. Then I tried to execute the mysqlfailover command, but it does not list the slave. I got the following result:
MySQL Replication Failover Utility
Failover Mode = auto Next Interval = Tue May
Master Information
——————
Binary Log File Position Binlog_Do_DB Binlog
mysql-bin.000016 9568
GTID Executed Set
8fe8b710-cd34-11e4-824d-fa163e52e544:1-1143
Replication Health Status
0 Rows Found.
Q-quit R-refresh H-health G-GTID Lists U-UUIDs U
When I try to execute mysqlrplcheck and mysqlrplshow, they list my slave and master.
Can anyone help me?
I am very new to this. I have one doubt: where do we need to execute the mysqlfailover command (slave or master)?
Configuration files.
Master my.ini
[mysqld]
server-id=7
expire_logs_days = 30
log-bin = “C:/logmysql/mysql-bin.log”
binlog-format=ROW
log-slave-updates=true
gtid-mode=on
enforce-gtid-consistency=true
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
binlog-checksum=CRC32
master-verify-checksum=1
report-host=10.24.184.12
report-port=3306
port=3306
Slave my.ini
sync_relay_log_info=10000
binlog_format=ROW
log-slave-updates=true
log-bin=C:\logs\mysql-bin.log
gtid-mode=ON
enforce-gtid-consistency=true
server-id=8
report-host=10.24.184.13
report-port=3306
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
port=3306
Thanks for this article!
I have set up replication with 1 master and 4 slaves (5 different physical servers).
When I stop the master database (but the physical server stays up), the failover takes about 15 seconds.
But when the server is powered off, the failover process takes about 8 minutes!!!!
How do you configure the 'elect' setting? Can anyone give an example? I am running a traditional one master, two slaves topology and would only want one slave to be considered for master promotion in case of master failure.