MySQL 5.1 introduces row-based binary logging. In fact, the default binary logging format in GA versions of MySQL 5.1 is STATEMENT*. The binlog_format variable can still be changed per session, which means it is possible that some of your binary log entries will be written in a row-based fashion instead of as the actual statement which changed data, even when the global setting on the master is to write binary logs in statement mode. The row-based format does offer advantages, particularly if triggers or stored procedures are used, or if non-deterministic functions like RAND() are used in DML statements.
A statement-based replication slave can get out of sync with the master fairly easily, especially if data is changed on the slave. It is possible for a statement to execute successfully on a slave even if the data is not 100% in sync, so MySQL doesn’t know anything is wrong. This isn’t the case with row-based replication. Only actual changes are written to row-based logs. Take as an example a DELETE statement which does not match any rows: it produces no binary log entry in row-based logging, but the DELETE will be written to a statement-based binary log. Since row-based logs only contain information about changed rows, a replication slave which does not find a row referenced in a row-based log will immediately return an error: HA_ERR_KEY_NOT_FOUND.
This is probably best shown by example. First, let’s create a test table on a MySQL master database and populate it with some data.
CREATE TABLE `t1` (
  `c1` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB;
Query OK, 1 row affected (0.02 sec)
Insert a single “seed” row into the table:
INSERT INTO repl.t1 VALUES ();
Query OK, 1 row affected (0.00 sec)
INSERT … SELECT a few times to get some dummy data:
insert into repl.t1 select NULL from t1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

insert into repl.t1 select NULL from t1;
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

...

insert into repl.t1 select NULL from t1;
Query OK, 32768 rows affected (0.62 sec)
Records: 32768  Duplicates: 0  Warnings: 0
Verify that we have a reasonable amount of data to play with:
select count(*) from t1;
+----------+
| count(*) |
+----------+
|    65536 |
+----------+
1 row in set (0.02 sec)
My test setup includes a MySQL slave. After the above setup completes, I am going to change data on the slave by truncating t1. After truncation:
select count(*) from t1 as empty_table_on_the_slave;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
Now, let's make a modification on the master:
update repl.t1 set c1 = c1 + 100000 where c1 = 1;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0
On the slave, SHOW SLAVE STATUS now reports an error (many fields omitted):
SHOW SLAVE STATUS\G
    Relay_Log_File: mysql_sandbox25162-relay-bin.000002
     Relay_Log_Pos: 340718
 Slave_SQL_Running: No
        Last_Errno: 1032
        Last_Error: Could not execute Update_rows event on table repl.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000001, end_log_pos 340718
This error message is telling us that a row that was modified on the master cannot be found on the slave: HA_ERR_KEY_NOT_FOUND. It doesn’t tell us which row it couldn’t find, though. In our simple example we know that all the rows are missing, and that the one in particular the server is trying to find is c1=1, but how can we find that out if we don’t know the conditions under which replication went out of sync?
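As a first step, the useful coordinates can be pulled out of the error text programmatically. Below is a minimal sketch in Python; the regular expression and the field names are my own assumptions, based only on the 5.1-style error message shown above:

```python
import re

# Example Last_Error text as reported by SHOW SLAVE STATUS (from above).
last_error = ("Could not execute Update_rows event on table repl.t1; "
              "Can't find record in 't1', Error_code: 1032; "
              "handler error HA_ERR_KEY_NOT_FOUND; "
              "the event's master log mysql-bin.000001, end_log_pos 340718")

def parse_rbr_error(err):
    """Extract the table, master log file and end_log_pos from a 1032 error."""
    m = re.search(r"on table (\S+);.*master log (\S+), end_log_pos (\d+)", err)
    if not m:
        return None
    table, log_file, pos = m.groups()
    return {"table": table, "log_file": log_file, "end_log_pos": int(pos)}

print(parse_rbr_error(last_error))
```

With the position in hand, you know where to look in the (relay or master) log, which is exactly what the next step does with mysqlbinlog.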
To determine exactly which row is having an issue, we can use the 'mysqlbinlog' utility in combination with the slave relay log. Examining the output of SHOW SLAVE STATUS, we can see that the slave is currently executing in relay log file mysql_sandbox25162-relay-bin.000002 at position 340718. MySQL 5.1 ships with a new version of mysqlbinlog that includes new options that make decoding RBR statements possible: "--base64-output" and "-v". For debugging row-based logs, I suggest '--base64-output=decode-rows -v'. This omits any base64 output and instead replaces it with semi-valid SQL statements. I say semi-valid because the statements do not include column names; column names are not included in the binary log, only positions.
mysqlbinlog --base64-output=decode-rows -v mysql_sandbox25162-relay-bin.000002 > decoded.log
Examine the decoded binary log with the ‘less’ utility, and search forward for the value 340718:
# at 340718
#100506 12:42:56 server id 1 end_log_pos 340637 Query thread_id=6 exec_time=0 error_code=0
SET TIMESTAMP=1273174976/*!*/;
BEGIN
/*!*/;
# at 340782
# at 340823
#100506 12:42:56 server id 1 end_log_pos 340678 Table_map: `repl`.`t1` mapped to number 15
#100506 12:42:56 server id 1 end_log_pos 340718 Update_rows: table id 15 flags: STMT_END_F
### UPDATE repl.t1
### WHERE
###   @1=1
### SET
###   @1=100001
# at 340863
#100506 12:42:56 server id 1 end_log_pos 340745 Xid = 51
COMMIT/*!*/;
# at 340890
As you can see, the RBR entry has been decoded as an update statement. The OLD value (the row that is missing) is represented in the WHERE clause. The binary log does not carry column names, so each column is represented by the column position prefixed with @.
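If many events need checking, scanning the decoded output by hand gets tedious. Here is a rough Python sketch that pulls the WHERE image (the old, missing row) out of a decode-rows fragment; the parsing is based only on the output format shown above, and the @1-to-c1 column mapping is a hypothetical example, since real column names must come from your own schema:

```python
# Sample fragment of `mysqlbinlog --base64-output=decode-rows -v` output,
# as shown in the decoded log above.
decoded = """\
### UPDATE repl.t1
### WHERE
###   @1=1
### SET
###   @1=100001
"""

def extract_where_image(text):
    """Collect the @N=value pairs from the WHERE block of a decoded row event."""
    values, in_where = {}, False
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse the indentation after '###'
        if line == "### WHERE":
            in_where = True
        elif line == "### SET":
            in_where = False
        elif in_where and line.startswith("### @"):
            col, val = line[4:].split("=", 1)
            values[col] = val
    return values

# The log carries column positions, not names; map them from your own schema.
# (Hypothetical mapping for our one-column test table.)
columns = {"@1": "c1"}
where = extract_where_image(decoded)
print({columns[k]: v for k, v in where.items()})  # the missing row's old values
```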
There you have it. You can now find out exactly which row is missing and begin your investigation as to why it is missing. You may have to do a bit more leg work if many rows were modified during a transaction.
You should always set read_only = true on your MySQL slaves to prevent accidental changes in data from happening!
*edit*
The MySQL 5.1 default binary logging format changed back to STATEMENT in MySQL 5.1.29.
MySQL 5.1 default binlog_format in GA has always been STATEMENT. The main reason for this choice is that we wanted to avoid compatibility issues.
Notice that 5.1 has been GA since version 5.1.30.
I must say that your example is quite unsettling, as it is based on a human mistake that puts the slave in a non-recoverable state.
See also this post, where the enhancement of mysqlbinlog was announced.
http://datacharmer.blogspot.com/2008/09/dsclosing-binlog-entries-with-row-based.html
The ‘human mistake’ is just one reason a slave can get out of sync…
Try this on your statement based master:
DELETE FROM some_table ORDER BY RAND() LIMIT 1000;
Your slave is now out of sync. Oops..
And I’ve run into this problem when SQL_LOG_BIN=0 was set on the master and changes were made. Since deletions (or insertions) happened on the master that were not on the slave the slave was out of sync. By diagnosing the problem using this technique, I was able to track down how the rows went ‘missing’ on the slave.
My simple example is just that – a simple example. Reality is complex and there are very many reasons a slave can get out of sync. When it happens you need to know how to discover what isn’t in sync (which is not obvious from the error message) so that you can find the underlying cause and prevent it from happening again.
Further, it is possible to switch between MIXED, ROW and STATEMENT in an individual session, thus a slave could get RBR updates even when the master is in STATEMENT mode.
Also, how is statement “more compatible”? You can’t safely replicate from a 5.1 to a 5.0 host, which is the only place the “compatibility” issue seems to make sense. A 5.1 slave can consume both Query and RBR events, so how does using statement improve anything? Triggers, cascading foreign keys and other indirect data changes are much more safely captured using RBR than statement logs. Using non-deterministic functions like UUID or RAND() are much safer with RBR.
And I don’t buy the argument in: http://bugs.mysql.com/bug.php?id=39812
The people who were relying on statement side effects should manually use statement, or fix their application to set binlog_format=STATEMENT before statements where the stupid statement side-effects are desired.
Justin,
With statement-level replication, MySQL 5.1 to 5.0 works for some kinds of applications. It is not guaranteed to work in all cases, of course, but it does for many simple MySQL use cases in practice.
This is in fact an important case because it allows you to get extra assurance for MySQL 5.0 -> 5.1 migrations – you can keep a MySQL 5.0 slave for some time after promoting the master to MySQL 5.1, so you have something to roll back to.
Justin,
On compatibility – I think this is why there is MySQL and Drizzle. MySQL carries a lot of weight from bad choices made over many years which can’t easily be broken, because doing so unsettles a lot of customers who rely on these behaviors.
Also, people often do not know they rely on any side effects. They just upgrade the master and slave to MySQL 5.1 and expect things to work as before. If it breaks, that is a serious issue.
I think moving back to STATEMENT by default in MySQL 5.1 was a good choice. Breaking existing applications is a lot more serious an issue than improving things (which row-level replication does).
One problem I have experienced with ROW-based binary logging is that MySQL can write log entries that are too huge to be parsed by the mysql client.
An example is, if you delete all rows in a huge table:
DELETE FROM huge_table;
This will be formulated as one BINLOG '...' entry in the binary log, with a delete event for each row in the table.
As a result you will get the error "ERROR 1153 (08S01) at line 1074175: Got a packet bigger than 'max_allowed_packet' bytes" when trying to apply this log to a database (mysqlbinlog bin-log.000001 | mysql -uroot).
You can increase the max_allowed_packet to a maximum of 1GB, but you will still be able to get larger entries in the log.
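Since each `# at N` line in mysqlbinlog output marks the byte offset where the next event starts, one way to spot oversized entries before replaying a log is to diff consecutive offsets. A rough sketch in Python; the 1 GB threshold is an assumption mirroring the max_allowed_packet ceiling discussed above, and the example offsets are made up:

```python
MAX_ALLOWED_PACKET = 1024 * 1024 * 1024  # assumed 1 GB ceiling

def oversized_events(offsets, limit=MAX_ALLOWED_PACKET):
    """Given the sorted '# at' byte offsets from mysqlbinlog output,
    return (start_offset, size) for events larger than the limit."""
    return [(a, b - a)
            for a, b in zip(offsets, offsets[1:])
            if b - a > limit]

# Example: a huge single-transaction DELETE shows up as one giant event.
offsets = [4, 340718, 340718 + 2 * 1024**3, 340718 + 2 * 1024**3 + 100]
print(oversized_events(offsets))
```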
Hi Kim,
I wanted to let you know that the next release of mysqlbinlog should include a --max-allowed-packet option instead of hard coding it to 1G. This should resolve your problem with mysqlbinlog.
Hi All.
I am using the MIXED binlog format and having a problem with a simple delete query:
DELETE FROM temp.tenp WHERE id = 10;
This query must delete 3,000,000 rows from my InnoDB table.
But the slave went out of sync (like in comment #2 by Justin Swanhart, where he used the STATEMENT binlog format).
And if I change the binlog format to ROW, I get an error like in comment #8 by Kim Carlsen.
What must I do? Some configuration tuning?
Do not delete a lot of rows in one query?
How many rows can I delete per query if I am using the MIXED binlog format?
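One common workaround, whatever the binlog format, is to break the big DELETE into smaller chunks so that each transaction (and therefore each binlog entry) stays small, re-running until the server reports 0 rows affected. A sketch in Python that only generates the statements; the table name, predicate, and 10,000-row chunk size are arbitrary assumptions:

```python
def chunked_deletes(table, where, chunk=10000, max_chunks=1000):
    """Yield DELETE ... LIMIT statements to run in a loop; stop once
    the server reports 0 rows affected (max_chunks is a safety cap)."""
    for _ in range(max_chunks):
        yield f"DELETE FROM {table} WHERE {where} LIMIT {chunk};"

# First two statements of the loop for the table from the comment above.
for stmt in list(chunked_deletes("temp.tenp", "id = 10"))[:2]:
    print(stmt)
```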
I’m experiencing a very strange situation. I set binlog_format = 'ROW' in a console, applying it to my current session only:

mysql> show variables like '%binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+

and ran an update query to change the data of one row. The change only affected the specific column(s) named in the update query. At this moment, I’m quite confused about the accuracy of MySQL replication.
For your info
* MySQL Master: MySQL Cluster 5.1.56-ndb-7.1.17-cluster-gpl-log
* MySQL Master: binlog_format = MIXED
* MySQL Slave: MySQL 5.1.56-ndb-7.1.17-cluster-gpl
* MySQL Slave: SET GLOBAL SLAVE_EXEC_MODE = 'IDEMPOTENT';
Are there bugs on this version?
While we are on this topic, is it possible to replicate only the inserts that happen on the master to the slave? I know it's not there in 5.1, but is there any option in 5.5 to replicate DML of your choice? I need this for something I am experimenting with and wanted to know if either MySQL or Percona has any solution.
Thanks and Regards,
Raghu
Hi, you have mixed up row-based replication and statement-based replication. It is RBR with slave_exec_mode=STRICT that causes the HA_ERR_KEY_NOT_FOUND problem. Check it for yourself: http://dev.mysql.com/doc/refman/5.1/en/replication-rbr-usage.html
Thanks man, I will check it immediately