MySQL 5.1 introduces row-based binary logging. In fact, the default binary logging format in GA versions of MySQL 5.1 is STATEMENT*. The binlog_format variable can still be changed per session, which means it is possible that some of your binary log entries will be written in a row-based fashion instead of as the actual statement which changed data, even when the global setting on the master is to write binary logs in statement mode. The row-based format does offer advantages, particularly if triggers or stored procedures are used, or if non-deterministic functions like RAND() are used in DML statements.
A statement-based replication slave can get out of sync with the master fairly easily, especially if data is changed on the slave. It is possible for a statement to execute successfully on a slave even if the data is not 100% in sync, so MySQL doesn’t know anything is wrong. This isn’t the case with row-based replication. Only actual changes are written to row-based logs. Take as an example a DELETE statement which does not match any rows: it produces no binary log entry in row-based logging, but the DELETE will be written to a statement-based binary log. Since row-based logs only contain information about changed rows, a replication slave which does not find a row referenced in a row-based log will immediately return an error: HA_ERR_KEY_NOT_FOUND.
This is probably best shown by example. First, let’s create a test table on a MySQL master database and populate it with some data.
CREATE TABLE `t1` (
  `c1` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`c1`)
) ENGINE=InnoDB;
Query OK, 1 row affected (0.02 sec)
Insert a single “seed” row into the table:
INSERT INTO repl.t1 VALUES ();
Query OK, 1 row affected (0.00 sec)
INSERT … SELECT a few times to get some dummy data:
insert into repl.t1 select NULL from t1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

insert into repl.t1 select NULL from t1;
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

...

insert into repl.t1 select NULL from t1;
Query OK, 32768 rows affected (0.62 sec)
Records: 32768  Duplicates: 0  Warnings: 0
Verify that we have a reasonable amount of data to play with:
select count(*) from t1;
+----------+
| count(*) |
+----------+
|    65536 |
+----------+
1 row in set (0.02 sec)
My test setup includes a MySQL slave. After the above setup completes, I am going to change data on the slave by truncating t1. After truncation:
select count(*) from t1 as empty_table_on_the_slave;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)
Now, let's make a modification on the master:
update repl.t1 set c1 = c1 + 100000 where c1 = 1;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0
On the slave, SHOW SLAVE STATUS now reports an error (many fields omitted):
SHOW SLAVE STATUS\G
    Relay_Log_File: mysql_sandbox25162-relay-bin.000002
     Relay_Log_Pos: 340718
 Slave_SQL_Running: No
        Last_Errno: 1032
        Last_Error: Could not execute Update_rows event on table repl.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000001, end_log_pos 340718
This error message is telling us that a row that was modified on the master cannot be found on the slave: HA_ERR_KEY_NOT_FOUND. It doesn’t tell us which row it couldn’t find, though. In our simple example we know that all the rows are missing, and that the one in particular the server is trying to find is c1=1, but how can we find that out if we don’t know the conditions under which replication went out of sync?
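As a first step, the useful coordinates can be pulled out of the error text programmatically. Below is a minimal sketch in Python; the regular expression and the field names are my own assumptions, based only on the 5.1-style error message shown above:

```python
import re

# Example Last_Error text as reported by SHOW SLAVE STATUS (from above).
last_error = ("Could not execute Update_rows event on table repl.t1; "
              "Can't find record in 't1', Error_code: 1032; "
              "handler error HA_ERR_KEY_NOT_FOUND; "
              "the event's master log mysql-bin.000001, end_log_pos 340718")

def parse_rbr_error(err):
    """Extract the table, master log file and end_log_pos from a 1032 error."""
    m = re.search(r"on table (\S+);.*master log (\S+), end_log_pos (\d+)", err)
    if not m:
        return None
    table, log_file, pos = m.groups()
    return {"table": table, "log_file": log_file, "end_log_pos": int(pos)}

print(parse_rbr_error(last_error))
```

With the position in hand, you know where to look in the (relay or master) log, which is exactly what the next step does with mysqlbinlog.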
To determine exactly which row is having an issue, we can use the 'mysqlbinlog' utility in combination with the slave relay log. Examining the output of SHOW SLAVE STATUS, we can see that the slave is currently executing in relay log file mysql_sandbox25162-relay-bin.000002 at position 340718. MySQL 5.1 ships with a new version of mysqlbinlog that includes new options that make decoding RBR statements possible: "--base64-output" and "-v". For debugging row-based logs, I suggest '--base64-output=decode-rows -v'. This omits any base64 output and instead replaces it with semi-valid SQL statements. I say semi-valid because the statements do not include column names; column names are not included in the binary log, only positions.
mysqlbinlog --base64-output=decode-rows -v mysql_sandbox25162-relay-bin.000002 > decoded.log
Examine the decoded binary log with the ‘less’ utility, and search forward for the value 340718:
# at 340718
#100506 12:42:56 server id 1 end_log_pos 340637 Query thread_id=6 exec_time=0 error_code=0
SET TIMESTAMP=1273174976/*!*/;
BEGIN
/*!*/;
# at 340782
# at 340823
#100506 12:42:56 server id 1 end_log_pos 340678 Table_map: `repl`.`t1` mapped to number 15
#100506 12:42:56 server id 1 end_log_pos 340718 Update_rows: table id 15 flags: STMT_END_F
### UPDATE repl.t1
### WHERE
###   @1=1
### SET
###   @1=100001
# at 340863
#100506 12:42:56 server id 1 end_log_pos 340745 Xid = 51
COMMIT/*!*/;
# at 340890
As you can see, the RBR entry has been decoded as an update statement. The OLD value (the row that is missing) is represented in the WHERE clause. The binary log does not carry column names, so each column is represented by the column position prefixed with @.
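If many events need checking, scanning the decoded output by hand gets tedious. Here is a rough Python sketch that pulls the WHERE image (the old, missing row) out of a decode-rows fragment; the parsing is based only on the output format shown above, and the @1-to-c1 column mapping is a hypothetical example, since real column names must come from your own schema:

```python
# Sample fragment of `mysqlbinlog --base64-output=decode-rows -v` output,
# as shown in the decoded log above.
decoded = """\
### UPDATE repl.t1
### WHERE
###   @1=1
### SET
###   @1=100001
"""

def extract_where_image(text):
    """Collect the @N=value pairs from the WHERE block of a decoded row event."""
    values, in_where = {}, False
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse the indentation after '###'
        if line == "### WHERE":
            in_where = True
        elif line == "### SET":
            in_where = False
        elif in_where and line.startswith("### @"):
            col, val = line[4:].split("=", 1)
            values[col] = val
    return values

# The log carries column positions, not names; map them from your own schema.
# (Hypothetical mapping for our one-column test table.)
columns = {"@1": "c1"}
where = extract_where_image(decoded)
print({columns[k]: v for k, v in where.items()})  # the missing row's old values
```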
There you have it. You can now find out exactly which row is missing and begin your investigation as to why it is missing. You may have to do a bit more leg work if many rows were modified during a transaction.
You should always set read_only = true on your MySQL slaves to prevent accidental changes in data from happening!
*edit*
The MySQL 5.1 default binary logging format changed back to STATEMENT in MySQL 5.1.29.
MySQL 5.1 default binlog_format in GA has always been STATEMENT. The main reason for this choice is that we wanted to avoid compatibility issues.
Notice that 5.1 has been GA since version 5.1.30.
I must say that your example is quite unsettling, as it is based on a human mistake that puts the slave in a non-recoverable state.
See also this post, where the enhancement of mysqlbinlog was announced.
http://datacharmer.blogspot.com/2008/09/dsclosing-binlog-entries-with-row-based.html
The ‘human mistake’ is just one reason a slave can get out of sync…
Try this on your statement based master:
DELETE FROM some_table ORDER BY RAND() LIMIT 1000;
Your slave is now out of sync. Oops..
And I’ve run into this problem when SQL_LOG_BIN=0 was set on the master and changes were made. Since deletions (or insertions) happened on the master that were not on the slave the slave was out of sync. By diagnosing the problem using this technique, I was able to track down how the rows went ‘missing’ on the slave.
My simple example is just that – a simple example. Reality is complex and there are very many reasons a slave can get out of sync. When it happens you need to know how to discover what isn’t in sync (which is not obvious from the error message) so that you can find the underlying cause and prevent it from happening again.
Further, it is possible to switch between MIXED, ROW and STATEMENT in an individual session, thus a slave could get RBR updates even when the master is in STATEMENT mode.
Also, how is statement “more compatible”? You can’t safely replicate from a 5.1 to a 5.0 host, which is the only place the “compatibility” issue seems to make sense. A 5.1 slave can consume both Query and RBR events, so how does using statement improve anything? Triggers, cascading foreign keys and other indirect data changes are much more safely captured using RBR than statement logs. Using non-deterministic functions like UUID or RAND() are much safer with RBR.
And I don’t buy the argument in: http://bugs.mysql.com/bug.php?id=39812
The people who were relying on statement side effects should manually use statement, or fix their application to set binlog_format=STATEMENT before statements where the stupid statement side-effects are desired.
Justin,
With statement-level replication, MySQL 5.1 to 5.0 works for some kinds of applications. It is not guaranteed to work in all cases, of course, but it does for many simple MySQL use cases in practice.
This is in fact an important case because it allows you to get extra assurance for MySQL 5.0 -> 5.1 migrations – you can keep a MySQL 5.0 slave for some time after promoting the master to MySQL 5.1, so you have something to roll back to.
Justin,
On compatibility – I think this is why there is MySQL and Drizzle. MySQL carries a lot of weight from bad choices made over many years which can’t easily be broken, because doing so unsettles a lot of customers who rely on these behaviors.
Also, people often do not know they rely on any side effects. They just upgrade the master and slave to MySQL 5.1 and expect things to work as before. If it breaks, that is a serious issue.
I think moving back to STATEMENT by default in MySQL 5.1 was a good choice. Breaking existing applications is a lot more serious an issue than improving things (which row-level replication does).
One problem I have experienced with ROW-based binary logging is that MySQL can write log entries that are too huge to be parsed by the mysql client.
An example is, if you delete all rows in a huge table:
DELETE FROM huge_table;
This will be formulated as one BINLOG '...' entry in the binary log, with a delete event for each row in the table.
As a result you will get the error "ERROR 1153 (08S01) at line 1074175: Got a packet bigger than 'max_allowed_packet' bytes" when trying to apply this log to a database (mysqlbinlog bin-log.000001 | mysql -uroot).
You can increase the max_allowed_packet to a maximum of 1GB, but you will still be able to get larger entries in the log.
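Since each `# at N` line in mysqlbinlog output marks the byte offset where the next event starts, one way to spot oversized entries before replaying a log is to diff consecutive offsets. A rough sketch in Python; the 1 GB threshold is an assumption mirroring the max_allowed_packet ceiling discussed above, and the example offsets are made up:

```python
MAX_ALLOWED_PACKET = 1024 * 1024 * 1024  # assumed 1 GB ceiling

def oversized_events(offsets, limit=MAX_ALLOWED_PACKET):
    """Given the sorted '# at' byte offsets from mysqlbinlog output,
    return (start_offset, size) for events larger than the limit."""
    return [(a, b - a)
            for a, b in zip(offsets, offsets[1:])
            if b - a > limit]

# Example: a huge single-transaction DELETE shows up as one giant event.
offsets = [4, 340718, 340718 + 2 * 1024**3, 340718 + 2 * 1024**3 + 100]
print(oversized_events(offsets))
```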
Hi Kim,
I wanted to let you know that the next release of mysqlbinlog should include a --max-allowed-packet option instead of hard coding it to 1G. This should resolve your problem with mysqlbinlog.
Hi All.
I am using the MIXED binlog format and having a problem with a simple delete query:
DELETE FROM temp.tenp WHERE id = 10;
This query must delete 3,000,000 rows from my InnoDB table.
But the slave went out of sync (like in comment #2 by Justin Swanhart, where he used the STATEMENT binlog format).
And if I change the binlog format to ROW, I get an error like in comment #8 by Kim Carlsen.
What must I do? Some configuration tuning?
Do not delete a lot of rows in one query?
How many rows can I delete per query if I am using the MIXED binlog format?
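One common workaround, whatever the binlog format, is to break the big DELETE into smaller chunks so that each transaction (and therefore each binlog entry) stays small, re-running until the server reports 0 rows affected. A sketch in Python that only generates the statements; the table name, predicate, and 10,000-row chunk size are arbitrary assumptions:

```python
def chunked_deletes(table, where, chunk=10000, max_chunks=1000):
    """Yield DELETE ... LIMIT statements to run in a loop; stop once
    the server reports 0 rows affected (max_chunks is a safety cap)."""
    for _ in range(max_chunks):
        yield f"DELETE FROM {table} WHERE {where} LIMIT {chunk};"

# First two statements of the loop for the table from the comment above.
for stmt in list(chunked_deletes("temp.tenp", "id = 10"))[:2]:
    print(stmt)
```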
I’m experiencing a very strange situation. I set binlog_format = 'ROW' in a console, applying it to my current session only:

mysql> show variables like '%binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+

and ran an update query to change the data of one row. The change only affected the specific column(s) named in the update query. At this moment, I’m quite confused about the accuracy of MySQL replication.
For your info
* MySQL Master: MySQL Cluster 5.1.56-ndb-7.1.17-cluster-gpl-log
* MySQL Master: binlog_format = MIXED
* MySQL Slave: MySQL 5.1.56-ndb-7.1.17-cluster-gpl
* MySQL Slave: SET GLOBAL SLAVE_EXEC_MODE = 'IDEMPOTENT';
Are there bugs on this version?
While we are on this topic, is it possible to replicate only the inserts that happen on the master to the slave? I know it's not there in 5.1, but is there any option in 5.5 to replicate DML of your choice? I need this for something I am experimenting with and wanted to know if either MySQL or Percona has any solution.
Thanks and Regards,
Raghu
Hi, you have mixed up row-based replication and statement-based replication. It is RBR with slave_exec_mode=STRICT that causes the HA_ERR_KEY_NOT_FOUND problem. Check it for yourself: http://dev.mysql.com/doc/refman/5.1/en/replication-rbr-usage.html
Thanks man, I will check it immediately