Jeremy Cole recently posted a very nice post about MySQL Replication Performance. There are, however, a few points I should comment on.

  • Jeremy speaks about MyISAM and InnoDB in the same terms; in fact they are quite different for replication. As you probably know, MyISAM uses buffered IO, so if the master crashes you’re out of sync anyway, whatever sync_binlog was set to. It is still a bit better with sync_binlog enabled, as the slave could have consistent data after it catches up, but the master may have some of its tables out of sync with the binary log, and it is not easy to find which ones.
  • I’m surprised to read that enabling the binary log drops update performance. I guess Jeremy speaks about enabling the binary log together with the sync_binlog option, which is good for safe replication. But that setting is off by default, so you should not see such an effect just by enabling log_bin (see the configuration sketch after this list).
  • InnoDB tables also have to flush their logs on transaction commit unless you disable it, so you should not be getting 1500 update transactions per second from a single thread without a battery-backed-up cache. If you do, it is quite possible the operating system is faking fsync() for you. In certain cases it only does so for sequential writes, which is why sync_binlog changes things: two logs are synced now, which requires seeks, and seeks make certain disks flush their cache.
  • Watch out for MySQL 5.0 – if you enable the binary log with MySQL 5.0 you lose group commit, which can dramatically increase the commit rate under a multi-user load. See this post by Vadim for benchmark results.
  • The battery-backed-up RAID solution is a great one and is not that expensive these days. The other two probably just create conditions for the operating system to fake fsync(). You need to get data to the disk, and you can’t physically get data to the disk surface at 1500 operations per second. Jeremy is speaking about seeks, but that is only part of the latency; disk rotation is another big contributor. Even if we stay on the same track, we will only be able to do a single disk write per rotation, which is about 250 per second for a 15000 RPM drive. If you’re getting more without a battery-backed-up cache, something is likely faking it.
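
For reference, a minimal my.cnf sketch of the “safe replication” settings discussed in this list (the values are illustrative, not tuned recommendations):

  [mysqld]
  # Enabling the binary log alone does not force a sync at every commit.
  log-bin     = mysql-bin
  # fsync() the binary log at each transaction commit (off, 0, by default).
  sync_binlog = 1
  # Flush the InnoDB log to disk at each commit (1 is the default);
  # 2 or 0 trade durability for speed.
  innodb_flush_log_at_trx_commit = 1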

I should also mention that this covers only the master overhead of replication. The slave is where you can expect more problems, especially on high-end systems. The slave serializes all operations into a single stream, so group commit can’t work. Furthermore, even if you run with innodb_flush_log_at_trx_commit=2 you still do not solve all problems: you will effectively have only one CPU to handle the replication load. And if you have multiple disks, they will not be used effectively either, as the replication SQL thread typically submits IO requests to the disk one by one, so only one drive does the work while the others sit idle. You are mostly affected by this if you have a lot of writes to a large database.
This is one more reason why you might want to go with scale-out and keep each MySQL server a box with a few CPUs and a few hard drives.
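
For completeness, the slave-side relaxation mentioned above amounts to a single my.cnf line (illustrative only; as noted, it does not fix the single-threaded SQL thread or its serial IO pattern):

  [mysqld]
  # Slave only: flush the InnoDB log about once per second instead of at
  # every commit; the SQL thread still applies events one at a time.
  innodb_flush_log_at_trx_commit = 2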

Comments
Jeremy Cole

Hi Peter,

Sorry for being unclear on a few points. You’re right about sync_binlog. I am mainly discussing sync_binlog=1, but I neglected to mention that in the post. I’ll update it!

To give some context, my current research in this area is with regard to using replication for failover/disaster recovery, where sync_binlog=1 is basically a requirement.

Regarding transaction rates, in a single-threaded environment, you’re right, but when multiple threads are writing, the actual disk writes may be batched together, so you can get higher write throughput.

Additionally, those numbers are the averages, and depending on workload it’s possible to get much better than average throughput.

For a busy server constantly (or near constantly) writing small binary log events to an ext3 filesystem mounted with data=journal, I think that’s nearly the optimal workload for the disk without a write cache.
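
For anyone curious, data=journal is an ext3 mount option; a hypothetical /etc/fstab line, with the device and mount point made up:

  /dev/sdb1  /var/lib/mysql  ext3  defaults,data=journal  0  2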

Regards,

Jeremy

tnagy

Hello,
I was at the MySQL conference in May and attended Jeremy’s presentation on replication. My understanding was that all writes to the binary log are protected by a mutex – so you may have hundreds of threads updating different tables/rows, but if sync_binlog = 1 (which is a requirement for true ACID integrity), doesn’t this mean that in the end all writes to the binary log are effectively single-threaded?
Another question along this line is with regard to innodb_flush_log_at_trx_commit = 1. Is it also protected by a mutex, and is it somehow coordinated with the write to the binary log?
Peter, it would be great if you could provide some high-level explanation of how InnoDB updates are propagated from the in-memory buffer to the InnoDB log files, the InnoDB tablespace (.ibd) files, and finally the binary log.

tnagy

Hello Peter,
Thanks so much for the prompt response.
If I understand your response correctly regarding the “flushing” of the InnoDB buffer pool, do you mean that data in the buffer pool (in memory) is written directly to the tablespace files (.ibd)? So the sequence is something like the following:
START TRANSACTION (implicit or not)
UPDATE ... (modify some rows in the buffer pool, in memory)
COMMIT -> write (a delta?) to the InnoDB log files AND
write the statements to the binary (replication) log AND THEN
do whatever is necessary because of multi-versioning in the buffer pool (and mark these rows as dirty)

Then at some later time, as part of a background process, write the “dirty” pages in the buffer pool to the .ibd files.
I’m sure I’m leaving some things out, but is that essentially the sequence? I had assumed that the InnoDB log files were “flushed” at some point to the tablespace files, because in case of a crash before all buffered dirty pages were flushed, InnoDB has to use its log files to synchronize the tablespace files (anything in memory would obviously have been lost). But it seems like you’re suggesting that the log files are not used to synchronize the tablespace files EXCEPT in the case of a crash?
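
To make the sequence above concrete, here is a simplified sketch of a single transaction (the table and update are made up, and the comments are a rough sketch of the flow being asked about, not a precise description of InnoDB internals):

  START TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  -- the affected page(s) are modified in the buffer pool (in memory),
  -- marked dirty, and undo information is kept for multi-versioning
  COMMIT;
  -- at commit the redo record is written to the InnoDB log files and
  -- fsync()ed if innodb_flush_log_at_trx_commit=1; the transaction is
  -- also written to the binary log (fsync()ed if sync_binlog=1)
  -- later, a background thread flushes the dirty pages from the buffer
  -- pool to the tablespace files; the InnoDB log is replayed into the
  -- tablespaces only during crash recovery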

tnagy

Also, as you mentioned the issue of group commit being broken (see http://www.mysqlperformanceblog.com/2006/05/19/group-commit-and-xa), I read that entry with interest, and what I took from it was that you should disable XA support in my.cnf if you
1. don’t need it and
2. have binary logging (replication) enabled
Your benchmarks suggested that by doing that (even if group commit is “broken”) you can double your write throughput.

However, I stumbled upon this in the official MySQL 5.0 documentation:
“Even with sync_binlog set to 1, there is still the chance of an inconsistency between the table content and binary log content in case of a crash. For example, if you are using InnoDB tables and the MySQL server processes a COMMIT statement, it writes the whole transaction to the binary log and then commits this transaction into InnoDB. If the server crashes between those two operations, the transaction is rolled back by InnoDB at restart but still exists in the binary log. This problem can be solved with the --innodb-safe-binlog option, which adds consistency between the content of InnoDB tables and the binary log. (Note: --innodb-safe-binlog is unneeded as of MySQL 5.0; it was made obsolete by the introduction of XA transaction support.)”

I’m not sure what to make of that. XA support is traditionally for 2-phase commits between two different databases (or other transactional resources); I wouldn’t think that the binary log ITSELF is XA-compliant :)

I guess what I’m really asking is: if you have binary logging (replication) enabled, sync_binlog = 1, you’re using MySQL 5.0, and InnoDB is the only transactional resource you’re using, is it safe to DISABLE innodb_support_xa? I.e., will the potential extra “dirty” statement remain in the replication log? I guess I don’t understand how enabling XA support would resolve this, and as your benchmarks indicate, it potentially makes a HUGE difference (group commit or not).
Sorry for the length of this post, but I think you can see it’s a little bit confusing :)

Thanks
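
For reference, the combination being asked about would look roughly like this in my.cnf (a sketch only, not a recommendation; as the quoted documentation suggests, disabling XA support gives up the binary log/InnoDB crash consistency it provides):

  [mysqld]
  log-bin     = mysql-bin
  sync_binlog = 1
  # MySQL 5.0 uses internal XA between InnoDB and the binary log to keep
  # the two consistent after a crash; turning this off speeds up commits
  # but gives up that guarantee.
  innodb_support_xa = 0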

Sean

Peter,
I will be inheriting 25+ DB servers (master->slave replication) with over 800GB of data and 1,000+ writes per minute, all in MyISAM tables, using version 4.1. Side note: a project will be underway to move to InnoDB. However, in the meantime, I need to ensure we won’t lose data, or at least keep any loss to a minimum in the event of a crash. If I understand you correctly, the best I can do is:
1. use a battery-backed-up cache
2. set sync_binlog = 1

Alan

I’m planning to run with log-bin and sync_binlog=1. I’m also planning to do a full database backup (FLUSH TABLES WITH READ LOCK followed by a cp) once per day, along with a FLUSH LOGS. In the event of a crash, shouldn’t I be able to restore all data (both MyISAM and all committed InnoDB transactions) by just restoring the last full database backup and then applying the binary logs with mysqlbinlog? If so, why do I need the InnoDB logs, ib_logfile0 and ib_logfile1? Is there a way to disable them so they are not even written?
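
For illustration, a rough sketch of the procedure described above (the paths and binary log file name are made up, and this only sketches the steps as stated, not a verified recipe):

  # session 1: take the global read lock and rotate the logs;
  # keep this connection open until the copy is finished
  mysql> FLUSH TABLES WITH READ LOCK;
  mysql> FLUSH LOGS;

  # session 2 (shell): copy the data directory while the lock is held
  cp -a /var/lib/mysql /backups/mysql-$(date +%F)

  # session 1 again: release the lock
  mysql> UNLOCK TABLES;

  # after a crash: restore the copy, then replay the binary logs written
  # since that backup
  mysqlbinlog /backups/binlog/mysql-bin.000042 | mysql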