I generally thought of MySQL replication as being quite low overhead on the master, with the cost depending on the number of slaves. What kind of extra load does each slave cause? Well, it just gets a copy of the binary log streamed to it. All slaves typically read the last few events of the binary log, so those are in the cache. In most cases having some 10 slaves is relatively low overhead and does not impact master performance a lot. However, I ran into a case where performance was significantly impacted by going from 2 to 4 slaves on the box. Why?

The reason was that the master had a lot of very small updates going on – over 15,000 transactions a second. Each time an event is logged to the binary log, all 4 threads feeding the binary log to the slaves have to be woken up to send the update notification. With 4 slaves connected this makes 60,000 thread wakeups a second, sending some 60,000 packets (there may be some buffering taking place on the TCP side merging multiple sends, but still).

I guess this scenario just has not really caught any developer's attention yet, as it should be rather easy to optimize. The same way network cards are designed to throttle the number of interrupts they get and process several packets at a time, we could make the replication threads wake up in batches. For example, we could tune the system to wake up the thread feeding a slave no more often than 1,000 times a second, with each wakeup sending multiple events to the slave. It should be possible to make this number tunable, as rarer wakeups mean less overhead, but they can also impact replication latency a bit. It would also be possible to get some auto-detection in place by timing how long it really takes all threads to send their data to the slaves. If you have a large number of slaves, the delay from an event being executed on the master to the last thread sending its packet to a slave can be significant.
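To make the batching idea concrete, here is a minimal sketch (not MySQL's actual code – all names are hypothetical) of a sender that drains queued binlog events in batches, throttled to a maximum wakeup rate, instead of waking once per logged event:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical sketch of one slave-feeding thread with throttled wakeups:
// the binlog writer enqueues events freely, but the sender drains them at
// most once per `min_interval_`, sending everything queued since its last
// wakeup as a single batch.
class BatchedFeed {
public:
    explicit BatchedFeed(int max_wakeups_per_sec)
        : min_interval_(std::chrono::microseconds(1000000 / max_wakeups_per_sec)) {}

    // Called by the thread writing the binlog for each event.
    void log_event(int event) {
        std::lock_guard<std::mutex> lk(mu_);
        queue_.push(event);
        cv_.notify_one();
    }

    void stop() {
        { std::lock_guard<std::mutex> lk(mu_); done_ = true; }
        cv_.notify_one();
    }

    // Sender loop: returns the number of wakeups ("sends") it performed.
    // With 15,000 events/sec queued and 1,000 wakeups/sec allowed, each
    // wakeup would carry roughly 15 events in one batch.
    int run_sender() {
        int batches = 0;
        for (;;) {
            std::unique_lock<std::mutex> lk(mu_);
            cv_.wait_for(lk, min_interval_,
                         [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) {
                if (done_) return batches;
                continue;
            }
            std::vector<int> batch;            // drain everything queued so far
            while (!queue_.empty()) { batch.push_back(queue_.front()); queue_.pop(); }
            bool finished = done_;
            lk.unlock();
            ++batches;                          // one network send per wakeup
            sent_ += batch.size();
            if (finished) return batches;
            std::this_thread::sleep_for(min_interval_);  // throttle wakeup rate
        }
    }

    size_t sent() const { return sent_; }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<int> queue_;
    std::chrono::microseconds min_interval_;
    bool done_ = false;
    size_t sent_ = 0;
};
```

The design choice is the final `sleep_for`: the sender still reacts promptly to the first event after an idle period, but a flood of small transactions collapses into one wakeup and one send per interval rather than one per event.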

What does this case teach me in general? To always look at the data and question your assumptions. If something is unlikely to be the bottleneck, it does not mean it is not 🙂

13 Comments
Robert Hodges

Hi Peter,

Just out of curiosity, what’s the highest update rate you have ever seen going into the binlog? I’m working on getting Tungsten Replicator to support much higher rates than we do currently and it’s helpful to understand the design ceiling. MySQL can jam an amazing amount of data into slaves.

Cheers, Robert

zerkms

why not use blackhole as an intermediate layer between master and slaves?
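For readers unfamiliar with the setup zerkms is proposing: a relay instance whose tables all use the BLACKHOLE engine applies nothing locally but re-logs every replicated event to its own binlog, so the master feeds one connection and the relay fans out to the real slaves. A hypothetical my.cnf fragment for such a relay might look like this (option names are real MySQL options; values are illustrative):

```ini
# Hypothetical config for a blackhole relay between master and slaves
[mysqld]
server-id              = 2
log-bin                = relay-bin
log-slave-updates      = 1          # re-write replicated events to our own binlog
default-storage-engine = BLACKHOLE  # applied rows go nowhere; only the binlog matters
```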

zerkms

hmmm…… why wouldn't the overhead go away?
It would be moved to a dedicated "blank" blackhole server, and the master would then have to serve just 1 slave instead of N.

zerkms

yeah, but the discussion is around a topic that sounds like "How expensive is MySQL Replication for the Master" 😉
and in that case it costs only one slave in total 😉

Mark Callaghan

@zerkms – there is a cost from buying and managing the extra server. I don’t want to manage another server.

zerkms

@Mark
nice pov 😉 but money isn't the only measure of "cost". Following Facebook practice, let's imagine 1 (one) master server which serves 4 datacenters. What will its bandwidth be?

ps: like your posts at MySQL@Facebook 😉

Mark Callaghan

I think I want the blackhole slave to run on the same server as the master. That will eliminate the cost of extra hardware and networking. The cost of managing the extra servers still exists. I think the rise of the cloud and MySQL running on it will spur the development of frameworks that make it easier to manage large numbers of mysql instances. I don’t think we have good frameworks yet that make it easy to manage the extra servers.

Diego Cassinera

Good idea regarding the interrupts. However, if you are talking about the cost of slaves, you did not get into memory usage. In order for the server to deal with the different lag of each slave, it needs to keep a stream of operations for each slave. In most cases all the slaves are reading from the same binlog, but when they are not, multiple binlogs will be in memory.

On large systems, it makes total sense to stream a binlog to one system, and this system propagates to the rest.
Using black hole you could do this in a pyramid fashion.

Bay

Correct me if I am wrong, but the blackhole engine has its limitations: it only has an insert trigger, no update or delete trigger, it cannot support auto-increment keys, and it is not reliable and often breaks replication.
