July 23, 2014

MySQL Replication vs DRBD Battles

Well these days we see a lot of post for and against (more, more) using of MySQL and DRBD as a high availability practice.

I personally think DRBD has its place but there are far more cases when other techniques would work much better for variety of reasons.
First let me start with Florian’s comments on the issue as I think they are most interested ones.

First lets get to the point what we’re comparing here – it is mainly DRBD to MySQL Replication based techniques (lets leave MySQL Cluster and Continnuent alone for a while as these are a bit from different league). It is not the question if DRBD is better than SAN – it offers more independence compared to SAN and in my view surely superior from HA point of view but this is not the point of comparison.

“Failback could destroy the original master too”, however, is plain false. DRBD won’t “destroy the original master” any more than it already was if the filesystem on top of DRBD was fried beforehand.

Let us again compare MySQL Replication to DRBD in this case – in both cases due to some reasons you can have systems to run out of sync and have conflicting updates applied to them. With DRBD you have the choice of killing one of the nodes and re-syncing from another one while with MySQL Replication you can use Maatkit to merge the changes after all, also you can review binary logs to see which updates were applied to different nodes.

Transaction log replay, yes. But fsck? These days this amounts to running a journal replay. Takes under a second in most circumstance

I would put it 10 seconds but it does not matter. The transactional log reply is likely to take much longer than that. This is very bad property of DRBD – besides the well understood overhead of committing on both nodes instead of one you also meat the tough choice – you’ve got to pick either long recovery time or further degraded performance. In the large databases I run in production relaying on MySQL Replication for HA I often have 15+ minutes reply of Innodb Transactional log which would be a huge bummer with DRBD.

I would also say this implies hidden danger – the time it will take your database to do transactional log recovery is invisible until you get a crash, meaning if your production database size growths load changes or you happen to have failure during activity of certain kind you might have it taking much longer when expected. recovery time depends on a lot of variables.

The side question about it is of course the fact you have to be picky about storage engines you’re using – DRBD does not work with MyISAM (check required) so you need to have processes to ensure your application does not uses this storage engine which may be hard to guaranty in many environments when development has too much autonomy.

I must note in this aspect however DRBD goes on par with MySQL Statement Based Replication – it is also well too easy to use MySQL features which break replication.

The failover node is a hot standby, it’s just not a running slave node from the database’s standpoint. And, nothing stops you from running two databases on two servers on two DRBD devices laid out in a “criss-cross” fashion, converging on one node in case of node failure.

This actually goes to two topics. First – hot vs cold. If you’re using decent hardware and care about performance you use O_DIRECT with Innodb which makes it to bypass file cache. If you have it DRBD slave will be fully cold. But lets assume you’re ready to pay for yet another penalty DRBD introduce and do not use this option wast memory and CPU cycles on double caching. Even in this case DRBD slave node can’t be called hot because write load often does not touches the same data as read load. Here is simple example – assume you’re inserting the data at the same time as running reporting queries on last month. All last month will be hot on the slave which is doing reads but only last few hours will be hot on the standby box.

Running two instances on the server allows to reduce hardware waste with DRBD, though not eliminate because you get some disks which you can’t really use for anything else rather than HA. Two instances also complicate things – depending on infrastructure it can be seen as almost no complication or quite serious complication.

“Cannot do maintenance on cold standby database.”

But you can do anything you want with a database that you run off a DRBD LVM snapshot. Works on a Secondary node too.

I’m not sure Florian understood what was meant here. With MySQL Master-Master replication I can add and index on passive node, wait it to catch up and switch the roles (see another post) you can’t really do this with DRBD as this requires logical level of operation to work.

This is not to mention other things you can do with MySQL Replication, such as filtered replication or cross storage engine replication, though which are not typically used for HA purposes. Time Delayed replication is however something quite helpful for some environments. Though DRBD also could be extended to support one if needed.

Now, do not get me wrong DRBD is great, and thanks to Florian for following up and making sure myths about DRBD do not spread too wide.

So when I would recommend to use DRBD with MySQL ?

There some good reasons to use DRBD with MySQL though as I mentioned I do not view it as first choice solution.

First it is good choice for organizations which are got use to SAN based high availability solutions with active-passive management software. Quite often these guys would have be familiar with such HA concept and it would be very natural for them to use same approach for MySQL as they use for PostgreSQL for example instead of investing time to learn about MySQL Replication or just looking to keep MySQL infrastructure to be as close to one for other databases in use as possible.

Second – it is often inevitable choice when you can’t avoid loosing any transaction – period. Some people would rather stand longer fallback time (as with DRBD) but would not like to have lost transactions which may happen with async replication. Another similar case is when you’re looking to ensured consistency – MySQL Replication can out of sync – and there is bunch of tips in documentation of how to do it. With DRBD the chance of nodes running out of sync is minimal and can be caused by software and hardware bugs rather than known limitations.

You could argue depending on cases you spend most time working with how much cases do these correspond – some people mainly deal with systems which can’t accept any transactions loss and for these DRBD often comes as a first choice if you have more experience with traditional web shops – these usually would prefer to lose one user comment a year instead of paying extra performance costs.

It is worth to note DRBD also allows building very nice mixed environments with MySQL – for example you can replicate binary logs using DRBD so if master node fails you have not lost transactions – when you can use such logs to do point in time recovery or to catch up to last few not committed transactions on the slave. We should spend some time implementing such script sometime which could be helping to get best of both worlds.

But currently – You can’t have it all

The state of High Availability Solutions for MySQL these days is – you can’t have it all. There is no OpenSource solution out where which would offer you full redundancy use of both nodes at least for reads no transaction loss and automated fail over. Whenever you’re using MySQL Cluster, DRBD or MySQL Replication you have to have some compromises.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Wei says:

    Should’nt Google’s MySQL solution an HA option:
    . Semi-sync replication to wait for replica’s responses
    . Mirror binlog on the replicas
    . Make replicas serve the mirrored binlogs during the failover

    The code is there. But, it is not smoothly integrated yet.

  2. peter says:

    Wei,

    Google Patches is Semi-Synchronous solution though which is good enough for most cases. Though indeed this will be great solution once it is integrated :)

  3. Wei says:

    The patch also includes mirroring binlogs on the replicas and serve the mirrored binlogs from the replicas. The patch should also make the hierarchal replication available. I wrote it, but could not find an easy way to integrate it smoothly.

  4. Hi Peter,

    As I discussed at the conference (http://www.primebase.org/download/pbxt-uc-2008.pdf), I am planning to add HA capabilities to PBXT using synchronous replication at the engine level.

    What do you think of this approach?

    Best regards,

    Paul

  5. Great post summing up all pros and cons.

    I would add another point: Backup from slave / secondary. With DRBD this is hard / impossible in many situations.

    We use both, replication and DRBD in production. One database which is doing nothing but “spooling” working sets runs on DRBD (no slave lag, 100% sure all transactions are there, no need to backup this database). The main database is configured in Master-Master replication fashion. We evaluated using DRBD here too but did not do it because we’d loose the ability to do “online schema change” or LVM snapshot backup. And finally we were not able to get a descent InnoDB recovery time (26G buffer pool). So it would not even be a cold stand-by but a frozen one. ;)

    So all the pros and cons stroke us.

  6. I’m not sure what Peter R. means about not being able to do an lvm snapshot of the DRBD secondary node. Surely you can. If DRBD sits on top of LVM, then you snapshot the LVM and you get a snapshot of the blocks on the drbd device. Whether DRBD is in primary or secondary at this point is irrelevant.

    Another thing I’d like to add is that there are some database/application designs that just don’t allow for friendly multi-master setups. More real-time data warehouse applications that log tons of data and frequently summarize and aggregate will run into problems. Do you run aggregation scripts only on one server and propagate them? Do you run aggregates on both servers and “assume” that all the numbers add up in the end – this can be very dangerous.

    Also, merely having a slave sitting ready to take master can be troublesome in some situations because the slave is still single-threaded. You end up getting a beefier slave than your production system just so that a single thread on the slave can keep up with all your transactional threads on your master. Now you have to ask yourself – maybe I can use the slave for other things, so it doesn’t bother me too much that my slave is actually more powerful than my master. Or maybe it’s better to have two equally setup systems and use DRBD instead.

    I think given the argument that we all agree on one thing – it really depends on your application/database environment, your requirements for recovery speed, data loss, etc.

    Finally, I will leave with this note – we actually do both – drbd + ha cluster for high-availability and a big beefy slave for redundancy and read only queries.

  7. Я недавно писал об альтернативе DRBD на FreeBSD: http://michael.mindmix.ru/366-392-ggate-kak-al-ternativa-memcached-i-memory-appliance.zhtml

    Сам не пробовал такую схему реализовывать, но возможно она окажется быстрее по чтению. Особенно по случайному.

  8. I think DRBD is amazing. The concern about long crash recovery times after failover with InnoDB and DRBD can be resolved. One method to do this is a config option that enforces the maximum number of crash-recovery IO (dirty pages in buffer cache + pending IO from insert buffer + pending IO for purge thread). When the max is exceeded the server can favor background IO over transactions to let the server catch up. Another option is to let Maatkit watch the server and run commands (might need more FLUSH commands) to force more background IO when there is too much pending IO in a server.

    Is there any parallelism or concurrency (multiple threads or multiple IOs pending) in InnoDB crash recovery code?

    The Google patch had more than semi-sync for an HA solution with MySQL. It also includes a feature to keep the InnoDB transaction log and slave replication state in sync after a crash (transactional replication). But the most interesting feature was mirror binlog. This maintained mirrors of the master’s binlog on a slave (same filename as used on the master, byte for byte equivalent). This was done using the same replication protocol — the IO thread would copy events into the relay log and the mirrored binlog. It allows hierarchical replication and transparent slave failover.

    We are trying to redo support for hierarchical replication and transparent slave failover with global transaction IDs.

  9. peter says:

    Mark,

    I do not think it is as easy as you say. you can limit number of io for REDO but there is also UNDO part which depends only on size of transaction. It is indeed background in 5.0 but the data remains locked until rollback is complete which can affect operation.

    Plus you still have warmup problem.

    Speaking about Google batch – I think there is a lot of cool stuff out where besides HA, and thanks for explanations.

  10. Don’t allow long running transactions and undo is not a problem. Of course, that is easier said than done. But you don’t want long running transactions. They are an instant source of replication delay and force applications to go back to the primary to get current data or stop using the slave.

  11. peter says:

    Right. You do not want long running transactions in the perfect world, if you can avoid them it is not a big issue.
    By the way it would be cool if Innodb could allow to restrict number of changes transaction could do – enabling such stuff would allow to catch users running bad transactions easily.

    Again my point is not to say generally it is good or bad but highlight the problems you can run into :)

  12. joelb says:

    hello, we are debating DRBD vs MySqL replication. The idea would be two servers, one master, one slave running drbd. Second server would never be live unless the first server crashed, or needed maintainance. There would be a third server not at the colo that would use mysql replication to keep as closely up to date as possible with the master server at the colo, and be used as a failover in the event of an outage.

    I’m confused by what I’ve read: After a crash, the slave is now live. How long does it take to resync the master with the slave? Can the slave simply become the master and the master now be considered the hot spare slave that will take over if the new “master” machine ever crashes?

    Thanks,

    Any comments (& the sooner the better) would be helpful, also any direction to a community of folks who are deeply involved with these technologies would be helpful as well.

    – Joel

Speak Your Mind

*