After having reviewed the definition my the previous post (The definitions), the next step is to respond to some questions.

Do you need MySQL High-Availability?

That question is quite obvious but some times, it is skipped. It can also be formulated “What is the downtime cost of the service?”. In the cost, you need to include lost revenue from the service and you also need to consider less direct impact like loss of corporate image and other marketing costs. If your downtime cost is under $10/h, you can stop reading this document, you don’t need HA. For the others, let’s move on!

How to determine which MySQL High-Availability solution is best?

What is really tricky with MySQL is the number of possible HA solutions. From the simplest the most complex let’s list the most common ones:

– MySQL replication with manual failover
– Master-Master with MMM manager
– Heartbeat/SAN
– Heartbeat/DRBD
– NDB Cluster

These technologies are by far, not a one size fits all and many deployments use combination of solutions. I will not cover ScaleDB and Continuent because I know almost nothing of these solutions. There are many more questions you need to ask yourself before being able to pick the right one. Below, I listed the most common questions, I might have missed some.

1. What level of HA do you need?

Since all the technologies do not offer the same level of availability, this is a first important sorting factor. Here are estimates of the level of availability offered by the various solutions.

 Level of availability
Simple replication98 to 99.9+%
Master-Master with MMM manager99%
Heartbeat/SAN (depends on SAN)99.5% to 99.9%
Heartbeat/DRBD99.9%
NDB Cluster99.999%

From the table, if your requirements are for 99.99%, you are restricted to NDB Cluster while if it is only 99% you have more options. I recall that the level of availability is hard to estimate and subject to debate. These are the usually accepted level of availability for these technologies.

2. Can you afford to lose data?

Obviously, if you are concerned about loss of data, you are most likely using the InnoDB storage engine, since MyISAM is not transactional and do not sync data to disk. Similarly, MySQL replication is an asynchronous process and although it is fairly fast at transferring data between the master and the slaves, there is a window of time where data loss is possible.

If you can afford to lose some data, you can consider “MySQL replication” and “Master-Master with MMM manager” otherwise, you can only consider the other three solutions.

 Data 100% safe
MySQL replicationno
Master-Master with MMM managerno
Heartbeat/SAN (depends on SAN)yes
Heartbeat/DRBDyes
NDB Clusteryes

3. Does your application use MyISAM only features?

There are some features like Full text indexes and GIS indexes that are supported only by MyISAM. The HA solutions that work well with MyISAM are “MySQL replication” and “Master-Master with MMM manager”. Depending on the application, the MyISAM Full text indexes might be replaced by another search engine like Sphinx in order to remove the restriction. There is no HA solution other than the ones based on replication that handles GIS indexes.

 HA solutions
Need MyISAM Full text or GIS indexes“MySQL replication” and “Master-Master with MMM manager”
Don’t use any special MyISAM featureAll
Can change MyISAM Full text to SphinxAll

4. What is the write load?

The HA solutions we present are not equal in term of their write capacity. Due to the way replication is implemented, only one thread on the slave can handle the write operations. If the replication master is multi-cores servers and is heavily writing using multiple threads, the slaves will likely not be able to keep up. Replication is not the only technology that put a strain on the write capacity, DRBD, a shared storage emulator for Linux, also reduce by about 30% (very dependent on hardware) the write capacity of a database server. In term of write capacity here are you choices.

 Write capacity
MySQL replicationFair
Master-Master with MMM managerFair
Heartbeat/SAN (depends on SAN)Excellent
Heartbeat/DRBDGood
NDB ClusterExcellent

5. For what level of growth are you planning?

Since NDB Cluster is an integrated sharding environment, if you are planning for a growth that will need sharding (splitting the database over multiple servers), then you might need to take a serious at that solution. If not, then, apart from the write capacity, all the solutions are approximately equal.

6. How qualified is your staff or support company?

There is a quite direct relationship between the level of availability and the complexity of the solution. In order to reach the promised level of availability, the staff maintaining the HA setup, either internal or external, must have the required level of expertise. The required expertise level is summarized in the table below.

 Expertise level
MySQL replicationTypical, average MySQL DBA + some Sysadmin skills
Master-Master with MMM managerGood, average MySQL DBA + good Sysadmin skills
Heartbeat/SAN (depends on SAN)High, Good MySQL DBA + strong Sysadmin skills
Heartbeat/DRBDHigh, Good MySQL DBA + strong Sysadmin skills
NDB ClusterVery high, Specific NDB knowledge, strom MySQL skills and strong Sysadmin skills

7. How deep are your pocket?

The last aspect that needs to be considered is the budget, complexity is expensive. We will consider two types of setup. The first one is a basic proof of concept of the technology with the hardware tested, the data imported and basic testing and documentation. A proof of concept setup is a good way to get used to a technology and experiment with it in a test environment. The other type of setup we will consider is a full production setup that includes extensive testing, fire drills, full documentation, monitoring, alerting, backups, migration to production and post migration monitoring. Of course, it is the safest way to migrate an HA solution to production. All the times here are estimates based on field experience, the values presented here are fairly typical and contains some buffers for unexpected problems. Although an HA solution can be built remotely through a KVM over IP and adequate remote power management, an on site intervention with physical access to the servers is the preferred way, especially for the most complex solutions.

 Proof of conceptMigration to Production
MySQL replication4 hours12 hours
Master-Master with MMM manager8 hours24 hours
Heartbeat/SAN (depends on SAN) 32 hours120 hours
Heartbeat/DRBD40 hours120 hours
NDB Cluster40 hours120 hours+

Editor’s Note: We’ve gotten many questions about the time estimates mentioned here. The above estimates shouldn’t be used to compare against any specific situation. Time will vary greatly depending on your project. For example, “setting up replication” can be as simple as CHANGE MASTER TO, and can take as little as a few minutes in some circumstances. Yves’s estimate is for a project to create a replication slave for HA purposes, not for “setting up replication.” There is a big difference between an HA project and a DBA task. — Baron Schwartz

35 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Gerry

Hi all,

MySQL NDB Cluster is not 99.999% (aka 5 9s) uptime. 5 9s means about 5.25min a year. Certain maintenance tasks require a re-start, which can easily exceed 5min. Strictly speaking then, it’s 99.999% reliable excluding maintenance downtime.

Also the solutions you present have caveats. Example: talking about cluster again, it’s storage is limited by the amount of memory the NDB nodes have. Storing data on disk can seriously degrade performance compared to 100% in memory operation.

Also each of the solutions you present have hardware and, in general, infrastructure costs that have to be brought into the ROI equation as well.

My $.02
G

Ryan

One question not addressed there is how many copies of the data does each method store? For example with Replication and MMM you have more than one distinct copy of the data, where as with DRBD,SAN, and NDB you have a single copy of your data. This is important to the idea of % uptime, depending on what time of failure.

I know from experience that if you somehow wreck your single copy of the data in either of those solutions (for example corrupt ibdata files), you are now offline and need to restore from backups. Whereas with replication your second copy is your failover and can result in much less downtime in these scenarios.

Baron Schwartz

Ryan, with DRBD and NDB you have more than a single copy of data.

Mark Callaghan

Do the availability numbers for NDB assume that your datacenter and the network connections to it are also 99.999? And if you aren’t willing to assume that can you use NDB + geographic replication to compensate? Problems like having your country removed from the internet sometimes get in the way of uptime.

Morgan Tocker

Yeah in NDB it’s configured by NoOfReplicas. Having the data in two locations prevents you from some problems, but not corruption.

DRBD and MySQL Replication are complimentary – keep in mind that you are allowed to use both 😉

Ewen Fortune

Gerry,

NDB cluster can do rolling restarts, so the cluster can stay available during the restart.

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-rolling-restart.html

You can also use disk based tables with only the indexed data in memory using its page buffers and OS file system cache to minimize IO.
In simple terms this is what InnoDB does with the buffer pool and MyISAM with its key_buffer and OS cache.

http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-disk-data.html
http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-diskpagebuffermemory

domas

You’re underestimating replication availability. Do note, that for physical storage based solutions (SAN and DRBD) you have to do crash recovery, whereas replication switchover/failover can be immediate. In proper environments this can mean seven nines or something 🙂

Ryan H

In our production environment using muti-master for 6+ years we now achieve 99.998% uptime across our DB servers. It seems in the article you are assuming a single database / applicaiton. For example for a single database you can achieve higher writes with a SAN but if your already logically separating your customers over hundreds of servers then putting them all on a single SAN with not improve performance in my opinion especially if you talk about putting SSD on the local servers.

There are three types of failures:
* mysql becomes unavailable (best case)
* mysql connection are unresponsive connection happen very slowly
* mysql connection are unresponsive connection hangs indefinitely

when things crash quickly that is the best situation for failover. When a drive fails in a bad way and causes the server to lose 99% of it’s performance this is much more difficult to correct automatically.

Our downtime falls into these categories:

* kernel panics – with hundreds of DB servers we get kernel panics weekly.
* hardware failure – typically losing a disk. Usually the RAID controller does the right thing though when hard drives don’t just die but become unresponsive often they get into a bad state.
* mysql server crash due to a new bug
* mysql server crash due to innodb corruption. Over hundreds of servers this does happen and is a particular nasty situation. There were a few that have been reproducible and crashed the masters and all slaves via replicaiton (past bugs) though typically the data is corrupted on only one server. You can try to mitigate this risk by rebuilding your tables occasionally (on the secondary) and also rebuild them after upgrades. Corrupt data is a reality unless your using something like ZFS to prevent if from ever happening.

With SAN based solutions you are putting all reliance on the SAN and it’s built in redundancy. You can count on anything failing in the worst case leaving you with huge downtimes and restore times.

I have heard of several horror stories with DRBD with the data being corrupted on the master then corrupting the secondary.

With Muti-master the main advantage is the share nothing nature. It solves the problem where data gets corrupted on one node. There is no shared infrastructure (at least they way we are setup) so you have the best chance of having an available master when the worst happens. The main dis-advantage is the replication delays with muti-master. This is where muti-threaded slaves would be a great new feature.

Todd Troxell

Thanks for a great overview– one thing I’m confused about is why replication
reliability is listed at 99% and DRBD is listed at 99.9% — what is the
reason for this? I can’t think of it : )

Also, I think Replication/DRBD are the only ones most people would take
seriously. Otherwise you’re probably an enterprise and you just buy
people that know how to run a SAN or something.

Ernie S

Along the same lines as the requirement to use MyISAM (and the restrictions on HA solutions that come with it) if you need certain MyISAM-only features and can’t use Sphinx or some other workaround, there are some features which you might want in your database that aren’t supported by the NDB and would require ruling it out or potentially some serious application redesign. (No foreign keys, 8kb row limit, etc.)

LinuxJedi

Gerry: Maintenance and upgrades are done using a rolling-restart so that this does not cause any downtime. One of the key features of cluster is no single point of failure so if maintenance and upgrades went against that it would be a bit pointless.
RAM is cheap and disk operations can still be pretty fast (all the other solutions are disk based anyway).

Mark: Your datacenter internet connection may be down but at least your database will be up 😉 You could also say that datacenter power is not 99.999% reliable.

Rob Wultsch

1. If a organization is using GIS and they want to use a Open Source database is MySQL really the right decision? PostGIS is a mature solution that might well be ideal for a variety of reasons. HA Postgres is beyond the scope of this response but several solutions exist (WAL shipping isn’t brain surgery…).

2. I think the complexity of NDB might be being under rated here:

2.1 The monitoring requirements are significantly greater than the alternatives. Tablespaces are fixed size and several of the memory settings can cause the cluster to go down if they are exhausted.

2.2 With 2.1 in mind significant consideration must be payed to the actual size of the data set after being imported into NDB. Of particular note is that blob/text columns (for several reason) are less than ideal for NDB.

2.3 DDL in NDB has had historic issues with replicating correctly via internal Replication.

2.4 I think it worth pointing out NDB is less than ideal if one needs to replicate to geographically dispersed location. There are way to get around this (external replication…) but this introduces even more complexity.

2.5 NDB has bugs like all software and the userbase is not that large and the number of experts is tiny, however the LOC count is massive (with admitidly a giant test suite). The list of MySQL Certified Cluster DBA* is ~100 people, many from Sun. This should scare you.

A few months back I saw someone on #mysql-ndb with a down cluster offering 200 USD per hours for help to get the cluster back up. As far as I can tell no one ever responded to him.

2.6 MySQL-NDB is not a complete solution without putting a good bit of thought into what happens when a API node dies. One way or another a complete solution either involves a very smart application (have fun with that) or making use something to automate failing over to an IP to a API node that is up.

2.7 Just adding more data nodes to increase capacity can easily decrease performance.

2.8 Security. Most communication is in the clear.

2.9 The MySQL database does not replicate with cluster. Need to add a user or update a sproc? You will need to hit every API node.

2.10 Join performance can be less than ideal with NDB.

This is just off the top of my head…

*I recently passed the MySQL Certified Cluster DBA exam and am *far* from an expert.

LinuxJedi

Hello Rob,

Yes, prior planning is needed when setting up and configuring a cluster (I would imagine the same is true of many database deployments). Most memory settings should not cause the cluster to go down if exhausted, just cause transactions to be aborted.
I would argue that replication with NDB is more reliable that regular MySQL when configured correctly.
Text/Blobs are not ideal in Cluster, that is correct. This is something that is changing, but it wasn’t something Cluster was originally designed for.
We have some very large customers with some very large deployments of cluster. We have customers with > 99.999% uptime.
I think you’ll actually find only about 10% of the certified Cluster DBAs work for us. But this is kind of irrelevant because the cert is massively out of date (there is a new cert on the current versions coming soon).
I did not see the $200 offer, but if they had tried asking on the cluster mailing list or forum me or one of my colleagues would have seen it and we would have most likely been able to help.
If you use MySQL for NDB then yes you have to give some thought into switching API nodes. There are many possible load-balance options for client apps for this and several other HA solutions require the same thing.
Adding data nodes should not decrease performance (unless your network is slow).
Security is not something that will change, due to the way cluster works you could have problems if it was not on its own private network. This is by design and well documented. Security overhead will cause more problems than it solves.
Cluster is not designed for joins (again this can be seen in its roots). That said there is work-in-progress to improve this.

Hope this helps. Congrats on passing the cert 🙂

Rob Wultsch

LinuxJedi:
I yield to your obviously superior knowledge on the subject. I had misremembered some distribution awareness material in terms of the performance impact of adding boxes. As for running out of IndexMemory, DataMemory, etc… a read only database seems only slightly better than database that is down.

I agree that NDB has the possibility of being used for very high availability, however my point is that it complicated solution that will require both a good bit of time to setup the cluster correctly, probably change application schema and code (etc), as well as a continuing need to monitor the servers. I have spent a good bit of time in front of the docs and I have a lot to learn still.

LinuxJedi

Hi Rob,

I agree with your points there, and the developers are taking steps to help with this including better monitoring in upcoming releases.

It is a complicated soultion, and yes it could well take some time to set up. I am hoping over time this will get easier and cluster will get better as a more general-use database. Luckily there are some good blogs and webinars on it, and Johan Andersson has written an online tool to help you configure cluster. Hopefully we are gradually making it easier to pick-up (its certainly easier than back when I started with cluster in 5.0) but as always feedback is greatly received.

Gerry

Thank you all for clarifying that Cluster now supports rolling updates. I guess I need to read the docs and announcements in more detail.

: Replication, SAN / NAS, DRBD and Cluster can all achieve 5 9s (and higher) in practice and if you introduce the “Human Factor”, that can easily shave 1 ‘9’ out of the numbers if the DBA doesn’t know the chosen solution like the back of your hand. Working with the solution the DBA knows best can, on the other hand, add 1 ‘9’.

My additional $.02
G

Michael

: It may be a redundant question, but what’s the specific problem that prevents replicating MyISAM tables with DRBD?
I did a few tests recently and it worked fine. Yet it’s not a first time I come across the “you can’t use MyISAM with DRBD” statement.

Thanks for a nice Q&A!

Rob Wultsch


I have had to fail off of a master on more than one occasion because of MyISAM corruption. With MyISAM corruption is a fact of life.

At least with MySQL replication between the master and slave there is a firewall to prevent corruption from spreading to the slave.

Morgan Tocker

with DRBD (assuming mode C) you pay a heavy cost to make sure that replication is synchronous, and you don’t loose data. This only works successfully because InnoDB is ACID compliant and makes sure it can recover from every update.

Since MySQL is not ACID compliant, DRBD can not truly be synchronous – there may be some data that has been modified but still in an OS buffer somewhere. Because you can no longer guarantee that the secondary device will have all the data, it doesn’t usually make sense to pay the cost of DRBD.

Tim

Great article. I’m wondering how a requirement for geographically distributed servers might impact the choices given. Certainly, having shared disk becomes an issue, so the Heartbeat/SAN method becomes problematic. Simple MySQL replication should still work, though you’d have the caveat of needing sufficient bandwidth to keep up with the relay log (though as you indicate, the problem in keeping up with the log is rarely replicating it, but instead has to do with processing it on the slave). But how do the other options fare with this requirement?

Robert Hodges

Hi Yves,

Your write-up is pretty broad but it seems to gloss over some key issues.

1.) Maintenance. It’s the main cause of downtime for most sites. Solutions like DRBD or SAN that implement shared disk semantics don’t support it especially well. The reason is that if you only have a single copy of data you can’t do maintenance that requires both to be online.

2.) Uptime. I’m curious what is the basis for your up-time numbers. Uptime is a factor of your database, your admin procedures (e.g., heartbeat w/ VIP for failover vs. manual procedure with application reconfiguration), your ability to avoid configuration errors, etc. Also as @Ryan H pointed out, you can make unreliable things much more reliable by having many of them.

3.) Failures. Understanding types and consequences of failures is important. If NDB somehow does not work, your data are inaccessible with no easy fall-back solution. SANs fail in some strange and scary ways. They are also SPOFs. Approaches that create walled off copies are very robust across a lot of nasty failures. This is one of the big benefits of database replication, whether using MySQL or Tungsten Replicator.

4.) Backups. People tend to forget this is the first line of defense for data and jump straight to more complicated approaches.

I would like to invite you to have a look at Tungsten whenever you have time. We have a pretty broad view of availability and also take into consideration issues like data protection.

Robert

Dba Zed

Hello Ewen,
Is it possible to get the NDB Cluster for Datawarehouse Appliances. In term of performance, is this solution can perform as well as Oracle RAC? What are the limits?
Thank you. Best Regards.

Ewen Fortune

Dba Zed,

NDB cluster does not really make sense for Data warehouse applications. There was a discussion on what is good for cluster on the mailing list, check out the thread that starts here: http://lists.mysql.com/cluster/6960. In terms of performance versus Oracle RAC I think they are very different solutions and really answer different problems. Tom Hanlon’s post in the thread sums up rather well what scales well with NDB.

Baron Schwartz

In response to feedback through other channels, I’ve added a note to the end of the blog post to add some context to the time estimates. These time estimates are for HA projects, not for simple administration tasks. Anyone can follow a WikiHow tutorial and install the tools and technologies mentioned in a relatively short time. Not many people have Yves’s level of expertise in architecting a complete HA solution, which is a whole different ballgame.

Cédric

Hi, I just want to indicate that you can lose data with MySQL Cluster (NDB Cluster) !
See description of the TimeBetweenGlobalCheckpoints parameter…

LinuxJedi

Hi Cédric,

Yes, that is possible if all nodes fail simultaneously, which is not something that is likely to happen with a well configured cluster.

Cédric

Hi LinuxJedi, i agree but what’s a “well configured cluster” ?
In this case I think a “well configured cluster” is a cluster with poor performances !
“Poor performances” means a lot of disk write and so, you lose a part of interest of the cluster…

LinuxJedi

Hi Cédric,

A cluster running on good hardware with a good configuration, both tuned to the workload. We have customers who get very good performance and no downtime (ie. no total cluster failure).

There are a lot of disk writes with LCPs and redo logs but typically LCPs are quite slow writes (default 10MB/sec). With cluster being real-time the redo data (GCP) does need to be put to disk quickly, so fast disks are important for this aspect, especially on a database with a lot of writes. I would rather a lot of disk IO than the possibility of data loss.

Mischelle

I hope you guys don’t mind. How do you deal with many branches and all accessing the same database and then connection between branches (e.g. lease line and DSL) suddenly becomes offline? Real-time transactions are necessary with our operations. Do you think that any of the mentioned HA solutions would fit our current setup? or is there any other way to achieve the ‘Real-time’ requirement?

Thanks guys!

samana zaidi

average formula is what.

Kmilo

I cant’ save this url in delicious 🙁

http://www.mysqlperformanceblog.com/2009/10/16/finding-your-mysql-high-availability-solution-%E2%80%93-the-questions/

Looks like and strange character on: —