November 13, 2009

Finding your MySQL High-Availability solution – Replication

Posted by yves |

In the last 2 blog posts about High Availability for MySQL we have introduced definitions and provided a list of ( questions that you need to ask yourself before choosing a HA solution. In this new post, we will cover what is the most popular HA solution for MySQL, replication.
[read more...]

October 27, 2009

State of the art: Galera – synchronous replication for InnoDB

Posted by Vadim |

First time I heard about Galera on Percona Performance Conference 2009, Seppo Jaakola was presenting “Galera: Multi-Master Synchronous MySQL Replication Clusters”. It was impressed as I personally always wanted it for InnoDB, but we had it in plans at the bottom of the list, as this is very hard to implement properly.
The idea by itself is not new, I remember synchronous replication was announced for SolidDB on MySQL UC 2007, but later the product was killed by IBM.

So long time after PPC 2009 there was available version mysql-galera-0.6, which had serious flow, to setup a new node you had to take down whole cluster. And all this time Codership ( company that develops Galera) was working on 0.7 release that introduces node propagation keeping cluster online. You can play with 0.7pre release by yourself MySQL/Galera Release 0.7pre.
[read more...]

October 16, 2009

Finding your MySQL High-Availability solution – The questions

Posted by yves |

After having reviewed the definition my the previous post (The definitions), the next step is to respond to some questions.

Do you need MySQL High-Availability?

That question is quite obvious but some times, it is skipped. It can also be formulated “What is the downtime cost of the service?”. In the cost, you need to include lost revenue from the service and you also need to consider less direct impact like loss of corporate image and other marketing costs. If your downtime cost is under $10/h, you can stop reading this document, you don’t need HA. For the others, let’s move on!

How to determine which MySQL High-Availability solution is best?

What is really tricky with MySQL is the number of possible HA solutions. From the simplest the most complex let’s list the most common ones:

- MySQL replication with manual failover
- Master-Master with MMM manager
- Heartbeat/SAN
- Heartbeat/DRBD
- NDB Cluster

These technologies are by far, not a one size fits all and many deployments use combination of solutions. I will not cover ScaleDB and Continuent because I know almost nothing of these solutions. There are many more questions you need to ask yourself before being able to pick the right one. Below, I listed the most common questions, I might have missed some.

1. What level of HA do you need?

Since all the technologies do not offer the same level of availability, this is a first important sorting factor. Here are estimates of the level of availability offered by the various solutions.

  Level of availability
Simple replication 98 to 99.9+%
Master-Master with MMM manager 99%
Heartbeat/SAN (depends on SAN) 99.5% to 99.9%
Heartbeat/DRBD 99.9%
NDB Cluster 99.999%

From the table, if your requirements are for 99.99%, you are restricted to NDB Cluster while if it is only 99% you have more options. I recall that the level of availability is hard to estimate and subject to debate. These are the usually accepted level of availability for these technologies.

2. Can you afford to lose data?

Obviously, if you are concerned about loss of data, you are most likely using the InnoDB storage engine, since MyISAM is not transactional and do not sync data to disk. Similarly, MySQL replication is an asynchronous process and although it is fairly fast at transferring data between the master and the slaves, there is a window of time where data loss is possible.

If you can afford to lose some data, you can consider “MySQL replication” and “Master-Master with MMM manager” otherwise, you can only consider the other three solutions.

  Data 100% safe
MySQL replication no
Master-Master with MMM manager no
Heartbeat/SAN (depends on SAN) yes
Heartbeat/DRBD yes
NDB Cluster yes

3. Does your application use MyISAM only features?

There are some features like Full text indexes and GIS indexes that are supported only by MyISAM. The HA solutions that work well with MyISAM are “MySQL replication” and “Master-Master with MMM manager”. Depending on the application, the MyISAM Full text indexes might be replaced by another search engine like Sphinx in order to remove the restriction. There is no HA solution other than the ones based on replication that handles GIS indexes.

  HA solutions
Need MyISAM Full text or GIS indexes “MySQL replication” and “Master-Master with MMM manager”
Don’t use any special MyISAM feature All
Can change MyISAM Full text to Sphinx All

4. What is the write load?

The HA solutions we present are not equal in term of their write capacity. Due to the way replication is implemented, only one thread on the slave can handle the write operations. If the replication master is multi-cores servers and is heavily writing using multiple threads, the slaves will likely not be able to keep up. Replication is not the only technology that put a strain on the write capacity, DRBD, a shared storage emulator for Linux, also reduce by about 30% (very dependent on hardware) the write capacity of a database server. In term of write capacity here are you choices.

  Write capacity
MySQL replication Fair
Master-Master with MMM manager Fair
Heartbeat/SAN (depends on SAN) Excellent
Heartbeat/DRBD Good
NDB Cluster Excellent

5. For what level of growth are you planning?

Since NDB Cluster is an integrated sharding environment, if you are planning for a growth that will need sharding (splitting the database over multiple servers), then you might need to take a serious at that solution. If not, then, apart from the write capacity, all the solutions are approximately equal.

6. How qualified is your staff or support company?

There is a quite direct relationship between the level of availability and the complexity of the solution. In order to reach the promised level of availability, the staff maintaining the HA setup, either internal or external, must have the required level of expertise. The required expertise level is summarized in the table below.

  Expertise level
MySQL replication Typical, average MySQL DBA + some Sysadmin skills
Master-Master with MMM manager Good, average MySQL DBA + good Sysadmin skills
Heartbeat/SAN (depends on SAN) High, Good MySQL DBA + strong Sysadmin skills
Heartbeat/DRBD High, Good MySQL DBA + strong Sysadmin skills
NDB Cluster Very high, Specific NDB knowledge, strom MySQL skills and strong Sysadmin skills

7. How deep are your pocket?

The last aspect that needs to be considered is the budget, complexity is expensive. We will consider two types of setup. The first one is a basic proof of concept of the technology with the hardware tested, the data imported and basic testing and documentation. A proof of concept setup is a good way to get used to a technology and experiment with it in a test environment. The other type of setup we will consider is a full production setup that includes extensive testing, fire drills, full documentation, monitoring, alerting, backups, migration to production and post migration monitoring. Of course, it is the safest way to migrate an HA solution to production. All the times here are estimates based on field experience, the values presented here are fairly typical and contains some buffers for unexpected problems. Although an HA solution can be built remotely through a KVM over IP and adequate remote power management, an on site intervention with physical access to the servers is the preferred way, especially for the most complex solutions.

  Proof of concept Migration to Production
MySQL replication 4 hours 12 hours
Master-Master with MMM manager 8 hours 24 hours
Heartbeat/SAN (depends on SAN) 32 hours 120 hours
Heartbeat/DRBD 40 hours 120 hours
NDB Cluster 40 hours 120 hours+

October 9, 2009

Finding your MySQL High-Availability solution – The definitions

Posted by yves |

As my first contribution to the MySQL Performance Blog, I joined Percona at the beginning September, I chose to cover the various high-availability (HA) options available for MySQL.  I have done dozen of MySQL HA related engagements while working for Sun/MySQL over the last couple of years using Heartbeat, DRBD and NDB cluster and I’ll probably be doing the same at Percona.  I have built my first DRBD based HA solution nearly 10 years ago.

There is quite a lot of confusion surrounding HA solutions for MySQL, I will try to present them objectively, my goal here been not to sell any specific technology but to help people choose the right one for their needs.  This post is first of a series,  I don’t yet know how many I will write in the series.

Before we start, it must be stated that high-availability is not only a matter of technical solutions, good management practices covering monitoring, alerting, security and documentation are also needed to insure a successful solution. In other words, no solution is fool proof, if a high-availability solution is running in recovery mode for months without nobody caring about it, the risk of a complete failure is much higher.

In order to all be on the same page, I will first give some definitions of the key terms.  I don’t pretend those definitions are perfect but let’s build on them.

[read more...]

July 7, 2009

Is DRBD the right choice for me?

Posted by Morgan Tocker |

It seems pretty common to find customers install DRBD for the wrong reasons. There are many pros/cons to compare DRBD to replication, but I’ve managed to cut down my spiel I give to customers to these two points:

  • DRBD’s aim (assuming replication mode C) is to provide 100% consistency, and then as much uptime as possible.
  • MySQL Replication (with a manager such as MMM) aims to have 100% availability, at the potential loss of some data surrounding a failure.

So if you are installing DRBD with the aim of purely “availability”, and are not worried about losing that last write on the crash to your master database that (hopefully) happens only once every few years, you may be using the wrong technology.

While the prized “1 minute failover” is possible in DRBD, it doesn’t really explain the full picture. The typical crash recovery process in DRBD is a lot longer:

  • After resource transfer, a filesystem check runs (0-5 seconds).
  • mysqld is started (1-5 seconds)
  • InnoDB runs through crash recovery (1 minute – several hours). Peter wrote about this here.
  • The server is then ready to accept connections.

Now, having said that: If you have an application that requires 100% consistency, then DRBD is one of your best choices on the mysql-market today.

March 10, 2009

Percona at PHP Quebec 09

Posted by Morgan Tocker |

Percona presented two talks at PHP Quebec last week – one on A Tour of MySQL High Availability, and another on Performance Tuning MySQL. There was a great reaction to showcasing some of the quick-wins that can be found by using the Percona patches. Unfortunately, the one thing that I forgot to mention in the slides is that the patches are Open Source and free to use.

February 1, 2009

Faster MySQL failover with SELECT mirroring

Posted by Baron Schwartz |

One of my favorite MySQL configurations for high availability is master-master replication, which is just like normal master-slave replication except that you can fail over in both directions. Aside from MySQL Cluster, which is more special-purpose, this is probably the best general-purpose way to get fast failover and a bunch of other benefits (non-blocking ALTER TABLE, for example).

The benefit is that you have another server with all the same data, up and running, ready to serve queries. In theory, it’s a truly hot standby (stay with me — that’s not really guaranteed). You don’t get this with shared storage or DRBD, although those provide stronger guarantees against data loss if mysqld crashes. And you can use the standby (passive) master for serving some SELECT queries, taking backups, etc as usual. However, if you do this you actually compromise your high-availability plan a little, because you can mask the lack of capacity that will result when one of the servers is down and you have to rely on just one server to keep everything on its feet.

[read more...]