This post is not exactly about MySQL performance, or about performance at all, but I guess it should be interesting to many MySQL DBAs and other people involved in running MySQL in production.

Recently I’ve been involved in troubleshooting a Dell PowerEdge 2850 system running RAID5 on six 300GB internal hard drives, which gives about 1.4TB of usable space.

The problem started when one of the hard drives was set to “Predicted Failure” state by “Patrol Read”, which the PERC4 (LSI Logic MegaRAID) controller runs automatically. Dell was prompt to ship a replacement hard drive and the drive was replaced. This should have been the happy end of the story, but in reality the troubles had only begun.

After the hard drive is replaced the RAID has to be rebuilt, but in this case the rebuild failed and brought the whole logical drive down, because yet another hard drive had a bad block. The replaced hard drive was marked “failed” because it could not be rebuilt, and the other one because of the read failure. So my first advice would be: run a consistency check before replacing a hard drive with a predicted failure, to minimize the chance of a double drive failure in the RAID. It is good to run consistency checks on a regular basis anyway, but this final run would not hurt.
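To make that first piece of advice concrete, here is a minimal read-only surface scan you could run against a member disk before pulling the drive that reports Predicted Failure. It is only a rough stand-in for the controller’s own consistency check; the device name /dev/sdb is a placeholder, and it needs root access.

```python
#!/usr/bin/env python
# Rough read-only surface scan of a block device -- a poor man's stand-in
# for the controller's consistency check. /dev/sdb is a placeholder;
# point it at a member disk (or the logical drive) before swapping the
# "Predicted Failure" disk. It only reads, never writes.
import os
import sys

DEVICE = "/dev/sdb"     # hypothetical device name, adjust for your system
CHUNK = 1024 * 1024     # read in 1MB chunks

def scan(device):
    bad = []
    fd = os.open(device, os.O_RDONLY)
    offset = 0
    try:
        while True:
            try:
                data = os.pread(fd, CHUNK, offset)
                if not data:          # end of device reached
                    break
            except OSError:           # unreadable region -> likely bad block(s)
                bad.append(offset)
            offset += CHUNK
    finally:
        os.close(fd)
    return bad

if __name__ == "__main__":
    errors = scan(sys.argv[1] if len(sys.argv) > 1 else DEVICE)
    print("Unreadable 1MB regions:", len(errors))
    for off in errors:
        print("  around byte offset", off)
```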

The next interesting thing: there is not much advice to be found in the Dell documentation about handling a RAID with two failed hard drives. The impression is that it should never happen (while it does, and not as rarely as one would hope), and that if it does happen you should just go and get your backup. Restoring over 1TB of data is never fun, but in this case there was no backup, which made recovery that much more important.

Interestingly enough, the logical drive could be brought online and used by forcing the newly failed drive online. That drive probably just had a couple of bad blocks, but there was no way to resync the logical drive in this situation.

What one would like to do in such a case is to force the SCSI drive to remap those bad blocks. A couple of files could end up corrupted, but that is much better than losing everything. Unfortunately neither the RAID BIOS nor the RAID tools provide such a feature.

Happily, the Dell BIOS has a little option which allows you to disable the RAID controller and access your disks as plain SCSI. Changing this option produces various scary messages such as “Data loss will occur”, but in reality you can change it back and forth; you just should be careful and know what you’re doing.

In the SCSI BIOS there is an option to perform “Verify Media”, which can be used to scan the hard drive and remap bad blocks. After remapping is done, RAID mode can be enabled again and the array rebuilds just fine. There is of course a chance some data was corrupted, so checking the file system and the MySQL databases is a good idea.
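For what it is worth, the same “remap by rewriting” trick can be done in software once the controller is switched to plain SCSI mode: writing over a sector the drive can no longer read makes it reallocate that sector from its spare pool, which is essentially what Verify Media’s remap accomplishes. Below is a rough sketch, not a tested tool: the device name is a placeholder, it must only be pointed at a disk taken out of the array, and it deliberately zeroes unreadable sectors, so the few-corrupted-files trade-off described above applies.

```python
#!/usr/bin/env python
# Sketch of forcing sector reallocation by rewriting unreadable sectors.
# DESTRUCTIVE: it zeroes any sector it cannot read. /dev/sdb is a
# placeholder -- run only against a disk removed from the array.
import os

DEVICE = "/dev/sdb"   # hypothetical device name
SECTOR = 512          # classic SCSI sector size; adjust if needed

def remap_bad_sectors(device, chunk=1024 * 1024):
    fd = os.open(device, os.O_RDWR)
    size = os.lseek(fd, 0, os.SEEK_END)
    remapped = 0
    try:
        for offset in range(0, size, chunk):
            try:
                os.pread(fd, min(chunk, size - offset), offset)
                continue              # whole chunk readable, move on
            except OSError:
                pass                  # drop to per-sector reads in this chunk
            for sec in range(offset, min(offset + chunk, size), SECTOR):
                try:
                    os.pread(fd, SECTOR, sec)
                except OSError:
                    # Unreadable sector: overwrite it with zeros so the drive
                    # reallocates it from its spare pool. Its data is lost.
                    os.pwrite(fd, b"\x00" * SECTOR, sec)
                    remapped += 1
    finally:
        os.close(fd)
    return remapped

if __name__ == "__main__":
    print("sectors zeroed/remapped:", remap_bad_sectors(DEVICE))
```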

So my story had a happy ending, with only minimal (yet to be discovered) data loss, but it could have been worse.

There are a few things this case is a reminder of:

Do not assume RAID is reliable. RAID is more reliable than a plain disk, and RAID6 is more reliable than RAID5, but all of them can fail, even expensive SAN systems. So if you care about your data, make sure you have a plan for when it happens. This is not to mention software bugs and user errors, which are other reasons why you want backups. Do not trust any single piece of hardware in HA scenarios.

Have backups ready. If you care about your data, backups are a must, whatever other HA methods you use.

Large data sets take time. Restoring a 1.5TB volume is likely to take hours; can you afford it? Even verifying media on a 300GB hard drive took several hours. This could be one more reason to scale out and keep a manageable amount of storage on each node. At the very least, multiple smaller RAID volumes could be used, so rebuilding any one of them takes less time.
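A quick back-of-the-envelope calculation, using assumed (not measured) sustained restore rates, shows why those numbers hurt:

```python
# Rough restore-time arithmetic for the volumes mentioned above.
# The MB/s figures are assumptions about sustained throughput, not
# measurements from this system.
def restore_hours(volume_gb, mb_per_sec):
    return volume_gb * 1024.0 / mb_per_sec / 3600.0

for rate in (50, 100):   # assumed sustained restore rate, MB/s
    print("1.5TB at %3d MB/s: %4.1f hours" % (rate, restore_hours(1536, rate)))
    print("300GB at %3d MB/s: %4.1f hours" % (rate, restore_hours(300, rate)))
```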

There are also a couple of ideas:

Dell: why don’t they have Verify Media, with the ability to remap bad blocks, in the RAID BIOS itself or in the RAID tools? It should not be a big deal, especially for an offline drive.

Backups with instant recovery: it could be interesting to try to integrate DRBD with LVM, so a snapshot could be taken and synchronized over the network as a backup. If quick recovery is needed, the snapshot could be attached via the network and operations started while it is gradually restored in the background to the local volume. Local networks are fast these days, so it could perform very well.
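As a rough illustration of the snapshot half of that idea (leaving DRBD and the background restore aside), here is a sketch that takes an LVM snapshot and ships it to another host over the LAN. The volume group, logical volume, mount point and target host names are all made up.

```python
#!/usr/bin/env python
# Sketch: snapshot the data volume with LVM and rsync it to a backup host.
# All names below (vg0, mysqldata, backuphost, ...) are hypothetical.
import subprocess

VG, LV, SNAP = "vg0", "mysqldata", "mysqldata_snap"
MNT, TARGET = "/mnt/snap", "backuphost:/backups/mysqldata/"

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

def snapshot_and_ship():
    # Create a copy-on-write snapshot of the live volume.
    run("lvcreate", "--snapshot", "--size", "10G",
        "--name", SNAP, "/dev/%s/%s" % (VG, LV))
    try:
        run("mount", "-o", "ro", "/dev/%s/%s" % (VG, SNAP), MNT)
        try:
            # Push the frozen copy to the backup host over the LAN.
            run("rsync", "-a", "--delete", MNT + "/", TARGET)
        finally:
            run("umount", MNT)
    finally:
        run("lvremove", "-f", "/dev/%s/%s" % (VG, SNAP))

if __name__ == "__main__":
    snapshot_and_ship()
```

For a consistent MySQL backup you would still need to flush and lock tables (or rely on InnoDB crash recovery) at the moment the snapshot is created.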

13 Comments
Matthew

I know your pain. We run twice monthly verifies on our arrays. Of course this slows the array down so we do it on the inactive side of our redundant pairs of machines. And of course in addition to the redundant machines we have backups. We’ve had to use them on occasion too.

If you don’t already know about the megarc tool then you should check it out. It’ll let you run a verify from the command line instead of the dellmgr/megamgr GUI.

Kevin Burton

If you can get away with it, you can just have a redundant array of inexpensive database servers.

For the price of a RAID card you can buy another cheap server. If you can load balance SELECTs across the boxes and have few writes, you can get RAID performance and reliability with numerous cheap MySQL boxes.

Commodity hardware is cheap and modern disks are pretty damn fast if you don’t need ONE box to exec all your queries.
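For what it is worth, here is a tiny sketch of the routing logic behind Kevin’s suggestion: writes go to the master, SELECTs are spread across replicas. The host names are invented and the actual MySQL driver (MySQLdb, mysql-connector, etc.) is left out.

```python
# Minimal read/write split across a set of cheap MySQL boxes.
# Host names are hypothetical placeholders.
import random

MASTER = "db-master"
REPLICAS = ["db-replica1", "db-replica2", "db-replica3"]

def pick_host(sql):
    """Route reads to a random replica, everything else to the master."""
    if sql.lstrip().upper().startswith("SELECT"):
        return random.choice(REPLICAS)
    return MASTER

# Example: ask where each statement should go.
for stmt in ("SELECT * FROM users WHERE id = 42",
             "UPDATE users SET last_login = NOW() WHERE id = 42"):
    print(pick_host(stmt), "<-", stmt)
```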

Vince Hoang

With RAID10, you lose half the physical disk space to mirroring, but it would have survived up to three disk failures, provided none of those failures were on the same submirror.

At the very least, you should consider setting aside one disk as a hotspare to reduce the time window of having a second disk fail while the RAID5 array is degraded.
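Vince’s RAID10 point is easy to check by brute force. Assuming a six-disk array arranged as three mirrored pairs (an assumption, since the post never states the geometry), this enumerates which combinations of failed disks the array survives.

```python
# For a six-disk RAID10 (three mirrored pairs), the array survives any
# failure set as long as no pair loses both of its disks: up to three
# failures can be survivable, while an unlucky two can be fatal.
from itertools import combinations

PAIRS = [(0, 1), (2, 3), (4, 5)]    # disk indexes grouped into mirrors

def survives(failed):
    return all(not (a in failed and b in failed) for a, b in PAIRS)

for n in range(1, 5):
    combos = list(combinations(range(6), n))
    ok = sum(survives(set(c)) for c in combos)
    print("%d failed disks: %d of %d combinations survive" % (n, ok, len(combos)))
```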

Apachez

2. Kevin: That’s basically what Google does 😛 Having the RAID at the machine level instead of at the hard drive level.

Brice

Join BAARF (http://www.miracleas.com/BAARF/BAARF2.html) 🙂

It’s an association of knowledgeable sysadmins who won’t ever use RAID5 (or 3 or 4) again on a production system…

RAID5 alone is dangerous. You can mitigate the risk with a hotspare drive, but frankly, RAID10 is really better (for a lot of good reasons: http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt).
Disks are cheap nowadays…

Chris

Too bad that it isn’t possible to add a drive to a RAID5 array and then tell the controller that the new drive will be replacing another drive, after which it should sync to the new drive while keeping redundancy during the rebuild.

That way, if a bad block is encountered, the data could be reconstructed from the redundancy.
Also, before marking a drive with a bad block as failed, it should try to write the data back to the disk with the bad block.
This should cause the disk to try and remap the bad block.

RAID6 should be capable of this already, but I’m very doubtful that many controllers handle bad blocks in this way.

Ryan

I just wanted to comment on a well-written article.
I’ve seen people rant and rave about a “double fault” or “double drive failure” in the past, but in all honesty, most of them don’t understand exactly what is going on and why the RAID array goes to a failed state once the replacement drive starts rebuilding.

I recommend a consistency check every month.
RAID is absolutely NO substitute for a backup. I think of it as a convenience; that is all.
Not that it matters, but I do work for Dell as L2 support.
Backups, people! 😀

Paul Meiners

From my understanding, a consistency check only checks the area of the RAID drives which contains data, omitting the free space. The issue is that multiple physical media errors can build up in the unused areas of the RAID disks and are only found during a rebuild, at which point the RAID controller cannot handle the number of errors.
This is where Patrol Read comes in. Patrol Read checks the entire drive for media errors, so disabling it is not a good idea. What I do is disable the automatic runs and use a bat file run from Task Scheduler to run it manually during off hours, weekly or bi-weekly.