<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: How reliable RAID really is</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/</link>
	<description>Everything about MySQL Performance</description>
	<pubDate>Fri, 05 Dec 2008 08:56:43 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Ryan</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-263649</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Thu, 03 Apr 2008 01:28:47 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-263649</guid>
		<description>I just wanted to comment on a well-written article.
I've seen people rant and rave about a "double fault" or "double drive failure" in the past, but in all honesty, most of them don't understand exactly what is going on and why the RAID array goes to a failed state once the replacement drive starts rebuilding.

I recommend a consistency check every month.
RAID is absolutely NO substitute for a backup. I think of it as a convenience; that is all.
Not that it matters, but I do work for Dell as L2 support.
Backups, people! :D</description>
		<content:encoded><![CDATA[<p>I just wanted to comment on a well-written article.<br />
I&#8217;ve seen people rant and rave about a &#8220;double fault&#8221; or &#8220;double drive failure&#8221; in the past, but in all honesty, most of them don&#8217;t understand exactly what is going on and why the RAID array goes to a failed state once the replacement drive starts rebuilding.</p>
<p>I recommend a consistency check every month.<br />
RAID is absolutely NO substitute for a backup. I think of it as a convenience; that is all.<br />
Not that it matters, but I do work for Dell as L2 support.<br />
Backups, people! <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-141804</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Mon, 02 Jul 2007 05:55:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-141804</guid>
		<description>Too bad that it isn't possible to add a drive to a RAID 5 array and then say that the new drive will be replacing another drive, after which the array would sync to the new drive while keeping redundancy during the rebuild.

That way if a bad block is encountered, the data could be resolved from the redundancy.
Also, before marking a drive with a bad block as failed, it should try to write the data back to the disk with the bad block. 
This should cause the disk to try and remap the bad block.

RAID 6 should be capable of this already, but I'm very doubtful that many controllers handle bad blocks in this way.</description>
		<content:encoded><![CDATA[<p>Too bad that it isn&#8217;t possible to add a drive to a RAID 5 array and then say that the new drive will be replacing another drive, after which the array would sync to the new drive while keeping redundancy during the rebuild.</p>
<p>That way if a bad block is encountered, the data could be resolved from the redundancy.<br />
Also, before marking a drive with a bad block as failed, it should try to write the data back to the disk with the bad block.<br />
This should cause the disk to try and remap the bad block.</p>
<p>RAID 6 should be capable of this already, but I&#8217;m very doubtful that many controllers handle bad blocks in this way.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-771</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Mon, 03 Jul 2006 06:36:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-771</guid>
		<description>Brice,

Nice to find you linking to this document. I was sending many customers to read it.

I also generally recommend using RAID10, but I avoid saying it is always a must. Each case is unique in practice.

Cost would often be the reason, but not the only one. In some tests I've done, I found RAID5 to perform _faster_ than, or close to, RAID10, on a PowerEdge 2850 for example. You can see the benchmark data in the "Performance Landscape" presentation on this site. Yes, this does not make much sense, but I'm not the only person to observe this behavior. Here are some benchmarks for SQL Server:
http://www.developersdex.com/sql/message.asp?p=580&#038;r=4986921

My feeling is that Dell or LSI just spent more time optimizing RAID5, or there are some severe performance bugs in the RAID10 implementation, as I do not see any physical reason why this would be happening.

Another reason could be that certain hardware might not have proper RAID10: sometimes what is named RAID10 might be implemented as concatenated RAID1 (especially in some older models), which would have pretty bad performance in many cases.

Besides hardware limitations/bugs, I can see RAID5 used in a replicated environment with a low volume of writes. It gives a certain security for the slaves, so you do not have to reclone them whenever any drive fails. It also means you can promote a slave to master without running the master on insecure storage.</description>
		<content:encoded><![CDATA[<p>Brice,</p>
<p>Nice to find you linking to this document. I was sending many customers to read it.</p>
<p>I also generally recommend using RAID10, but I avoid saying it is always a must. Each case is unique in practice.</p>
<p>Cost would often be the reason, but not the only one. In some tests I&#8217;ve done, I found RAID5 to perform _faster_ than, or close to, RAID10, on a PowerEdge 2850 for example. You can see the benchmark data in the &#8220;Performance Landscape&#8221; presentation on this site. Yes, this does not make much sense, but I&#8217;m not the only person to observe this behavior. Here are some benchmarks for SQL Server:<br />
<a href="http://www.developersdex.com/sql/message.asp?p=580&#038;r=4986921" rel="nofollow">http://www.developersdex.com/sql/message.asp?p=580&#038;r=4986921</a></p>
<p>My feeling is that Dell or LSI just spent more time optimizing RAID5, or there are some severe performance bugs in the RAID10 implementation, as I do not see any physical reason why this would be happening.</p>
<p>Another reason could be that certain hardware might not have proper RAID10: sometimes what is named RAID10 might be implemented as concatenated RAID1 (especially in some older models), which would have pretty bad performance in many cases.</p>
<p>Besides hardware limitations/bugs, I can see RAID5 used in a replicated environment with a low volume of writes. It gives a certain security for the slaves, so you do not have to reclone them whenever any drive fails. It also means you can promote a slave to master without running the master on insecure storage.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-770</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Mon, 03 Jul 2006 06:25:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-770</guid>
		<description>Vince, 

You're right: with RAID10 the probability of losing data is lower, even though it can also fail on losing a second hard drive, if both drives come from the same mirror pair. So you would still want a hot spare to minimize that window.

A hot spare would not have helped in this case, however: the first drive did not fail but was manually replaced, so it just started to resync as a hot spare would. The rebuild, however, failed due to a bad block on the second drive.

In general, the point is that data comes with different levels of importance, and different decisions follow from that. Of course it is good to have data on RAID10 with a spare disk, to have redundant servers, and of course to have a backup from which you can do point-in-time recovery. That is, however, not always the case in real systems.</description>
		<content:encoded><![CDATA[<p>Vince, </p>
<p>You&#8217;re right: with RAID10 the probability of losing data is lower, even though it can also fail on losing a second hard drive, if both drives come from the same mirror pair. So you would still want a hot spare to minimize that window.</p>
<p>A hot spare would not have helped in this case, however: the first drive did not fail but was manually replaced, so it just started to resync as a hot spare would. The rebuild, however, failed due to a bad block on the second drive.</p>
<p>In general, the point is that data comes with different levels of importance, and different decisions follow from that. Of course it is good to have data on RAID10 with a spare disk, to have redundant servers, and of course to have a backup from which you can do point-in-time recovery. That is, however, not always the case in real systems.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-769</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Mon, 03 Jul 2006 06:18:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-769</guid>
		<description>Kevin,

Yes, of course you should have multiple servers; that is much better for high-availability cases, especially as RAID is one of the components that fails. However, building a "Google-style" system from a lot of crappy hardware might not always be the best solution. There are a lot of things to consider: for example, power requirements, which are often the main cost factor in colocation environments, and maintenance. Recovering an "inexpensive" box might be pretty expensive, especially if it is installed in a remote location, not to mention the various weird problems you might need to be ready for, such as database corruption due to bad memory or cooling, which can sometimes be replicated, so even replication might not save you.

So my choice is normally to have decent boxes for MySQL servers: pretty commodity ones, not high end but reliable. This applies especially to small and medium companies, where there is no time to implement the management infrastructure that would allow replacing broken servers cheaply and transparently.</description>
		<content:encoded><![CDATA[<p>Kevin,</p>
<p>Yes, of course you should have multiple servers; that is much better for high-availability cases, especially as RAID is one of the components that fails. However, building a &#8220;Google-style&#8221; system from a lot of crappy hardware might not always be the best solution. There are a lot of things to consider: for example, power requirements, which are often the main cost factor in colocation environments, and maintenance. Recovering an &#8220;inexpensive&#8221; box might be pretty expensive, especially if it is installed in a remote location, not to mention the various weird problems you might need to be ready for, such as database corruption due to bad memory or cooling, which can sometimes be replicated, so even replication might not save you.</p>
<p>So my choice is normally to have decent boxes for MySQL servers: pretty commodity ones, not high end but reliable. This applies especially to small and medium companies, where there is no time to implement the management infrastructure that would allow replacing broken servers cheaply and transparently.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-768</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Mon, 03 Jul 2006 06:11:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-768</guid>
		<description>Matthew,

Thanks for your hints. Yes, running a consistency check on a regular basis is surely a good idea. What is interesting is that in the Dell/LSI docs "Patrol Read" is positioned as a lower-overhead alternative to a consistency check. What I've found, however, is that it does not really catch errors well enough (as in this case), plus it has some strange performance problems: in certain cases I've seen it slow the array down to probably 20% of its capacity for 20-30 minutes. It could be a bug, but Dell just said to disable Patrol Read.</description>
		<content:encoded><![CDATA[<p>Matthew,</p>
<p>Thanks for your hints. Yes, running a consistency check on a regular basis is surely a good idea. What is interesting is that in the Dell/LSI docs &#8220;Patrol Read&#8221; is positioned as a lower-overhead alternative to a consistency check. What I&#8217;ve found, however, is that it does not really catch errors well enough (as in this case), plus it has some strange performance problems: in certain cases I&#8217;ve seen it slow the array down to probably 20% of its capacity for 20-30 minutes. It could be a bug, but Dell just said to disable Patrol Read.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brice</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-741</link>
		<dc:creator>Brice</dc:creator>
		<pubDate>Sat, 01 Jul 2006 11:17:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-741</guid>
		<description>Join BAARF (http://www.miracleas.com/BAARF/BAARF2.html) :-)

It's an association of knowledgeable sysadmins who won't ever use RAID5 (or 3 or 4) anymore on production systems...

RAID5 alone is dangerous; you can mitigate the risk with a hot spare drive, but frankly, RAID10 is really better (for lots of good reasons: http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt)
Disks are cheap nowadays...</description>
		<content:encoded><![CDATA[<p>Join BAARF (http://www.miracleas.com/BAARF/BAARF2.html) <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>It&#8217;s an association of knowledgeable sysadmins who won&#8217;t ever use RAID5 (or 3 or 4) anymore on production systems&#8230;</p>
<p>RAID5 alone is dangerous; you can mitigate the risk with a hot spare drive, but frankly, RAID10 is really better (for lots of good reasons: <a href="http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt" rel="nofollow">http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt</a>)<br />
Disks are cheap nowadays&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Apachez</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-740</link>
		<dc:creator>Apachez</dc:creator>
		<pubDate>Sat, 01 Jul 2006 07:51:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-740</guid>
		<description>2. Kevin: That's basically what Google does :P Having the RAID at the machine level instead of the hard drive level.</description>
		<content:encoded><![CDATA[<p>2. Kevin: That&#8217;s basically what Google does <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> Having the RAID at the machine level instead of the hard drive level.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vince Hoang</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-739</link>
		<dc:creator>Vince Hoang</dc:creator>
		<pubDate>Sat, 01 Jul 2006 02:02:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-739</guid>
		<description>With RAID10, you lose half the physical disk space to mirroring, but it would have survived up to three disk failures, provided none of those failures were on the same submirror.

At the very least, you should consider setting aside one disk as a hotspare to reduce the time window of having a second disk fail while the RAID5 array is degraded.</description>
		<content:encoded><![CDATA[<p>With RAID10, you lose half the physical disk space to mirroring, but it would have survived up to three disk failures, provided none of those failures were on the same submirror.</p>
<p>At the very least, you should consider setting aside one disk as a hotspare to reduce the time window of having a second disk fail while the RAID5 array is degraded.</p>
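<p>To make the submirror condition concrete, here is a rough sketch. It assumes a six-disk RAID10 built from three two-way mirrors (the disk count is only inferred from the "up to three disk failures" figure), and it assumes every set of failed disks is equally likely, which real drives may not satisfy. The function name and disk numbering are hypothetical. It counts what fraction of failure combinations leave each mirror with at least one good disk.</p>

```python
from itertools import combinations


def survival_fraction(mirror_pairs, failed_disks):
    """Fraction of equally likely failure sets that the array survives."""
    # Hypothetical numbering: disks 2*i and 2*i + 1 form mirror i.
    disks = range(2 * mirror_pairs)
    total = 0
    survived = 0
    for failed in combinations(disks, failed_disks):
        total += 1
        mirrors_hit = [d // 2 for d in failed]
        # The array survives only if no mirror lost both of its disks.
        if len(set(mirrors_hit)) == len(mirrors_hit):
            survived += 1
    return survived / total


if __name__ == "__main__":
    # Assumed layout: three mirrors, i.e. six disks in total.
    for k in range(1, 5):
        print(k, "failed disks:", survival_fraction(3, k))
```

<p>Under these assumptions a single failure is always survivable, a fourth failure never is, and the cases in between depend entirely on which disks fail, which is why the hot spare still matters.</p>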
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Burton</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-735</link>
		<dc:creator>Kevin Burton</dc:creator>
		<pubDate>Fri, 30 Jun 2006 21:32:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/30/how-reliable-raid-really-is/#comment-735</guid>
		<description>IF you can get away with it you can just have a redundant array of inexpensive database servers.

For the price of a RAID card you can buy another cheap server. If you can load balance SELECTs across the boxes and have few writes you can get RAID performance and reliability with numerous cheap MySQL boxes.


Commodity hardware is cheap and modern disks are pretty damn fast if you don't need ONE box to exec all your queries.</description>
		<content:encoded><![CDATA[<p>IF you can get away with it you can just have a redundant array of inexpensive database servers.</p>
<p>For the price of a RAID card you can buy another cheap server. If you can load balance SELECTs across the boxes and have few writes you can get RAID performance and reliability with numerous cheap MySQL boxes.</p>
<p>Commodity hardware is cheap and modern disks are pretty damn fast if you don&#8217;t need ONE box to exec all your queries.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
