When running MySQL in 24/7 environments, we typically hope for uniform component performance, or at least would like to be able to control it. Usually this is indeed the case: for example, a CPU performs the same day and night (unless system management software decides to lower the CPU frequency due to overheating).

This is also almost the case with hard drives. There can be performance differences depending on where data is stored on the disk, the number of remapped sectors, and so on. There is also database and file system fragmentation; however, these tend to accumulate in a predictable fashion.

If you have a RAID controller, this may well not be the case. To protect your data, a RAID controller may implement a bunch of algorithms that can affect performance dramatically.

For example, with the PERC5 (LSI MegaRAID), a pretty typical controller from Dell installed on the PowerEdge 1950, 2950, etc., you should be aware of a couple of things:

Battery Learning and Charging: The built-in battery has to pass through a learning cycle every 3 months or so, and this cycle takes about 7 hours according to the docs. During this time the battery-backed cache is disabled and the system operates with a write-through cache, possibly slowing down write performance several times.

Patrol Read: This feature is meant to discover bad sectors before it is too late, and it does so by performing periodic disk read checks. When it wakes up it will take some I/O resources (30% by default), which will affect your performance to some degree.

Consistency Checks: This is another thing which I've seen initiated by the controller (though I'm not sure on this one). It pretty much checks that the disks are in sync, and it can also slow down performance dramatically.

So what can you do about these?

First, none of this should come as a surprise to you when you discover that your server has just stopped performing right when you planned an investor showcase or another important event. Learn what "cron jobs" your RAID card has and see how they can be controlled; maybe schedule them during the least busy intervals or something similar.

You should also be ready for degraded and rebuild RAID modes, when one of the disks fails and you replace it with another one which needs to be rebuilt. This means you should already leave some slack in the system. That slack is often enough for a consistency check and patrol read, but not for the battery-backed cache being temporarily disabled.
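To make the slack argument concrete, here is a minimal back-of-the-envelope sketch in Python. The peak utilization and the 3x write-through slowdown are made-up figures for illustration only (the text above says only "several times"); the 30% figure is the Patrol Read default mentioned earlier. Measure your own workload before trusting any such numbers.

```python
def fits_in_capacity(peak_utilization: float, remaining_capacity: float) -> bool:
    """Return True if the peak load still fits in the capacity left over."""
    return peak_utilization <= remaining_capacity


# Hypothetical figure: peak load uses 50% of the normal I/O capacity.
PEAK_UTILIZATION = 0.5

scenarios = {
    # Patrol Read takes 30% of I/O resources by default.
    "patrol read": 1.0 - 0.30,
    # Assumed 3x write slowdown while the cache runs in write-through mode.
    "write-through cache": 1.0 / 3.0,
}

for name, remaining in scenarios.items():
    status = "OK" if fits_in_capacity(PEAK_UTILIZATION, remaining) else "overloaded"
    print(f"{name}: {remaining:.0%} of capacity left -> {status}")
```

With these assumed numbers the system survives Patrol Read but not the disabled write cache, which is the situation described above.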

Another thing you can do, of course, is switch to another server and take this one down for maintenance if the learning process can't be scheduled for a time when it is non-intrusive. To do this properly, however, you need to know when it is about to happen.
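As a rough illustration of knowing when it is about to happen, the sketch below projects future learn cycles from the last known start time and checks them against a planned event. It assumes a fixed 90-day period and a 7-hour duration (approximations of "every 3 months or so" and "about 7 hours"); the dates and function names are hypothetical. On a real system, take the next learn time from your controller's management utility instead of estimating it.

```python
from datetime import datetime, timedelta

LEARN_PERIOD = timedelta(days=90)    # assumed: "every 3 months or so"
LEARN_DURATION = timedelta(hours=7)  # assumed: "about 7 hours"


def next_learn_windows(last_start, count=4):
    """Project the next `count` learn cycles as (start, end) tuples."""
    windows = []
    start = last_start
    for _ in range(count):
        start = start + LEARN_PERIOD
        windows.append((start, start + LEARN_DURATION))
    return windows


def overlaps(window, event_start, event_end):
    """Return True if the learn window intersects the event interval."""
    start, end = window
    return start < event_end and event_start < end


# Hypothetical dates, for illustration only.
last_learn = datetime(2024, 1, 1, 2, 0)
event_start = datetime(2024, 3, 31, 6, 0)
event_end = datetime(2024, 3, 31, 12, 0)

for window in next_learn_windows(last_learn):
    if overlaps(window, event_start, event_end):
        print(f"Learn cycle {window[0]} to {window[1]} collides with the event")
```

A check like this could run from a monitoring job so that a collision is noticed weeks ahead rather than during the event.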

7 Comments
Xaprb

Peter, how do you manage and monitor your PERC controller, control Patrol Read, etc.? Do you use Dell's MegaPR utility for this? (http://support.dell.com/support/downloads/download.aspx?fileid=129694) It looks like there are a lot of complaints about it crashing.

Brice Figureau

Baron,

We are also using OMSA for monitoring and management because of its SNMP integration, but in the past I used the following two open-source tools (for monitoring only):
* megactl:
http://sourceforge.net/projects/megactl

* safte-monitor (which checks the SAF-TE compliant disk enclosure of Dell servers and not the RAID card directly):
http://oss.metaparadigm.com/safte-monitor/

Hope this helps,
Brice

Xaprb

Thanks. Looks like the LSI utility also has SNMP stuff. OMSA looks like a real pain to get working on non-RPM-based, non-Debian systems. (I am working with some Gentoo servers right now). The LSI utility is working OK. Thanks to both of you for the help.

Bill

Thanks for the run-through, I didn't realize the BBU ran through a discharge/recharge phase. But from the doc you linked, the BBU is only disabled during the discharge phase. It's enabled once the charging starts, so the performance should only be degraded for 3.5 hours or so. That should make it a little easier to schedule during non-peak times.

Sean Kelly

Hey folks,

Once I got MegaCli installed (on CentOS), that gave me enough info to go on to disable the battery relearn.

This guy wrote it up and his instructions are explicit and effective.

http://yo61.com/dell-drac-bbu-auto-learn-tests-kill-disk-performance.html

-sean