June 30, 2006

How reliable RAID really is

Posted by peter

This post is not exactly about MySQL Performance or about Performance at all, but I guess it should be interested to many MySQL DBAs and other people involved running MySQL In production.

Recently I’ve been involved in troubleshooting Dell Poweredge 2850 system running RAID5 using 6GB internal hard drives, which give about 1.4TB of usable space.

The problem started than one of hard drive was set to “Predicted Failure” state by “Patrol Read” which is automatically done by PERC4 (LSI Logic megaraid) controller. Dell was prompt to ship replacement hard drive and drive was replaced. This should be happy end of the story but in reality troubles only began.

After hard drive is replaced RAID has to be rebuilt but the problem in this case was…. rebuild failed bringing all logical drive down because yet another hard drive got bad block. Replaced hard drive was “failed” because it could not be rebuilt and other one because of read failure. So my first advice would be - Run consistency check before replacing hard drive with predicted failure to minimize chance of double drive failure in RAID. It is good to run consistency checks on regular basics anyway but this final run would not hurt.

The next interesting thing is - there is not too much advice which could be found in Dell documentation about handling RAID with two failed hard drives. The impression is it should never happen (while it does, and not as rarely as one would hope) and if it happened you should just go and get your backup. Restoring over 1TB of data is never fun but in this case there was no backup which made recovery more important.

Interesting enough Logical drive could be brought online and used by forcing newly failed drive online. It probably just had couple of bad blocks - but there were no way to resync logical hard drive in this situation.

What one would like to do in such case is to fore SCSI drive to remap those bad blocks. Couple of files could be corrupted but it is much better than loosing everything. Unfortunately neither RAID BIOS nor RAID tools do not provide you with such feature.

Happily Dell Bios has little option which allows you to disable RAID controller and access your disks as simple SCSI. Changing this option will result in various scarry messages such as “Data loss will occure” but in reality you could change it back and forth, you just should be careful and know hat you’re doing.

In SCSI BIOS there is an option to perform “Verify Media” which can be used to scan hard drive and remap bad blocks. After remapping is done RAID mode can be enabled back again and array could be rebuld just fine. There is of course chance some data is corrupted so checking file system and MySQL database is good idea.

So my story had a happy ending with only minimal (yet to be discovered) data loss but it coul be worse.

There are few things this case reminds about:

Do not assume RAID is Reliable. RAID is more reliable than plain disk, RAID6 is more reliable than RAID5 but all they can fail, even expensive SAN systems. So make sure you have backup plan if it happens if you care about your data. This is of course not to mention software bugs and user errors which are other reasons why you want backups. Do not trust to any single piece of hardware in HA scenarios.

Have backups ready. If you care about your data backups are must whatever other HA methods you use.

Large data sets take time Restoring 1.5TB volume is likely to take hours can you afford it ? Even verifying media on 300GB hard drive took several hours. This could be one more reason to scale out and keep managable size storage on each node. At least multiple smaller RAID volumes could be used so rebuilding any of them takes less time.

There are also couple of ideas:

Dell - why do not they have Verify Media with ability to remap bad blocks in the RAID BIOS itself or RAID tools ? Should not be big deal especially for offline drive.

Backups with Instant recovery - could be interesting to try to integrate DRBD with LVM so snapshot could be taken and synchronized to network as backup. If quick recovery is needed snapshot could be connected via network and operations started, while it is gradually restored in the background back to the local volume. Local Networks are fast these days so it could perform very well.

June 29, 2006

What MySQL buffer cache hit rate should you target

Posted by peter

“What cache hit rate is good for optimal MySQL Performance” is typical question I’m asked. It could by MyISAM key_buffer or Innodb innodb_buffer_pool it does not really matter. In both cases trying to come up with constant “good” hit rate is looking for trouble. Well of course you can name 100% hit rate and all data fits in memory as perfect case but that is not the answer people are looking for.

First thing which needs to be understand is - cache hit ratio can be computed differently for different engines. For example for Innodb page hits are counted for all subsequent page accesses if page is scanned not just once. This means if you would have completely IO bound full table scan and you have 100 rows per page you will get 1 miss per page and 99 hits, making your hit ratio 99% for what would be 0% in many peoples mind.

But what is even more important is hit ratio is not directly relevant for performance. Take the following example - you have some script doing small (in memory) data processing on your server doing some traversing some 1.000.000 rows per second. And at the same time you have some performance critical transaction which frequently needs IO. Well in this case global key hit ratio may be 99.99% but you would still have problems with key buffer efficiency. In the real world even single operation may be IO bould while showing very large hit rate so even taking data for single thread is not enough.

So what should you look at ? The best answer would be to look at response time contribution - how much buffer misses (physical reads) contributed to response time ? This would take into account other things such as - how busy were drives with concurrent requests, how random were these reads/writes, how good was OS or SAN cache hit rate - something you would not see otherwise. The bad luck is - MySQL currently does not provide this information (it exists in plans).

The next best thing would be to look at the number of misses - number of IOs which MySQL needs to do. You better to look at global and local (per thread) numbers and compare them to ones from iostat and to what you system can possibly do. For example if single query is issuing 100+ random IO requests which are not cached by OS or SAN this is likely to be IO bound problem. Yes this still includes a lot of quessing but this is what we have to do until proper instrumentation is done.

In partuclar you should be looking at Key_reads and Key_writes for MyISAM tables (note this does not include row data access, so you’re a bit in trouble) or “OS file reads“, “OS file writes” from SHOW INNODB STATUS for Innodb tables. In MySQL 5.0+ you can also use Innodb_data_reads, and Innodb_data_writes counters from SHOW STATUS

So again if you have single 7200 SATA hard drive in your system which can do about 100 random IOs/sec and you see Innodb_data_reads incremented by 10 per second (mysqladmin extended -i10 -r is your friend in computing this) while there is no other IO - no matter what your hit ratio is it is unlikely the problem for MySQL Performance. If it is however 120 per second it well could be. It is still worth to check “iostat” if all of them make it to the disk. In some configurations OS cache may take all of them.

In some OS - ie Solaris you can use dtrace to do external instrumentation. You should be able to see how many reads and writes are done and where do they come from (log writes, data/index reads etc) but you would probably have trouble mapping them to particular objects and queries.

June 26, 2006

Handling big result sets

Posted by Vadim

Sometime it is needed to handle a lot of rows on client side. Usual way is send query via mysql_query and than handle the result in loop mysql_fetch_array (here I use PHP functions but they are common or similar for all APIs, including C).
Consider table:

SQL:
  1. CREATE TABLE `longf` (
  2.   `f1` int(11) NOT NULL AUTO_INCREMENT,
  3.   `f2` date DEFAULT NULL,
  4.   `f3` date DEFAULT NULL,
  5.   `f4` varchar(14) DEFAULT NULL,
  6.   `f5` varchar(6) DEFAULT NULL,
  7.   `f6` date DEFAULT NULL,
  8.   `f7` smallint(6) DEFAULT NULL,
  9.   `f8` smallint(6) DEFAULT NULL,
  10.   `f9` varchar(13) DEFAULT NULL,
  11.   `f10` varchar(39) DEFAULT NULL,
  12.   `f11` int(11) DEFAULT NULL,
  13.   `f12` float DEFAULT NULL,
  14.   `f13` int(11) DEFAULT NULL,
  15.   `f14` smallint(6) DEFAULT NULL,
  16.   `f15` varchar(39) DEFAULT NULL,
  17.   `f16` date DEFAULT NULL,
  18.   `f17` smallint(6) DEFAULT NULL,
  19.   `f18` int(11) NOT NULL,
  20.   `f19` date DEFAULT NULL,
  21.   `f20` date DEFAULT NULL,
  22.   `f21` varchar(14) DEFAULT NULL,
  23.   `f22` varchar(6) DEFAULT NULL,
  24.   `f23` date DEFAULT NULL,
  25.   `f24` smallint(6) DEFAULT NULL,
  26.   `f25` smallint(6) DEFAULT NULL,
  27.   `f26` varchar(13) DEFAULT NULL,
  28.   `f27` varchar(39) DEFAULT NULL,
  29.   `f28` int(11) DEFAULT NULL,
  30.   `f29` float DEFAULT NULL,
  31.   `f30` int(11) DEFAULT NULL,
  32.   `f31` smallint(6) DEFAULT NULL,
  33.   `f32` varchar(39) DEFAULT NULL,
  34.   `f33` date DEFAULT NULL,
  35.   `f34` smallint(6) DEFAULT NULL,
  36.   `f35` int(11) NOT NULL,
  37.   PRIMARY KEY  (`f1`)
  38. );
  39.  
  40. SELECT count(*) FROM longf;
  41. +----------+
  42. | count(*) |
  43. +----------+
  44. 5242880 |
  45. +----------+

[read more...]

Full text search for all MySQL Storage Engines

Posted by peter

As we know build in full text search is currently limited only to MyISAM search engine as well as has few other limits.

Today Sphinx Search plugin for MySQL was released which now provides fast and easy to use full text search solution for all storage engines. This version also adds a lot of other new features, including boolean search and distributed searching.

A while ago I already wrote about Sphinx Search Egine, comparing it to built in FullText search and Mnogosearch. I guess I should soon repeat tests, adding Lucene to the list for complete picture.

And if you do not feel like patching MySQL or use MySQL 5.1 beta to use sphinx as MySQL Storage Engine you can still use it old fashion way as separate server.

June 23, 2006

MySQL Sizing questions

Posted by peter

I frequently get questions along the lines of "how many transactions per second MySQL can do" or "how many servers I need to handle 100.000 users" or "which hardware would be enough to handle my 40GB" database.

There are two things which are very interesting about these questions which make them funny and annoying at the same time

1) People asking these questions demand the answer. It could be internal Sales and Marketing people or even customers saying they typically get the answers from other vendors.

2) The best answer to this question will not be a number but something like "it depends". Seriously you need much more information than given in this example to give estimate which is correct at least to the order of magnitude.

[read more...]

June 17, 2006

Using MyISAM in production

Posted by peter

There were recently number of posts about MyISAM, for example Arjen wrote pretty nice article about MyISAM features so I thought I would share my own view on using MyISAM in production.

For me it is not only about table locks. Table locks is only one of MyISAM limitations you need to consider using it in production.
Especially if you're comming from "traditional" databases you're likely to be shocked by MyISAM behavior (and default MySQL behavior due to this) - it will be corrupted by unproper shutdown, it will fail with partial statement execution if certain errors are discovered etc.

You should not think about this however as about bugs, as many MySQL features it is designed for particular load cases when it shines and it might not be good choice for others.

In 1999 for my production application (storing billions of rows in tens of thousands tables) Innodb was better choice mainly because of thouse other behaviors ... well table locks was the problem at very early stage but it was solved by using this significant number of tables.

I still have the same view on Storage Engines - Innodb is my oppinion is better choise for general purpose storage engine - it better matches what you would expect from database server and saves you from a lot of gotchas - this might be more important than performance for small application. As load increases you might convert certain tables to MyISAM and other storage engines for performance reasons.... of course keeping all limits in mind.

So here is my list of items you need to keep into account while using MyISAM tables.

[read more...]

June 15, 2006

FreeBSD tests

Posted by Vadim

I'm continuing my experiments with different OS and today I tested FreeBSD 6.0 on my box.
(more details about box and benchmark see here http://www.mysqlperformanceblog.com/2006/06/13/quick-look-at-ubuntu-606/).
Initially I was very pessimistic about FreeBSD, as results were (in transactions/sec, more is better.
for comparison the results from Suse 10.0):

InnoDB
threads FreeBSD 6 Suse 10.0 Suse/ FreeBSD ratio
1 436.97 536.91 1.23
4 322.08 816.27 2.53
16 519.94 639.05 1.23
64 crash 547.07
256 357.09
MyISAM
threads FreeBSD 6 Suse 10.0 Suse/ FreeBSD ratio
1 335.56 429.89 1.28
4 165.16 863.23 5.23
16 322.66 537.67 1.67
64 crash 516.00
256 346.65

The crash with many threads in FreeBSD is known problem and is not MySQL fault. More info is available in FreeBSD bug report

I'm not big expert in FreeBSD and did not saw http://wikitest.freebsd.org/MySQL before. This page recommends to use libthr instead of libthreads.
The results with libthr looks better:

InnoDB
threads FreeBSD 6 Suse 10.0 Suse/ FreeBSD ratio
1 483.22 536.91 1.11
4 852.21 816.27 0.96
16 748.89 639.05 0.85
64 644.45 547.07 0.85
256 273.99 357.09 1.30
MyISAM
threads FreeBSD 6 Suse 10.0 Suse/ FreeBSD ratio
1 344.72 429.89 1.25
4 531.6 863.23 1.62
16 494.19 537.67 1.09
64 451.72 516.00 1.14
256 215.84 346.65 1.61

Interesting thing with 4-64 threads FreeBSD is better than Suse in InnoDB benchmark. I think it is related to InnoDB's implementation of syncronious primitives. For MyISAM Suse is stable better.

Configuration params:
Box: Dual Core Athlon 3800+, 1Gb of RAM, Motherboard ASUS A8N-E

MySQL 5.0.22

params for InnoDB:
--innodb-buffer-pool-size=500M --max-connections=500

params for MyISAM:
--key-buffer-size=500M --max-connections=500 --skip-innodb

Suse 10.0:
kernel-smp-2.6.13-15.x86_64
NTPL

Schedulers comparsion.
By request I made tests with 4BSD scheduler:

InnoDB
threads FreeBSD 6 ULE FreeBSD 6 4BSD 4BSD / ULE
1 483.22 438.1 0.91
4 852.21 819.14 0.96
16 748.89 712.77 0.95
64 644.45 639.2 0.99
256 273.99 330.11 1.20
MyISAM
threads FreeBSD 6 ULE FreeBSD 6 4BSD 4BSD / ULE
1 344.72 324.9 0.94
4 531.6 518.96 0.98
16 494.19 476.57 0.96
64 451.72 444.77 0.98
256 215.84 258.42 1.20

Interesting with 256 threads BSD scheduler looks better.

June 13, 2006

MyISAM concurrent insert

Posted by Vadim

Arjen posted a good note about MyISAM concurrent-insert features, though I should mention concurrent-insert can be cause of
scalablity and peformance problems on SMP boxes, especially on queries processing range-queries. The reason of problems is POSIX read-write locks, which are used to protect key_buffer from concurrent changes and called for each processed row. More info you can get from my UC2006 talk, in short on main platforms (Linux, Solaris, Windows) rw-locks have a bad implementation and too many calls cause waste of CPU in user-space (Solaris) or kernel-space (Linux).

Some results for MyISAM table, query SELECT id FROM sbtest WHERE id BETWEEN N AND N+20000, id - primary key.
Boxes: Sun V40z, Solaris 10, 4 x Dual Core Opteron @ 2.2GHz (8 logical cpu), 16GB of RAM, StorEdge 3310
and Quadxeon, RedHat AS 3, 2.4.21-15.Elsmp, 4 x Intel(R) XEON(TM) MP CPU 2.00GHz, 4GB of RAM, SATA RAID 10
The results in queries per sec (more is better)

threads Quadxeon
with enabled concurrent-insert
Quadxeon
--skip-concurrent-insert
Sun V40z
with enabled concurrent-insert
Sun V40z
--skip-concurrent-insert
1 44.08 64.82 61.06 129.13
2 32.63 123.33 52.63 244.03
4 24.95 176.62 20.03 463.62
8 19.92 206.81 12.34 483.47
16 19.73 208.66 12.3 428.35
32 19.77 212.83 12.25 445.66

So if you are using range-queries and doing INSERT not often than disabling concurrent-insert with --skip-concurrent-insert can improve MyISAM performance.

As I said the reason of bad scalability is rw-locks and currently MySQL developers are working on CPU-depended rw-locks implementstation, this will be available in MySQL 5.1 - 5.2

Quick look at Ubuntu 6.06

Posted by Vadim

There are a lot of talks around new coming Ubuntu 6.06, so I decided to make quick benchmarks.
I used sysbench 0.4.6 oltp-read-only workload with 1000000 rows against InnoDB and MyISAM tables.
Such workload is CPU-bound and allows to compare CPU / OS if we are using the same version of MySQL.
So I used MySQL 5.0.22 and my box Dual Core Athlon 3800+, 1Gb of RAM.
For comparison I tested the same workload on my primary Suse 10.0.

Here are results (in transactions per sec, more is better):

InnoDB
threads Ubuntu 6.06 Suse 10.0 Suse/ Ubuntu ratio
1 478.66 536.91 1.12
4 870.50 816.27 0.94
16 582.57 639.05 1.10
64 522.04 547.07 1.05
256 344.30 357.09 1.04
MyISAM
threads Ubuntu 6.06 Suse 10.0 Suse / Ubuntu ration
1 466.13 429.89 0.92
4 765.19 863.23 1.13
16 509.32 537.67 1.06
64 495.97 516.00 1.04
256 339.01 346.65 1.02

In general Suse 10.0 looks better, though the difference is not significant.

June 9, 2006

Why MySQL could be slow with large tables ?

Posted by peter

If you've been reading enough database related forums, mailing lists or blogs you probably heard complains about MySQL being unable to handle more than 1.000.000 (or select any other number) rows by some of the users. On other hand it is well known with customers like Google, Yahoo, LiveJournal,Technocarati MySQL has installations with many billions of rows and delivers great performance. What could be the reason ?

The reason is normally table design and understanding inner works of MySQL. If you design your data wisely considering what MySQL can do and what it can't you will get great perfomance if not, you might become upset and become one of thouse bloggers. Note - any database management system is different in some respect and what works well for Oracle,MS SQL, PostgreSQL may not work well for MySQL and other way around. Even storage engines have very important differences which can affect performance dramatically.

The three main issues you should be concerned if you're dealing with very large data sets are Buffers, Indexes and Joins.

[read more...]


This page was found by: estimate ratio concu...