Indexes in MySQL
MySQL does not always make a right decision about indexes usage.
Condsider a simple table:
-
CREATE TABLE `t2` (
-
`ID` int(11) DEFAULT NULL,
-
`ID1` int(11) DEFAULT NULL,
-
`SUBNAME` varchar(32) DEFAULT NULL,
-
KEY `ID1` (`ID1`)
-
) ENGINE=MyISAM DEFAULT CHARSET=latin1
SELECT COUNT(*) FROM t2;
250001 (V1)
SELECT COUNT(*) FROM t2 WHERE ID1=1;
83036 (V2)
(execution time = 110 ms)
That is index selectivity by condition (ID1=1) is V2/V1 = 0.3321 or 33.21%
It is said (e.g. book "SQL Tuning") if selectivity over 20% then a full table scan is preferable than an index access.
As far as I know Oracle alway chooses a full table scan if selectivity over 25%.
What with MySQL:
-
mysql> EXPLAIN SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
-
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+
-
| id | select_type | TABLE | type | possible_keys | KEY | key_len | ref | rows | Extra |
-
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+
-
| 1 | SIMPLE | t2 | ref | ID1 | ID1 | 5 | const | 81371 | USING WHERE |
-
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+
That is MySQL will use index for this query.
Let's compare the execution time with index access and with table scan:
SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1 - 410 ms
SELECT COUNT(SUBNAME) FROM t2 IGNORE INDEX (ID1) WHERE ID1=1 - 200 ms
As you see the table scan is faster by 2 times.
Consider more extremal case: selectivity ~95%:
SELECT cnt2 / cnt1 FROM (SELECT count(*) cnt1 FROM t2) d1, (SELECT count(*) cnt2 FROM t2 WHERE ID1=1) d2;
0.9492 = 94.92%
Explain still claims MySQL will use index.
Execution time:
SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1 - 1200 ms
SELECT COUNT(SUBNAME) FROM t2 IGNORE INDEX (ID1) WHERE ID1=1 - 260 ms
That is table scan is faster by 4.6 times.
Why does MySQL choose index access?
MySQL doesn't calculate index selectivity, just estimates count of logical input/output operations, and for
our case count of Logical I/O for index access is less than for table scan.
So be careful with indexes, they help in not all cases.
20 Comments











del.icio.us
digg
Actually, The problem is much more complicated than it looks. A while back I did benchmarks and depending on the situation I could get index being more optimal than full table scan even if 70% of rows would be accessed or Full tables can could be faster than retrieving 1% of rows by index - if they all end up in different locations on the disk. So MySQL is not optimal but 20% hard value would not be better ether.
For wise decision MySQL would need to consider a lot of things including types of IO (seq vs random) cache efficiency, table size relative to memory size etc.
In general much more complex cost model is required which means serious optimizer overhaul. Such changes are serious step as different optimizer will change a lot of execution plans, and some will surely be changed to worse as no optimizer is perfect in all cases. This makes it scary step besides optimizer being very complex peice of sofware.
Comment :: June 2, 2006 @ 3:45 pm
According to the oreilly “oracle sql tuning” pocket guide oracle moves to a table scan if it expects to read more than 12% of the rows. supposedly mysql does so at 30%.
Comment :: June 3, 2006 @ 12:11 am
Well, maybe for Oracle it is 12%, I don’t know for sure, but as Peter said it is not always good to have the hard values, there can be a lot of cases.
Regarding MySQL: MySQL does not use the selectivity calculation, so it’s impossible to say when MySQL prefers table scan.
Comment :: June 4, 2006 @ 3:19 am
One more thing to add - MySQL has to deal with multiple storage engines which complicates things a lot. For example for MEMORY tables there is very small penalty for “random IO” or Innodb tables which have full table scan being scan by primary index.
Comment :: June 4, 2006 @ 11:24 am
MySQL performance debate…
I must admit after SPEC jAppServer2004™ results with MySQL , my interest rose about MySQL being a good candidate for high performance database application. If Google Adsense , Flick’r , Technorati and many others use it for years, can it be…
Trackback :: June 18, 2006 @ 8:16 am
Folks, I am new to sql as was looking for some guidance on the following sql select statement
Select AttributeValue FROM DT_Attribute a WHERE AttributeName = ‘cn’ AND
ObjectID IN SELECT ObjectID FROM DT_Object o WHERE a.VersionID = o.VersionID
AND ObjectType = ‘ncpServer’)
Rows Data Length Index Length
DT_Attribute 3,243,993 280.4 MB 157.8 MB
DT_Object 79,828 291.2 MB 5.9 MB
Running this resulted in 500,000,000 IO reads per hour on a 4-way 3.06 Ghz and 3GB computer.
Is there a way to analyse such a SELECT statement?
Is there an explanation why MySQLD-NT was solidly consuming 25% of the CPU rather than asking for more?
Thanks in advance,
Roy
Comment :: September 28, 2006 @ 4:09 am
>>Is there an explanation why MySQLD-NT was solidly consuming 25% of the CPU rather than asking for more?
Comment :: October 1, 2006 @ 8:27 pm
::Is there an explanation why MySQLD-NT was solidly consuming 25% of the CPU rather than asking for more?::
Sure, it’s because the database is disk bound. The CPU has to wait for the disk I/O to complete. It could mean your disks are slower than they should be. Switching to SCSI drives (if you’re not using them already) may help. Of course if the table was small enough (
Comment :: October 1, 2006 @ 8:34 pm
Mike,
Disk is one possible problem, the other reason (for 25% in particular) would be CPU bound workload using totally one CPU out of 4. Depending on how you set up graphing you may see combined CPU usage or per CPU.
Comment :: October 3, 2006 @ 3:16 am
Possible wrong strategy decision if index strategy is a partial quantity of key strategy, too!?
Comment :: November 29, 2006 @ 1:20 pm
Labus,
I do not understand what do you mean ?
Comment :: November 29, 2006 @ 3:59 pm
dfgd
Comment :: May 23, 2007 @ 2:03 am
why dont understand
Comment :: May 23, 2007 @ 2:04 am
b g g
Comment :: May 23, 2007 @ 2:04 am
What is index….pls reply urgent
Comment :: July 16, 2007 @ 6:21 am
The Need for Rackmount Computer Cases
When businesses need to group one type of server in with similar kinds, rack mount computer cases are brought in. Holding as many as 40 single servers, they feature common temperature-monitoring systems, linked drive bays and up to a maximum of 10 air-…
Trackback :: February 18, 2008 @ 10:18 pm
What Is Index ?
Comment :: June 26, 2008 @ 2:21 am
You know ! the finger just before the middle one…
Comment :: July 4, 2008 @ 12:54 am
Hello,
Can you please help me to improve the MYSql Query Performance. Actually we have the data which is using near about 2,00,000 data . so please give me tips to improve this sql performance.
Thanks in Advance
Vijay
Comment :: August 15, 2008 @ 3:00 am
[...] L’article qui m’a incité à effectuer mes propres tests. [...]
Pingback :: September 4, 2008 @ 3:37 pm