MySQL does not always make the right decision about index usage.
Consider a simple table:
CREATE TABLE `t2` (
  `ID` int(11) default NULL,
  `ID1` int(11) default NULL,
  `SUBNAME` varchar(32) default NULL,
  KEY `ID1` (`ID1`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SELECT COUNT(*) FROM t2;
250001 (V1)
SELECT COUNT(*) FROM t2 WHERE ID1=1;
83036 (V2)
(execution time = 110 ms)
That is, the index selectivity for the condition (ID1=1) is V2/V1 = 0.3321, or 33.21%.
It is said (e.g. in the book "SQL Tuning") that if selectivity is over 20%, a full table scan is preferable to index access.
As far as I know, Oracle always chooses a full table scan if selectivity is over 25%.
What about MySQL?
mysql> EXPLAIN SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+
|  1 | SIMPLE      | t2    | ref  | ID1           | ID1  | 5       | const | 81371 | Using where |
+----+-------------+-------+------+---------------+------+---------+-------+-------+-------------+
That is, MySQL will use the index for this query.
Let's compare the execution time with index access and with a table scan:
SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
-- 410 ms
SELECT COUNT(SUBNAME) FROM t2 IGNORE INDEX (ID1) WHERE ID1=1;
-- 200 ms
As you can see, the table scan is two times faster.
Consider a more extreme case, with selectivity ~95%:
SELECT cnt2 / cnt1
FROM (SELECT COUNT(*) cnt1 FROM t2) d1,
     (SELECT COUNT(*) cnt2 FROM t2 WHERE ID1=1) d2;
0.9492 = 94.92%
EXPLAIN still shows that MySQL will use the index.
Execution time:
SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
-- 1200 ms
SELECT COUNT(SUBNAME) FROM t2 IGNORE INDEX (ID1) WHERE ID1=1;
-- 260 ms
That is, the table scan is 4.6 times faster.
Why does MySQL choose index access?
MySQL does not calculate index selectivity; it just estimates the number of logical input/output operations, and in our case the estimated count of logical I/O for index access is less than for a table scan.
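One way to see this logical I/O difference play out is to compare MySQL's handler counters for the two plans (a sketch; the exact counter values depend on your data and server version):

```sql
-- Index access: row reads show up as Handler_read_key / Handler_read_next
FLUSH STATUS;
SELECT COUNT(SUBNAME) FROM t2 WHERE ID1=1;
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Full scan: row reads show up as Handler_read_rnd_next
FLUSH STATUS;
SELECT COUNT(SUBNAME) FROM t2 IGNORE INDEX (ID1) WHERE ID1=1;
SHOW SESSION STATUS LIKE 'Handler_read%';
```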
So be careful with indexes: they do not help in all cases.
Actually, the problem is much more complicated than it looks. A while back I did benchmarks, and depending on the situation I could get an index being more optimal than a full table scan even if 70% of rows would be accessed, or a full table scan being faster than retrieving 1% of rows by index, if those rows all end up in different locations on the disk. So MySQL is not optimal, but a hard 20% value would not be better either.
For a wise decision MySQL would need to consider a lot of things, including the type of I/O (sequential vs. random), cache efficiency, table size relative to memory size, etc.
In general a much more complex cost model is required, which means a serious optimizer overhaul. Such a change is a serious step, as a different optimizer will change a lot of execution plans, and some will surely be changed for the worse, as no optimizer is perfect in all cases. This makes it a scary step, besides the optimizer being a very complex piece of software.
According to the O'Reilly "Oracle SQL Tuning" pocket guide, Oracle moves to a table scan if it expects to read more than 12% of the rows. Supposedly MySQL does so at 30%.
Well, maybe for Oracle it is 12%, I don't know for sure, but as Peter said, it is not always good to have hard values; there can be a lot of cases.
Regarding MySQL: MySQL does not use a selectivity calculation, so it is impossible to say at what point MySQL prefers a table scan.
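What the optimizer does have is per-index cardinality statistics, which you can inspect yourself (the Cardinality column is an estimate maintained by the storage engine, not an exact count):

```sql
-- ANALYZE TABLE refreshes the key distribution statistics;
-- SHOW INDEX then displays the estimated number of distinct values per index.
ANALYZE TABLE t2;
SHOW INDEX FROM t2;
```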
One more thing to add: MySQL has to deal with multiple storage engines, which complicates things a lot. For example, MEMORY tables have a very small penalty for "random I/O", and for InnoDB tables a full table scan is a scan by the primary index.
Folks, I am new to SQL and was looking for some guidance on the following SELECT statement:
SELECT AttributeValue
FROM DT_Attribute a
WHERE AttributeName = 'cn'
  AND ObjectID IN (SELECT ObjectID
                   FROM DT_Object o
                   WHERE a.VersionID = o.VersionID
                     AND ObjectType = 'ncpServer')
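One thing worth trying (a sketch, not a guaranteed fix): older MySQL versions often execute an IN subquery as a dependent subquery, re-evaluating it for every row of DT_Attribute; rewriting it as a join can avoid that. The join assumes (ObjectID, VersionID) pairs are unique in DT_Object; otherwise a DISTINCT would be needed:

```sql
SELECT a.AttributeValue
FROM DT_Attribute a
JOIN DT_Object o
  ON  o.ObjectID  = a.ObjectID
  AND o.VersionID = a.VersionID
WHERE a.AttributeName = 'cn'
  AND o.ObjectType = 'ncpServer';
```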
             Rows       Data Length  Index Length
DT_Attribute 3,243,993  280.4 MB     157.8 MB
DT_Object    79,828     291.2 MB     5.9 MB
Running this resulted in 500,000,000 I/O reads per hour on a 4-way 3.06 GHz machine with 3 GB of RAM.
Is there a way to analyse such a SELECT statement?
Is there an explanation why MySQLD-NT was solidly consuming 25% of the CPU rather than asking for more?
Thanks in advance,
Roy
>> Is there an explanation why MySQLD-NT was solidly consuming 25% of the CPU rather than asking for more?
Sure, it's because the database is disk bound. The CPU has to wait for the disk I/O to complete. It could mean your disks are slower than they should be. Switching to SCSI drives (if you're not using them already) may help. Of course, if the table were small enough to fit in memory, the disk would matter much less.
Mike,
Disk is one possible problem; the other reason (for 25% in particular) would be a CPU-bound workload fully using one CPU out of 4. Depending on how you set up graphing, you may see combined CPU usage or per-CPU usage.
Could the strategy decision also be wrong if the index strategy is only a partial subset of the key strategy!?
Labus,
I do not understand, what do you mean?
What is an index? Please reply, urgent.
What Is Index ?
You know! The finger just before the middle one…
Hello,
Can you please help me to improve MySQL query performance? We have a table with about 2,00,000 (200,000) rows, so please give me tips to improve this SQL's performance.
Thanks in Advance
Vijay
First, I have to say: great post!
I have a real estate web app where some houses can be marked as "Distinguished", something like "Featured", so those properties are shown in a special way.
I'd like to look up all the properties with a given feature in the database. Suppose there are 3 kinds of feature (e.g. Special, Great, Good), and every property has one. If I have 500 properties, my selectivity is about 0.006, so an index would not be a good choice. But I still want to speed up my search; what can I do? I have been thinking of keeping 3 in-memory arrays containing the IDs of the properties, one array for each feature. So, for example, I would have the array of Special props, which would look like this:
SpecialsProps = [1,15,52,355,61,123,561].
Then if I need to search for all the Special props, I would perform a "SELECT … WHERE id IN SpecialProps", and the ID primary key (a unique index) would be used. But in this case I would force several lookups through the index for those IDs, which might not be faster than a full scan (at least, that's what I think). Another strategy is, having all the properties cached, to reference them directly.
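For illustration, the cached-IDs approach would produce a query like this (table and column names are made up); with only ~500 rows, though, a full scan is likely to be just as fast as these primary-key lookups:

```sql
-- IDs taken from the in-memory SpecialsProps array
SELECT id, title
FROM properties
WHERE id IN (1, 15, 52, 61, 123, 355, 561);
```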
So, to finish this comment, a simple question: does MySQL have any index like Oracle's bitmap index?
Thank you very much!
Santiago,
The simple answer is no: MySQL does not have bitmap indexes.
Ok, thanks for your quick answer, Vadim. I'm taking a look at your book; it seems awesome. I'll try to get it from Amazon; it's a little difficult because I'm in Argentina, but I'll do my best.
So, do you think I could use one of the strategies I mentioned above?
Santiago,
To decide on a strategy, just benchmark each of them and see what works better for you.
Hi all,
I select data from a table and show it on one page, but the page load takes a long time because the table has 4 lakh (400,000) records.
What can I do to fetch the data from the table quickly? Please suggest a solution.
regards,
pradeep kumar jangir
I just have the numbers and marks of 50,000 students, but the SQL query took long to respond. Why?
Dear Pradeep,
Try to select restricted data from the table.
Since you cannot show all 4 lakh records at a time on the page,
use LIMIT or add pagination to your page.
Also, while selecting the data, avoid using the LIKE operator; search with the = operator instead.
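For example, a paginated query might look like this (a sketch; the table and column names are made up):

```sql
-- Page 3 with 50 rows per page: skip the first 100 rows, return the next 50
SELECT id, name
FROM records
ORDER BY id
LIMIT 100, 50;
```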
Thanks,
Balaji
Thanks for sharing this information, Keep up your good work.
Hello,
sorry in advance if I am wrong, but I was under the impression that SELECT COUNT(*) FROM mytable; is a very specific case when the table is MyISAM.
MySQL will ALWAYS get the result straight from the table statistics instead of counting; indeed, it will always be VERY fast (MySQL docs, cited from memory).
So while your presentation may be acceptable in some cases, I would tend to think you missed the very point of COUNT(*) for MyISAM.
Best regards
Eric
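For completeness: the MyISAM stored-row-count fast path applies only to COUNT(*) without a WHERE clause; the counts in the post filter on ID1, so MyISAM still has to read the index or the table:

```sql
SELECT COUNT(*) FROM t2;              -- MyISAM: answered from the stored row count
SELECT COUNT(*) FROM t2 WHERE ID1=1;  -- must read the index or scan the table
```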
You can use a FastBit bitmap index from MySQL via the FastBit UDFs. Check this link: https://github.com/greenlion/FastBit_UDF
Some excerpt:
About these UDF Functions and FastBit
FastBit is a data store which implements WAH (word-aligned hybrid) bitmap indexes. These UDFs create, modify and query FastBit tables. The UDFs treat a single directory on the filesystem as one FastBit table. Inside of the FastBit table/directory are directories representing partitions. The partitions are created automatically when data is loaded.
All functions take as the first argument the table path/directory
FastBit WAH bitmap indexes are optimal for multi-dimensional range scans, unlike b-tree indexes which are optimal only for one-dimensional queries. This means that FastBit can very efficiently handle queries that MySQL can not, like select c1 from table where c2 between 1 and 20 or c3 between 1 and 90 or c4 in (1,2,3). MySQL can not answer that query using a b-tree index and will resort to a full table scan.
All columns of a FastBit table are automatically bitmap indexed.
The UDF functions provided are: fb_helper, fb_inlist, fb_create, fb_load, fb_query, fb_debug, fb_unlink, fb_delete, fb_insert, fb_insert2, fb_resort.