One schema optimization we often do is extending index when there are queries which can use more key part. Typically this is safe operation, unless index length increases dramatically queries which can use index can also use prefix of the new index are they ? It turns there are special cases when this is not the case.

The obvious optimization is to extend index from column (a) to column (a,b) right which will make it faster and should not hurt any other queries a lot, right ?

Wow. The query runs 30 times faster – not only because it has to scan less rows but also because it is index covering query now – it does not need to access the data.

It turns out it is too early to celebrate. Application also had another query which previously ran so fast it hardly could be noticed. It however became a lot slower:

So why did this query become so much slower ? The reason is its plan benefits from Innodb specific feature – index entries being sorted by primary key for each complete key value. So when you have index (a) and id is a primary key the real index is (a,id) when we extend index to (a,b) it really becomes (a,b,id). So if there is a query which used both a and id key part from original index it will quite likely be unable to use new index to full extent.

What is solution ? You can have “redundant” indexes on (a) and (a,b) at the same time. This is something what suppose to work but it well often does not:

MySQL Optimizer considers using both (a) and (a,b) indexes and in the end decides to use neither rather doing full index scan until it finds a=100. This looks like an optimizer glitch in this case because it estimates it will scan 2247 rows in the selected plan, while using (a) index you can get result scanning only 1 row guaranteed.

To really get things going you will need to use FORCE INDEX(a) to force MySQL optimizer using right plan.

These results mean you should be very careful applying index changes from mk-duplicate-key-checker key checker when it comes to redundant indexes. If you have query plans depending on Innodb ordering of data by primary key inside indexes they can become significantly affected.

Optimizer behavior may be different in different MySQL versions. These tests were done with 5.1.45 though I’ve seen same behavior with MySQL 5.0 too.

10 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Shlomi Noach

Hi Peter,
I’ve seen inconsistent behavior in MySQL/InnoDB with regard to PK being suffix of all indexes. That is, many times execution plans did not go as plan, just as you presented.
“Index entries being sorted by primary key for each complete key value.” — although tests have (almost) always supported this, and despite the fact I sometimes rely on this for query optimizations, I was unsuccessful in finding documentation for that. Is this feature documented? Is it declared to be so? Or is it just current implementation? I’ll be grateful if you could provide with the docs.

I may write about this in the future: I found temporary tables to behave differently with InnoDB plugin compressed format — but, again, inconsistently.

Regards

Shlomi Noach

Peter, thanks.

PS, instead of using FORCE INDEX, what I do is actually drop the “ORDER BY id”, when the desired order is ASC, and just rely on the fact that rows are retrieved by that order.

Shlomi Noach

No offence 😀

Rob Wultsch

It seems like a more ideal way for Innodb to deal with secondary indexes is to explicitly require that the PK be set. I guess I am becoming an enemy of all things implicit.

simon

Not quite understand one thing here:

SELECT * FROM idxitest WHERE a=100 ORDER BY id DESC LIMIT 1;

As we know, the primary key is also the clustered index in InnoDB. If MySQL want to use the primary key to query the above statement, it can’t use the id column in the clustered index, is this right? IMO, Mysql has to scan all the index leaves to find all records with a = 100. At the same time, all the leaves are just the data pages. So in this situation, is there any difference between the full index scan and full table scan?

Thanks.

simon

Thanks peter, that’s more clear.

Raghu Sastry

To support what has been said above, i found this behavior and strengthened my understanding of the index usage. This was tested on 5.5/windows.

CREATE TABLE cust2(
loginname VARCHAR(14) NOT NULL,
custname VARCHAR(255) NOT NULL,
salary FLOAT NOT NULL,
PRIMARY KEY uidx_name (custname,loginname),
KEY idx_sal(salary),
KEY idx_namesal (custname,salary)
) ENGINE=INNODB;

INSERT INTO cust2 (custname,salary,loginname) VALUES (‘raghu’,1000,’raghu’),(‘imran’,2000,’imran’), (‘doofus’,500,’doofus’),(‘joey’,400,’joey’),(‘phoebe’,200,’phoebe’),(‘monica’,200,’monica’)

EXPLAIN SELECT * FROM cust2 WHERE loginname=’raghu’ OR custname=’joey’

initially i thought the primary key would be used (deliberately reversed the order in the where clause), but the index that was used was idx_sal. Needless to say, this index actually expanded to (sal,pk ) in this case sal,custname,loginname.

“id”,”select_type”,”table”,”type”,”possible_keys”,”key”,”key_len”,”ref”,”rows”,”Extra”
“1”,”SIMPLE”,”cust2″,”index”,”PRIMARY,idx_namesal”,”idx_sal”,”4″,\N,”6″,”Using where; Using index”

reversing the order still used the same index
EXPLAIN SELECT * FROM cust2 WHERE custname=’raghu’ OR loginname =’raghu’
“id”,”select_type”,”table”,”type”,”possible_keys”,”key”,”key_len”,”ref”,”rows”,”Extra”
“1”,”SIMPLE”,”cust2″,”index”,”PRIMARY,idx_namesal”,”idx_sal”,”4″,\N,”6″,”Using where; Using index”

but changing the operator to “AND” made it use the primary key.
EXPLAIN SELECT * FROM cust2 WHERE custname=’raghu’ AND loginname =’raghu’

“id”,”select_type”,”table”,”type”,”possible_keys”,”key”,”key_len”,”ref”,”rows”,”Extra”
“1”,”SIMPLE”,”cust2″,”const”,”PRIMARY,idx_namesal”,”PRIMARY”,”811″,”const,const”,”1″,””

This indeed proves that we not only need to verify your queries but index behavior as well.

Thanks!