About every second application I look at has some tables which have redundant or duplicate indexes so its the time to speak about these a bit.

So what is duplicate index ? This is when table has multiple indexes defined on the same columns. Sometimes it is indexes with different names, sometimes it is different keywords used to define the index. For example it is quite frequite to see something like
PRIMARY KEY(id), UNIQUE KEY id(id), KEY id2(id)

The logic I heard behind this often – create primary key as object identifier, now we create UNIQUE because we want it to be UNIQUE and we create KEY so it it can be used in the queries. This is wrong and hurts MySQL Performance. It is enough to create PRIMARY KEY and it will enforce unique values and will be used in the queries.

The other case is simply having multiple keys on same column(s) – I guess someone thought key would make sense while did not notice it was already created. MySQL is very permissive and allows you to create many keys on the same column… furthermore these would be real separate keys inside of storage engine which take space on the disk and in memory and which need to be updated on update/insert delete. Duplicate keys are bad so once you find them get rid of them.

Note: Duplicate indexes apply to indexes of the same time. It may make sense to have indexes of different types to created on the same column(s) – perfect example is BTREE index and FULLTEXT index, while other combinations may also make sense.

Note: Order of columns in index is significant, index (A,B) is not duplicate to index (B,A)

So now what are Redundant indexes when ?

I call redundant indexes BTREE indexes which are prefix of other index, for example KEY(A), KEY (A,B), KEY(A(10)). – First and last are redundant indexes because they are prefix of KEY(A,B)

Do redundant indexes have right to exist ? In most cases it is good to get rid of them as well. Queries which take advantage of
redundant index will also be able to use longer index.

Unlike with duplicate indexes, there are however cases when redundant indexes are helpful – typically if longer index is just too long, for example if A is int and B is varchar(255) which holds a lot of long values using KEY(A) might be much faster than using KEY(A,B). So unlike in case of duplicate indexes it is good to give a good thought before removing them.

Typical case when leaving short index AND adding longer one could be when you want certain query to run as index covered query (retrieve all columns from the index) – such indexes may become way too long to be efficiently used by other queries.

There is currently no tool in MySQL distribution which will help you to check your schema for redundant and duplicate indexes. If you know one let me know. It should be also pretty simple to write.

32 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
pabloj

I think that a query on the information schema views should be enough.

Lukas

It should be fairly easy to write up something using the PEAR::MDB2 manager module which supports pre-information-schema mysql versions and supports reverse engineering indexes and constraints. So by simply calling a view array functions one should be able to find out any unncessary overlap.

http://pear.php.net/package/MDB2/docs/latest/MDB2/MDB2_Driver_Manager_Common.html

Todd

This is all good information – thanks for publishing it. The only caveat I would throw in here would be that FULLTEXT indexes should be considered separately from non-FULLTEXT indexes. It probably wouldn’t be good for somebody to discard an index on (title) because there is a FULLTEXT index on (title, message).

Todd

Peter – sorry I missed that!

Dathan Pattishall

There are some cases when redundent indexes cannot be avoided for instance

A
B
C
D

INDEX (A,B),
INDEX (A,C),
INDEX (A,D)

In the case where you need to do Constant lookups on

A and ORDER by (B,C, or D);

For instance

A=1 ORDER BY B
A=2 ORDER BY C
A=3 ORDER BY D

This is needed to remove filesorts which can take up to 50% or more of the execution times in large result sets.

Daniel Schneller

I have in fact written such a tool already some time ago. See my older blog post here: http://jroller.com/page/dschneller?entry=mysql_indices

I was however in the middle of tuning it a little when I saw this here. So stay tuned for an update in the next few days that will allow you to automatically create ALTER TABLES than can be used to get rid of superflous indices. It will also consider the index type. Oh, and it works with MySQL 4, because I do not rely on the presence of information_schema.

Daniel

Xaprb

It’s worth knowing that many third-party tools have duplicate indexes too. Here is one table from Mambo with a duplicate index on a long column:

CREATE TABLE #__core_acl_aro (
aro_id int(11) NOT NULL auto_increment,
section_value varchar(240) NOT NULL default ‘0’,
value varchar(240) NOT NULL default ”,
order_value int(11) NOT NULL default ‘0’,
name varchar(255) NOT NULL default ”,
hidden int(11) NOT NULL default ‘0’,
PRIMARY KEY (aro_id),
UNIQUE KEY section_value_value_aro (section_value,value),
UNIQUE KEY #__gacl_section_value_value_aro (section_value,value),
KEY hidden_aro (hidden),
KEY #__gacl_hidden_aro (hidden)
) TYPE=MyISAM;

You can’t just install third-party things without checking them for quality.

Xaprb

Sure. I have a web application that I built years ago and still maintain. It has light load and would never stand up to heavy load (for example the templating system uses a bunch of str_replace to insert variable content!). So I can’t complain too badly 🙂

Those are not temporary indexes though. I know about this because someone was trying to install Mambo and needed utf8 columns there. I think someone on the Mambo team created those indexes with a tool or something. Then the schema-create SQL script may have been created by a tool dumping the badly made tables, rather than by someone hand-writing exactly what the app really needs. I have seen many such errors, caused by copy-and-paste or export-from-tool.

One comment — in a number of programs I’ve written I prefer parsing SHOW CREATE TABLE to using DESCRIBE TABLE or SHOW INDEX. SHOW CREATE TABLE is much faster. SHOW INDEX in particular can be quite slow. One such program I rewrote to parse SHOW CREATE TABLE and it went from a solid 5 minutes just inspecting table structures to maybe a second or so. These are scripts that check for certain things about indexes or foreign keys, or generate our ORM classes.

I did write an index checker. It takes about 30 lines of Perl, though it doesn’t check for FULLTEXT etc. Any competent Perl programmer ought to be able to whip one out easily. To those trying to write such a tool: it is probably not necessary for it to do the job 100% perfectly or output statements you can use to fix the indexes. No sane DBA would ever just take those statements and run them. You just need something that can tell you where you might need to hand-inspect. For those reasons it can be OK to output false positives and leave something to be done by hand. And if you make it output reasonably verbose information when it finds a positive, such as just dumping the entire CREATE TABLE statement, the DBA can examine that output and know whether he or she needs to do anything about it.

Xaprb

I meant to say — I know about those indexes on Mambo because someone was asking my help with installing it, and because they needed to have utf8 columns the indexes were longer than the allowed length!

Xaprb

I’ll re-write it (I wrote it for work, so can’t just publish it here…) and put it on my blog.

Lukas

Peter the advantage of using MDB2 would be that this tool also works with other RDBMS 🙂
However if its only for MySQL it would of course be an unwanted dependency.

Xaprb

The tool to find duplicate/redundant indexes and foreign keys is on MySQL Forge (http://forge.mysql.com/) and at http://www.xaprb.com/blog/2006/08/28/how-to-find-duplicate-and-redundant-indexes-in-mysql/.

Roland Bouman

Hi Peter,

I was wondering whether the leftmost prefix criterion would be true for all ordered indexes.
In other words: if you have two ordered (non-hash) indexes I1, I2 each of the same type T with T in (BTREE,FULLTEXT,RTREE,TTREE) then I1 is duplicate of I2 of the entire columnlist of I1 is the leftmost prefix of the entire columnlist of I2.

I was also wondering about uniqueness.

Suppose we have a unique index UI and a nordinay index I, again of the same type. Then, clearly, UI would not be redundant even if its columnlist is a prefix of the columnlist of I. UI might be redundant in the sense that it defines the same access path, but it also maintains unicity so we cannot just get rid of it. So far, so good. I have some questions regarding this:

Since the prefix is unique, could it help the optimizer to make the longer index a UNIQUE one too? By definition, the values in the longer index are unique too, and telling the database that the longer index is unique might help the optimizer in making decisions. Or Maybe the optimizer is smart enough to infer uniqueness for the longer index? Or, Maybe the optimizer does not use uniqueness information at all?

kind regards,

Roland Bouman

Roland Bouman

Hi Peter,

thanks for your reply.

In the mean while, I found out that at least in MySQL, the prefix redundancy problem just cannot occur for RTREE Spatial index, because they cannot be composite:

mysql> create table geom (g1 geometry not null, g2 geometry not null, spatial index(g1,g2)) ENGINE=MyISAM;
ERROR 1210 (HY000): Incorrect arguments to SPATIAL INDEX

So that clears it up. I should’ve tried first of course – sorry to bother you with that.

As for the UNIQUE indexes: If I understand you correctly, you describe a case where using a longer non-unique and non-covering index could give worse performance over a shorter unique index that is the prefix of the longer one – would that be because time would be wasted reading the index, because the query still needs an extra lookup to the data? (time that would not have been wasted if the longer index would’ve been a covering index?)

I think that is an intersting case too, but I think it does not answer my question. My question was – would the optimizer somehow benefit if we would mark the longer index also as a UNIQUE one. I mean, potentially, it gives the optimizer more information, possible enabling it to create better execution plans.

BTW: I really think your idea of an index with a unique prefix is interesting. I mean, it looks simple, elegant, and at the same time I can imagine how it would help in a lot of cases. I think it kind of approaches Oracle’s index organized tables (IOT’s), which are essentially a big covering index instead of a proper table.

It would be great if you could think a bit about my original question regarding UNIQUE indexes. But either way: thanks!

Roland Bouman

Peter,

thanks for your clear and concise answer!
I appreciate it a lot you took the time. 🙂

Roland Bouman

Olivier

Hi,

And what about Innodb PK ? I heard that to speed UP queries, I should declare an additionnal key on the PK, and force the use of this key instead of the primary (because innodb PK are stored in fragments). So, true or false ?

Vaibogam S

A simple way to find out duplicate indexes in 5.0.x

SELECT * FROM information_schema.statistics WHERE table_schema = ‘?’ GROUP BY table_schema, table_name, column_name HAVING count(column_name) > 1;

It would be better to include ‘index_type’ in GROUP BY clause to more accurate answer.

In the above query bind the ‘database name’ parameter.

-Bog

Aaron Schmidt

“Order of columns in index is significant, index (A,B) is not duplicate to index (B,A)”

Let’s say my Primary Key is: PRIMARY KEY(A,B)

I also need a standard BTREE index on B: KEY(B)

Now, would it be in my interest to change PRIMARY KEY(A,B) to PRIMARY KEY(B,A) and thus get the covering index on B and have no need for the additional KEY(B) ?

Thanks for your help and all the great information on this site.

Will

Will this setup allow the query below to properly use both indexes? Does the order of where clauses matter in such a scenario?

index alpha (B,C)
index beta (A)

select A, B from x where A = x and B = x

Jai kamalindu

hai peter,
i have 2 doubts..
1:
in the following queries:
1) select * from customer where name=’jai’ or salary>1000;
2) select * from customer where name=’jai’ and salary>1000;

if there are individual indexes on name,salary and an index on (name,salary) which index will be used?
on what basis is it decided on?

2:
i have 3 column in my customer table (Id, name,salary). i have an index indNameSal on (name,salary)

in the following queries..
1) explain select * from customer where name=’jai’ and salary>1000;
2) explain select * from customer where salary>1000 and name=’jai’;
in both these cases i got the result that index indNameSal is used.
y is that so?? i am using mysql 5.5.6, is it something to do with optimization in the version.

amit shah

Hi,

Please help me to know how to drop duplicate index with same name.
=== Error ===
alter table emp drop index fk_deptid;ERROR 1553 (HY000): Cannot drop index ‘fk_deptid’: needed in a foreign key constraint
=============

Below is scenario.

drop table if exists emp;
drop table if exists dept;
create table dept
(
deptid integer auto_increment primary key,
deptname varchar(20)
);

create table emp
(
empid integer auto_increment primary key,
empname varchar(20),
deptid integer,
key fk_deptid (deptid),
constraint fk_deptid foreign key(deptid) references dept(deptid)
);

* Find below output which shows indexes in the database. (Query for below output is at the end)

mysql> source list.sql
+————–+————+—————–+—————–+————-+———————–+————————+
| table_schema | table_name | constraint_name | constraint_type | column_list | referenced_table_name | referenced_column_name |
+————–+————+—————–+—————–+————-+———————–+————————+
| test | dept | PRIMARY | PRIMARY KEY | deptid | NULL | NULL |
| test | emp | fk_deptid | NON UNIQUE | deptid | NULL | NULL |
| test | emp | fk_deptid | FOREIGN KEY | deptid | dept | deptid |
| test | emp | PRIMARY | PRIMARY KEY | empid | NULL | NULL |
+————–+————+—————–+—————–+————-+———————–+————————+

alter table emp drop index fk_deptid;ERROR 1553 (HY000): Cannot drop index ‘fk_deptid’: needed in a foreign key constraint

+++Query++++
SELECT a.table_schema, a.table_name, a.constraint_name,
a.constraint_type,
convert(group_concat(DISTINCT b.column_name
ORDER BY b.ordinal_position SEPARATOR ‘, ‘), char)
as column_list,
b.referenced_table_name, b.referenced_column_name
FROM information_schema.table_constraints a
INNER JOIN information_schema.key_column_usage b
ON a.constraint_name = b.constraint_name AND
a.table_schema = b.table_schema AND
a.table_name = b.table_name
WHERE a.table_schema=’test’
GROUP BY a.table_schema, a.table_name, a.constraint_name,
a.constraint_type, b.referenced_table_name,
b.referenced_column_name
UNION
SELECT table_schema, table_name, index_name as constraint_name,
if(index_type=’FULLTEXT’, ‘FULLTEXT’, ‘NON UNIQUE’)
as constraint_type,
convert(group_concat(column_name
ORDER BY seq_in_index separator ‘, ‘), char) as column_list,
null as referenced_table_name, null as referenced_column_name
FROM information_schema.statistics
WHERE non_unique = 1
AND table_schema=’test’
GROUP BY table_schema, table_name, constraint_name, constraint_type,
referenced_table_name, referenced_column_name
ORDER BY table_schema, table_name, column_list

Thanks in advance.

Regards,
Amit.

CSev

I just wanted to let you know that there is a tool which checks for redundant indexes, SQLyog Ultimate. It’s probably a bit pricy for just that functionality though.