July 28, 2014

Duplicate indexes and redundant indexes

About every second application I look at has some tables which have redundant or duplicate indexes so its the time to speak about these a bit.

So what is duplicate index ? This is when table has multiple indexes defined on the same columns. Sometimes it is indexes with different names, sometimes it is different keywords used to define the index. For example it is quite frequite to see something like
PRIMARY KEY(id), UNIQUE KEY id(id), KEY id2(id)

The logic I heard behind this often – create primary key as object identifier, now we create UNIQUE because we want it to be UNIQUE and we create KEY so it it can be used in the queries. This is wrong and hurts MySQL Performance. It is enough to create PRIMARY KEY and it will enforce unique values and will be used in the queries.

The other case is simply having multiple keys on same column(s) – I guess someone thought key would make sense while did not notice it was already created. MySQL is very permissive and allows you to create many keys on the same column… furthermore these would be real separate keys inside of storage engine which take space on the disk and in memory and which need to be updated on update/insert delete. Duplicate keys are bad so once you find them get rid of them.

Note: Duplicate indexes apply to indexes of the same time. It may make sense to have indexes of different types to created on the same column(s) – perfect example is BTREE index and FULLTEXT index, while other combinations may also make sense.

Note: Order of columns in index is significant, index (A,B) is not duplicate to index (B,A)

So now what are Redundant indexes when ?

I call redundant indexes BTREE indexes which are prefix of other index, for example KEY(A), KEY (A,B), KEY(A(10)). – First and last are redundant indexes because they are prefix of KEY(A,B)

Do redundant indexes have right to exist ? In most cases it is good to get rid of them as well. Queries which take advantage of
redundant index will also be able to use longer index.

Unlike with duplicate indexes, there are however cases when redundant indexes are helpful – typically if longer index is just too long, for example if A is int and B is varchar(255) which holds a lot of long values using KEY(A) might be much faster than using KEY(A,B). So unlike in case of duplicate indexes it is good to give a good thought before removing them.

Typical case when leaving short index AND adding longer one could be when you want certain query to run as index covered query (retrieve all columns from the index) – such indexes may become way too long to be efficiently used by other queries.

There is currently no tool in MySQL distribution which will help you to check your schema for redundant and duplicate indexes. If you know one let me know. It should be also pretty simple to write.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Jai kamalindu says:

    hai peter,
    i have 2 doubts..
    1:
    in the following queries:
    1) select * from customer where name=’jai’ or salary>1000;
    2) select * from customer where name=’jai’ and salary>1000;

    if there are individual indexes on name,salary and an index on (name,salary) which index will be used?
    on what basis is it decided on?

    2:
    i have 3 column in my customer table (Id, name,salary). i have an index indNameSal on (name,salary)

    in the following queries..
    1) explain select * from customer where name=’jai’ and salary>1000;
    2) explain select * from customer where salary>1000 and name=’jai’;
    in both these cases i got the result that index indNameSal is used.
    y is that so?? i am using mysql 5.5.6, is it something to do with optimization in the version.

  2. pabloj says:

    I think that a query on the information schema views should be enough.

  3. peter says:

    Yeah. That is for smarties.

    I would like a script which works on 4.1 though. There is still a lot of production on 4.1 and for good reason.

  4. Lukas says:

    It should be fairly easy to write up something using the PEAR::MDB2 manager module which supports pre-information-schema mysql versions and supports reverse engineering indexes and constraints. So by simply calling a view array functions one should be able to find out any unncessary overlap.

    http://pear.php.net/package/MDB2/docs/latest/MDB2/MDB2_Driver_Manager_Common.html

  5. peter says:

    Lukas,

    Honestly speaking SHOW INDEX FROM ‘table’ has all required info so there is no need to do any parsing of SHOW CREATE TABLE. If you want to run it for all instance SHOW DATABASES and SHOW TABLES is your friend.

    I guess yout can use something like MDB2 but I would not add extra dependencies without needs.

  6. Todd says:

    This is all good information – thanks for publishing it. The only caveat I would throw in here would be that FULLTEXT indexes should be considered separately from non-FULLTEXT indexes. It probably wouldn’t be good for somebody to discard an index on (title) because there is a FULLTEXT index on (title, message).

  7. peter says:

    Todd,

    You’re completely right. The article however says about it :)
    “Note: Duplicate indexes apply to indexes of the same time. It may make sense to have indexes of different types to created on the same column(s) – perfect example is BTREE index and FULLTEXT index, while other combinations may also make sense.”

    You also might want to have BTREE and RTREE indexes on the same columns or even BTREE and HASH in certain rare cases.

  8. Todd says:

    Peter – sorry I missed that!

  9. There are some cases when redundent indexes cannot be avoided for instance

    A
    B
    C
    D

    INDEX (A,B),
    INDEX (A,C),
    INDEX (A,D)

    In the case where you need to do Constant lookups on

    A and ORDER by (B,C, or D);

    For instance

    A=1 ORDER BY B
    A=2 ORDER BY C
    A=3 ORDER BY D

    This is needed to remove filesorts which can take up to 50% or more of the execution times in large result sets.

  10. peter says:

    Dathan,

    These indexes are not redundant. Might be I did not define it completely – one index has to be prefix of another one to be redundant, if indexes just have same prefix they are not redundant.

    What you’re describing is actually very common indexing example.

  11. I have in fact written such a tool already some time ago. See my older blog post here: http://jroller.com/page/dschneller?entry=mysql_indices

    I was however in the middle of tuning it a little when I saw this here. So stay tuned for an update in the next few days that will allow you to automatically create ALTER TABLES than can be used to get rid of superflous indices. It will also consider the index type. Oh, and it works with MySQL 4, because I do not rely on the presence of information_schema.

    Daniel

  12. Xaprb says:

    It’s worth knowing that many third-party tools have duplicate indexes too. Here is one table from Mambo with a duplicate index on a long column:

    CREATE TABLE #__core_acl_aro (
    aro_id int(11) NOT NULL auto_increment,
    section_value varchar(240) NOT NULL default ’0′,
    value varchar(240) NOT NULL default ”,
    order_value int(11) NOT NULL default ’0′,
    name varchar(255) NOT NULL default ”,
    hidden int(11) NOT NULL default ’0′,
    PRIMARY KEY (aro_id),
    UNIQUE KEY section_value_value_aro (section_value,value),
    UNIQUE KEY #__gacl_section_value_value_aro (section_value,value),
    KEY hidden_aro (hidden),
    KEY #__gacl_hidden_aro (hidden)
    ) TYPE=MyISAM;

    You can’t just install third-party things without checking them for quality.

  13. peter says:

    This one looks strange – these look like temporary indexes or something like it.

    But in general do not think third party software perfect. I’ve seen design errors much worse than duplicate indexes.

    Some is better then other of course. And the interesting thing even popular software can be quite bad – if it is used under light load by most users it may stay untuned for years.

  14. Xaprb says:

    Sure. I have a web application that I built years ago and still maintain. It has light load and would never stand up to heavy load (for example the templating system uses a bunch of str_replace to insert variable content!). So I can’t complain too badly :-)

    Those are not temporary indexes though. I know about this because someone was trying to install Mambo and needed utf8 columns there. I think someone on the Mambo team created those indexes with a tool or something. Then the schema-create SQL script may have been created by a tool dumping the badly made tables, rather than by someone hand-writing exactly what the app really needs. I have seen many such errors, caused by copy-and-paste or export-from-tool.

    One comment — in a number of programs I’ve written I prefer parsing SHOW CREATE TABLE to using DESCRIBE TABLE or SHOW INDEX. SHOW CREATE TABLE is much faster. SHOW INDEX in particular can be quite slow. One such program I rewrote to parse SHOW CREATE TABLE and it went from a solid 5 minutes just inspecting table structures to maybe a second or so. These are scripts that check for certain things about indexes or foreign keys, or generate our ORM classes.

    I did write an index checker. It takes about 30 lines of Perl, though it doesn’t check for FULLTEXT etc. Any competent Perl programmer ought to be able to whip one out easily. To those trying to write such a tool: it is probably not necessary for it to do the job 100% perfectly or output statements you can use to fix the indexes. No sane DBA would ever just take those statements and run them. You just need something that can tell you where you might need to hand-inspect. For those reasons it can be OK to output false positives and leave something to be done by hand. And if you make it output reasonably verbose information when it finds a positive, such as just dumping the entire CREATE TABLE statement, the DBA can examine that output and know whether he or she needs to do anything about it.

  15. Xaprb says:

    I meant to say — I know about those indexes on Mambo because someone was asking my help with installing it, and because they needed to have utf8 columns the indexes were longer than the allowed length!

  16. peter says:

    Yeah. Anyway if you publish your index checked it would be good.
    Daniel also has one but yours might work for Javafobic people :)

  17. Xaprb says:

    I’ll re-write it (I wrote it for work, so can’t just publish it here…) and put it on my blog.

  18. Lukas says:

    Peter the advantage of using MDB2 would be that this tool also works with other RDBMS :)
    However if its only for MySQL it would of course be an unwanted dependency.

  19. Xaprb says:

    The tool to find duplicate/redundant indexes and foreign keys is on MySQL Forge (http://forge.mysql.com/) and at http://www.xaprb.com/blog/2006/08/28/how-to-find-duplicate-and-redundant-indexes-in-mysql/.

  20. Hi Peter,

    I was wondering whether the leftmost prefix criterion would be true for all ordered indexes.
    In other words: if you have two ordered (non-hash) indexes I1, I2 each of the same type T with T in (BTREE,FULLTEXT,RTREE,TTREE) then I1 is duplicate of I2 of the entire columnlist of I1 is the leftmost prefix of the entire columnlist of I2.

    I was also wondering about uniqueness.

    Suppose we have a unique index UI and a nordinay index I, again of the same type. Then, clearly, UI would not be redundant even if its columnlist is a prefix of the columnlist of I. UI might be redundant in the sense that it defines the same access path, but it also maintains unicity so we cannot just get rid of it. So far, so good. I have some questions regarding this:

    Since the prefix is unique, could it help the optimizer to make the longer index a UNIQUE one too? By definition, the values in the longer index are unique too, and telling the database that the longer index is unique might help the optimizer in making decisions. Or Maybe the optimizer is smart enough to infer uniqueness for the longer index? Or, Maybe the optimizer does not use uniqueness information at all?

    kind regards,

    Roland Bouman

  21. peter says:

    Roland,

    First I only spoke about BTREE/TTREE indexes. FullText search indexes are different. RTREE should also be the same but I’m not 100% sure. These are used only with spatial extenrsions anyway.

    Speaking about UNIQUE it is interesting. If you have UNIQUE(A) and KEY(A,B) it is quite possible you do not need
    KEY(A,B) anyway as A is already unique so if you have something like “WHERE A=5 and B=6″ longer index is not really helpful.

    It is of course different if you have longer index to have index covered queries but It is other story.

    UNIQUE is both Index and constraint, but unfortunately they come together even if they do not need to. Ie for some cases I would really like to be able to create KEY(A,B,C), which has prefix (A,B) unique. Currently it is not possible and one ether have to create extra index or do unique checking on application.

  22. Hi Peter,

    thanks for your reply.

    In the mean while, I found out that at least in MySQL, the prefix redundancy problem just cannot occur for RTREE Spatial index, because they cannot be composite:

    mysql> create table geom (g1 geometry not null, g2 geometry not null, spatial index(g1,g2)) ENGINE=MyISAM;
    ERROR 1210 (HY000): Incorrect arguments to SPATIAL INDEX

    So that clears it up. I should’ve tried first of course – sorry to bother you with that.

    As for the UNIQUE indexes: If I understand you correctly, you describe a case where using a longer non-unique and non-covering index could give worse performance over a shorter unique index that is the prefix of the longer one – would that be because time would be wasted reading the index, because the query still needs an extra lookup to the data? (time that would not have been wasted if the longer index would’ve been a covering index?)

    I think that is an intersting case too, but I think it does not answer my question. My question was – would the optimizer somehow benefit if we would mark the longer index also as a UNIQUE one. I mean, potentially, it gives the optimizer more information, possible enabling it to create better execution plans.

    BTW: I really think your idea of an index with a unique prefix is interesting. I mean, it looks simple, elegant, and at the same time I can imagine how it would help in a lot of cases. I think it kind of approaches Oracle’s index organized tables (IOT’s), which are essentially a big covering index instead of a proper table.

    It would be great if you could think a bit about my original question regarding UNIQUE indexes. But either way: thanks!

  23. peter says:

    Roland,

    Right about spatial indexes. I did not play with them for a while. Some time ago they also worked for non geometry fields but I think that code was later dropped.

    Right. My point is unless you’re getting covered index it does not make sense to add columns to unique index as it does not improve cardinality.

    Regarding your question about optimizer – yes it may help a bit if all columns in index are used with = as this is only case when execution can be better with unique index – after row is read MySQL will not query if there are more rows as in unique key there is no more than one row for each not null combination. It may help also with stats in this case, but not in other case.

    In general I do not think it is good idea as UNIQUE indexes are more expensive to maintain – for MyISAM tables repair by sort typically would not be used on LOAD DATA INFILE/ALTER TABLE for Innodb tables insert buffer would not work.

  24. Peter,

    thanks for your clear and concise answer!
    I appreciate it a lot you took the time. :)

    Roland Bouman

  25. Olivier says:

    Hi,

    And what about Innodb PK ? I heard that to speed UP queries, I should declare an additionnal key on the PK, and force the use of this key instead of the primary (because innodb PK are stored in fragments). So, true or false ?

  26. peter says:

    Olivier.

    This is pretty strange idea. Innodb PK is usually most efficient key to use for row data retrieval.

  27. Vaibogam S says:

    A simple way to find out duplicate indexes in 5.0.x

    SELECT * FROM information_schema.statistics WHERE table_schema = ‘?’ GROUP BY table_schema, table_name, column_name HAVING count(column_name) > 1;

    It would be better to include ‘index_type’ in GROUP BY clause to more accurate answer.

    In the above query bind the ‘database name’ parameter.

    -Bog

  28. peter says:

    Vaibogam,

    This will catch indexes (A,B) and (A,C) as duplicate would not it ?
    It would be simple if indexes would be single column only :)

  29. “Order of columns in index is significant, index (A,B) is not duplicate to index (B,A)”

    Let’s say my Primary Key is: PRIMARY KEY(A,B)

    I also need a standard BTREE index on B: KEY(B)

    Now, would it be in my interest to change PRIMARY KEY(A,B) to PRIMARY KEY(B,A) and thus get the covering index on B and have no need for the additional KEY(B) ?

    Thanks for your help and all the great information on this site.

  30. Will says:

    Will this setup allow the query below to properly use both indexes? Does the order of where clauses matter in such a scenario?

    index alpha (B,C)
    index beta (A)

    select A, B from x where A = x and B = x

  31. amit shah says:

    Hi,

    Please help me to know how to drop duplicate index with same name.
    === Error ===
    alter table emp drop index fk_deptid;ERROR 1553 (HY000): Cannot drop index ‘fk_deptid’: needed in a foreign key constraint
    =============

    Below is scenario.

    drop table if exists emp;
    drop table if exists dept;
    create table dept
    (
    deptid integer auto_increment primary key,
    deptname varchar(20)
    );

    create table emp
    (
    empid integer auto_increment primary key,
    empname varchar(20),
    deptid integer,
    key fk_deptid (deptid),
    constraint fk_deptid foreign key(deptid) references dept(deptid)
    );

    * Find below output which shows indexes in the database. (Query for below output is at the end)

    mysql> source list.sql
    +————–+————+—————–+—————–+————-+———————–+————————+
    | table_schema | table_name | constraint_name | constraint_type | column_list | referenced_table_name | referenced_column_name |
    +————–+————+—————–+—————–+————-+———————–+————————+
    | test | dept | PRIMARY | PRIMARY KEY | deptid | NULL | NULL |
    | test | emp | fk_deptid | NON UNIQUE | deptid | NULL | NULL |
    | test | emp | fk_deptid | FOREIGN KEY | deptid | dept | deptid |
    | test | emp | PRIMARY | PRIMARY KEY | empid | NULL | NULL |
    +————–+————+—————–+—————–+————-+———————–+————————+

    alter table emp drop index fk_deptid;ERROR 1553 (HY000): Cannot drop index ‘fk_deptid’: needed in a foreign key constraint

    +++Query++++
    SELECT a.table_schema, a.table_name, a.constraint_name,
    a.constraint_type,
    convert(group_concat(DISTINCT b.column_name
    ORDER BY b.ordinal_position SEPARATOR ‘, ‘), char)
    as column_list,
    b.referenced_table_name, b.referenced_column_name
    FROM information_schema.table_constraints a
    INNER JOIN information_schema.key_column_usage b
    ON a.constraint_name = b.constraint_name AND
    a.table_schema = b.table_schema AND
    a.table_name = b.table_name
    WHERE a.table_schema=’test’
    GROUP BY a.table_schema, a.table_name, a.constraint_name,
    a.constraint_type, b.referenced_table_name,
    b.referenced_column_name
    UNION
    SELECT table_schema, table_name, index_name as constraint_name,
    if(index_type=’FULLTEXT’, ‘FULLTEXT’, ‘NON UNIQUE’)
    as constraint_type,
    convert(group_concat(column_name
    ORDER BY seq_in_index separator ‘, ‘), char) as column_list,
    null as referenced_table_name, null as referenced_column_name
    FROM information_schema.statistics
    WHERE non_unique = 1
    AND table_schema=’test’
    GROUP BY table_schema, table_name, constraint_name, constraint_type,
    referenced_table_name, referenced_column_name
    ORDER BY table_schema, table_name, column_list

    Thanks in advance.

    Regards,
    Amit.

  32. CSev says:

    I just wanted to let you know that there is a tool which checks for redundant indexes, SQLyog Ultimate. It’s probably a bit pricy for just that functionality though.

Speak Your Mind

*