July 22, 2014

Find and remove duplicate indexes

Having duplicate keys in our schemas can hurt the performance of our database:

    • They make the optimizer phase slower because MySQL needs to examine more query plans.
    • The storage engine needs to maintain, calculate and update more index statistics
    • DML and even read queries can be slower because MySQL needs update fetch more data to Buffer Pool for the same load
    • Our data needs more disk space so our backups will be bigger and slower
  • In this post I’m going to explain the different types of duplicate indexes and how to find and remove them.


    Duplicate keys on the same column

    This is the easiest one. You can create multiple indexes on the same column and MySQL won’t complain. Let’s see this example:

    mysql> alter table t add index(name);
    mysql> alter table t add index(name);
    mysql> alter table t add index(name);

    mysql> show create table t\G
    [...]
    KEY `name` (`name`),
    KEY `name_2` (`name`),
    KEY `name_3` (`name`)
    [...]

    MySQL detects that the ‘name’ index already exists so it creates the new one appending a number at the end of the name. This type of indexes are easy to find out and to avoid them. How? Just specify the index name and MySQL will avoid to create duplicate indexes:

    mysql> alter table t add index key_for_first_name(name);
    Query OK, 0 rows affected (0.01 sec)
    mysql> alter table t add index key_for_first_name(name);
    ERROR 1061 (42000): Duplicate key name 'key_for_first_name'

    Using custom names with indexes is a good practice because they can avoid duplicates and help you to identify them with more meaningful names.


    Redundant keys on composite indexes

    Let’s start with an example:

    mysql> show create table t;
    [...]
    KEY `key_name` (`name`),
    KEY `key_name_2` (`name`,`age`)

    The redundant index is on the ‘name‘ column. To take benefit from a composite index MySQL doesn’t need to use all the columns of that index, the leftmost prefix is enough. For example an index on columns (A,B,C) can be used to satisfy the combinations (A), (A,B), (A,B,C) but not (B) or (B,C).

    Therefore in the previous example the index ‘key_name‘ is redundant and ‘key_name_2‘ is enough in most of the cases. Sometimes the redundant keys make sense, for example if full index is a lot longer when both long and short index might be good for the query execution.


    Redundant suffixes on clustered index

    InnoDB uses a clustered index and that means that the secondary keys contains the primary key column. Let’s see the following example:

    mysql> show create table t;
    [...]
    PRIMARY KEY (`i`),
    KEY `key_name` (`name`,`i`)

    In this example the index ‘key_name‘ includes the primary key so that last column ‘i‘ is usually not necessary on the secondary key because it’s redundant. I said “usually” because there are some cases where the redundant key can be useful.

    SELECT * FROM t WHERE name='kahxailo' AND i > 100000;

    With a index on (name) the execution plan is Using intersect(name,PRIMARY); Using where and with an index on (name,id) the execution plan changes to Using where. There is also a big difference in the number of Handler_read_next requests, so less data needs to be read.

    It’s also worth to mention that some people tends to create a UNIQUE(i) key on the primary key because they think even if you defined something as PRIMARY KEY would also want to define it as UNIQUE to ensure there is no duplicated. That’s not necessary because the PRIMARY KEY is indeed UNIQUE.


    How can I find all those keys?

    There is a tool in Percona Toolkit that can help you to find all those keys in your schema, its name is pt-duplicate-key-checker and it can find the three different types of keys explained before. Lets try it:

    mysql> show create table t;
    [...]
    PRIMARY KEY (`i`),
    KEY `name` (`name`,`i`),
    KEY `name_2` (`name`),
    KEY `name_3` (`name`),
    KEY `name_4` (`name`,`age`),
    KEY `age` (`age`,`name`,`i`)

    I run the tool on the ‘test’ database:

    root@debian:~# pt-duplicate-key-checker --database=test
    # ########################################################################
    # test.t
    # ########################################################################

    # name_2 is a left-prefix of name_4
    [...]
    # To remove this duplicate index, execute:
    ALTER TABLE `test`.`t` DROP INDEX `name_2`;

    # name_3 is a left-prefix of name_4
    [...]
    # To remove this duplicate index, execute:
    ALTER TABLE `test`.`t` DROP INDEX `name_3`;

    # Key name ends with a prefix of the clustered index
    [...]
    # To shorten this duplicate clustered index, execute:
    ALTER TABLE `test`.`t` DROP INDEX `name`, ADD INDEX `name` (`name`);

    # Key age ends with a prefix of the clustered index
    [...]
    # To shorten this duplicate clustered index, execute:
    ALTER TABLE `test`.`t` DROP INDEX `age`, ADD INDEX `age` (`age`,`name`);

    The tool gives us the reason why the keys are duplicated and provides us the SQL command to solve the problem.


    Conclusion

    Indexes are good for our queries but too many indexes or duplicate or redundant indexes can hurt the performance. Checking the schemas periodically to catch this duplicates can help us to maintain a good overall performance. Before removing indexes on production test it on a testing environment and check the performance of your queries with tools like pt-upgrade. As we saw in previous sections sometimes redundant keys can help us to improve the execution time and decrease the size of the data read.

    About Miguel Angel Nieto

    Miguel joined Percona in October 2011. He has worked as a System Administrator for a Free Software consultant and in the supporting area of the biggest hosting company in Spain. His current focus is improving MySQL and helping the community of Free Software to grow. Miguel's roles inside Percona are Senior Support Engineer and Manager of EMEA Support Team.

    Comments

    1. Kenn Ejima says:

      Question on the “Duplicate keys on clustered index” section:

      Consider a chat room app, we have the messages table, schema being (id, room_id, text, created_at) with index of (room_id) and (room_id, id). id is an autoinc primary key.

      In this case (room_id, id) seems just redundant, but in fact dramatically improves performance of the following query. (Confirmed on MySQL 5.1 and 5.5)

      SELECT * FROM messages WHERE room_id = 10 ORDER BY id DESC LIMIT 100

      If you have millions of messages in a room, you could stop scanning index just reading 100 entries, otherwise you’d have to read the entire index of (room_id) to find 100 most recent messages.

    2. xuancan says:

      In the following scenario,sometimes,we ceate a secondary keys contains the primary key column
      (id is an autoinc primary key ) :
      select * from t where a=x and b=x;
      select * from t where a=x order by id;
      alter table t add index ind_a_b(a,b);
      alter table t add index ind_a(a);
      so,in this scene,duplicate keys on clustered index need the customer make a proper judgment. ^_^

    3. xuancan says:

      In the following scenario,sometimes,duplicate keys on composite indexes need the customer make a proper judgment. ^_^
      select * from t where a=x and b=x;
      select * from t where a=x order by id;
      alter table t add index ind_a_b(a,b);
      alter table t add index ind_a(a);

    4. There are more cases when you would need to have redundant indexes. I would suggest never to use pt-duplicate-key-checker output to apply to production but rather apply it to testing and run benchmarks with your current workload. Most likely you will need to add 1-2 indexes back for every 100 which are being dropped when it is a special case.

    5. ruochen says:

      @Kenn Ejima ,I meet with you the same question。

      @Miguel
      Question on the “Duplicate keys on composite indexes” section:
      i meet with inner(left) join on only one join condition(int or varchar data type),i found only use single-column index but not leftmost prefix index from explain output sometimes

    6. With regard to “Duplicate keys on composite indexes”,

      Mathematically this is correct. “Physically” one may have added the shorter index to make for a faster covering index. This is typically not the case — usually two such indexes make for an unintentional mistake. In most cases Math wins over Physics in terms of intentions.

      There are plenty other scenarios for duplicate indexes, such as a UNIQUE KEY (a) and KEY (a,b). There is a redundant index here. Which?

      See more examples on common_schema’s redundant_keys documentation.

    7. @kenn and @xuancan thank you. Yes, there are some edge cases where a redundant key on PK is usefull. I’ve modified the post to add another example and to point out that no always is “redundant” :)

      @rouchen that can be for several reasons. for example, if the composite index and the single column are both useful for the query, the optimizer may decide to use the “small” one. I’ve added a paragraph to mention that. Thank you.

    8. One source of duplicate keys I encountered is when a duplicate foreign key was created. This was due to some bug in phpMyAdmin.

    9. Lavesh says:

      for some db error is coming as
      Use of uninitialized value $ddl in pattern match (m//) at /usr/bin/pt-duplicate-key-checker line 552.

    Speak Your Mind

    *