Having duplicate keys in our schemas can hurt the performance of our database:

    • They make the optimizer phase slower because MySQL needs to examine more query plans.
    • The storage engine needs to maintain, calculate and update more index statistics
    • DML and even read queries can be slower because MySQL needs update fetch more data to Buffer Pool for the same load
    • Our data needs more disk space so our backups will be bigger and slower
  • In this post I’m going to explain the different types of duplicate indexes and how to find and remove them.

    Duplicate keys on the same column

    This is the easiest one. You can create multiple indexes on the same column and MySQL won’t complain. Let’s see this example:

    mysql> alter table t add index(name);
    mysql> alter table t add index(name);
    mysql> alter table t add index(name);

    mysql> show create table t\G
    [...]
    KEY name (name),
    KEY name_2 (name),
    KEY name_3 (name)
    [...]

    MySQL detects that the ‘name’ index already exists so it creates the new one appending a number at the end of the name. This type of indexes are easy to find out and to avoid them. How? Just specify the index name and MySQL will avoid to create duplicate indexes:

    mysql> alter table t add index key_for_first_name(name);
    Query OK, 0 rows affected (0.01 sec)
    mysql> alter table t add index key_for_first_name(name);
    ERROR 1061 (42000): Duplicate key name 'key_for_first_name'

    Using custom names with indexes is a good practice because they can avoid duplicates and help you to identify them with more meaningful names.

    Redundant keys on composite indexes

    Let’s start with an example:

    mysql> show create table t;
    [...]
    KEY key_name (name),
    KEY key_name_2 (name,age)

    The redundant index is on the ‘name‘ column. To take benefit from a composite index MySQL doesn’t need to use all the columns of that index, the leftmost prefix is enough. For example an index on columns (A,B,C) can be used to satisfy the combinations (A), (A,B), (A,B,C) but not (B) or (B,C).

    Therefore in the previous example the index ‘key_name‘ is redundant and ‘key_name_2‘ is enough in most of the cases. Sometimes the redundant keys make sense, for example if full index is a lot longer when both long and short index might be good for the query execution.

    Redundant suffixes on clustered index

    InnoDB uses a clustered index and that means that the secondary keys contains the primary key column. Let’s see the following example:

    mysql> show create table t;
    [...]
    PRIMARY KEY (i),
    KEY key_name (name,i)

    In this example the index ‘key_name‘ includes the primary key so that last column ‘i‘ is usually not necessary on the secondary key because it’s redundant. I said “usually” because there are some cases where the redundant key can be useful.

    SELECT * FROM t WHERE name='kahxailo' AND i > 100000;

    With a index on (name) the execution plan is Using intersect(name,PRIMARY); Using where and with an index on (name,id) the execution plan changes to Using where. There is also a big difference in the number of Handler_read_next requests, so less data needs to be read.

    It’s also worth to mention that some people tends to create a UNIQUE(i) key on the primary key because they think even if you defined something as PRIMARY KEY would also want to define it as UNIQUE to ensure there is no duplicated. That’s not necessary because the PRIMARY KEY is indeed UNIQUE.

    How can I find all those keys?

    There is a tool in Percona Toolkit that can help you to find all those keys in your schema, its name is pt-duplicate-key-checker and it can find the three different types of keys explained before. Lets try it:

    mysql> show create table t;
    [...]
    PRIMARY KEY (i),
    KEY name (name,i),
    KEY name_2 (name),
    KEY name_3 (name),
    KEY name_4 (name,age),
    KEY age (age,name,i)

    I run the tool on the ‘test’ database:

    root@debian:~# pt-duplicate-key-checker --database=test
    # ########################################################################
    # test.t
    # ########################################################################

    # name_2 is a left-prefix of name_4
    [...]
    # To remove this duplicate index, execute:
    ALTER TABLE test.t DROP INDEX name_2;

    # name_3 is a left-prefix of name_4
    [...]
    # To remove this duplicate index, execute:
    ALTER TABLE test.t DROP INDEX name_3;

    # Key name ends with a prefix of the clustered index
    [...]
    # To shorten this duplicate clustered index, execute:
    ALTER TABLE test.t DROP INDEX name, ADD INDEX name (name);

    # Key age ends with a prefix of the clustered index
    [...]
    # To shorten this duplicate clustered index, execute:
    ALTER TABLE test.t DROP INDEX age, ADD INDEX age (age,name);

    The tool gives us the reason why the keys are duplicated and provides us the SQL command to solve the problem.

    Conclusion

    Indexes are good for our queries but too many indexes or duplicate or redundant indexes can hurt the performance. Checking the schemas periodically to catch this duplicates can help us to maintain a good overall performance. Before removing indexes on production test it on a testing environment and check the performance of your queries with tools like pt-upgrade. As we saw in previous sections sometimes redundant keys can help us to improve the execution time and decrease the size of the data read.

    12 Comments
    Oldest
    Newest Most Voted
    Inline Feedbacks
    View all comments
    Kenn Ejima

    Question on the “Duplicate keys on clustered index” section:

    Consider a chat room app, we have the messages table, schema being (id, room_id, text, created_at) with index of (room_id) and (room_id, id). id is an autoinc primary key.

    In this case (room_id, id) seems just redundant, but in fact dramatically improves performance of the following query. (Confirmed on MySQL 5.1 and 5.5)

    SELECT * FROM messages WHERE room_id = 10 ORDER BY id DESC LIMIT 100

    If you have millions of messages in a room, you could stop scanning index just reading 100 entries, otherwise you’d have to read the entire index of (room_id) to find 100 most recent messages.

    xuancan

    In the following scenario,sometimes,we ceate a secondary keys contains the primary key column
    (id is an autoinc primary key ) :
    select * from t where a=x and b=x;
    select * from t where a=x order by id;
    alter table t add index ind_a_b(a,b);
    alter table t add index ind_a(a);
    so,in this scene,duplicate keys on clustered index need the customer make a proper judgment. ^_^

    xuancan

    In the following scenario,sometimes,duplicate keys on composite indexes need the customer make a proper judgment. ^_^
    select * from t where a=x and b=x;
    select * from t where a=x order by id;
    alter table t add index ind_a_b(a,b);
    alter table t add index ind_a(a);

    Peter Zaitsev

    There are more cases when you would need to have redundant indexes. I would suggest never to use pt-duplicate-key-checker output to apply to production but rather apply it to testing and run benchmarks with your current workload. Most likely you will need to add 1-2 indexes back for every 100 which are being dropped when it is a special case.

    ruochen

    @Kenn Ejima ,I meet with you the same question。

    @Miguel
    Question on the “Duplicate keys on composite indexes” section:
    i meet with inner(left) join on only one join condition(int or varchar data type),i found only use single-column index but not leftmost prefix index from explain output sometimes

    Shlomi Noach

    With regard to “Duplicate keys on composite indexes”,

    Mathematically this is correct. “Physically” one may have added the shorter index to make for a faster covering index. This is typically not the case — usually two such indexes make for an unintentional mistake. In most cases Math wins over Physics in terms of intentions.

    There are plenty other scenarios for duplicate indexes, such as a UNIQUE KEY (a) and KEY (a,b). There is a redundant index here. Which?

    See more examples on common_schema’s redundant_keys documentation.

    Daniël van Eeden

    One source of duplicate keys I encountered is when a duplicate foreign key was created. This was due to some bug in phpMyAdmin.

    Lavesh

    for some db error is coming as
    Use of uninitialized value $ddl in pattern match (m//) at /usr/bin/pt-duplicate-key-checker line 552.

    Ganesh

    How to rectify this error could you please anyone tel me the answer for this.
    Executing SQL script in server
    ERROR: Error 1022: Can’t write; duplicate key in table ‘warehouse’
    SQL Code:
    CREATE TABLE IF NOT EXISTS newproject.warehouse (
    internalId INT(11) NOT NULL COMMENT ”,
    warehouseId INT(11) NOT NULL COMMENT ”,
    name VARCHAR(45) NULL DEFAULT NULL COMMENT ”,
    addressInternalId INT(11) NOT NULL COMMENT ”,
    contactInternalId INT(11) NOT NULL COMMENT ”,
    INDEX contactInternalId_idx (contactInternalId ASC) COMMENT ”,
    PRIMARY KEY (internalId) COMMENT ”,
    UNIQUE INDEX internalId_UNIQUE (internalId ASC) COMMENT ”,
    UNIQUE INDEX warehouseId_UNIQUE (warehouseId ASC) COMMENT ”,
    UNIQUE INDEX addressInternalId_UNIQUE (addressInternalId ASC) COMMENT ”,
    UNIQUE INDEX contactInternalId_UNIQUE (contactInternalId ASC) COMMENT ”,
    CONSTRAINT contactInternalId
    FOREIGN KEY (contactInternalId)
    REFERENCES newproject.Contact (internalId)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
    CONSTRAINT addressInternalId
    FOREIGN KEY (addressInternalId)
    REFERENCES newproject.Address (internalId)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
    ENGINE = InnoDB
    DEFAULT CHARACTER SET = utf8
    COLLATE = utf8_general_ci

    SQL script execution finished: statements: 3 succeeded, 1 failed

    KRISHAN KUMAR

    Please read this entire article from the beginning and try to figure out by yourself. I am sure you will get your answers. Also read about Primary Key and Unique keys and see what all indexes you really need.

    Thanks

    Nikolay Dimitrov

    @Kenn Ejima the redundant index in your case is (room_id), not (room_id, id)