Having duplicate keys in our schemas can hurt the performance of our database:
- They make the optimizer phase slower because MySQL needs to examine more query plans.
- The storage engine needs to maintain, calculate and update more index statistics
- DML and even read queries can be slower because MySQL needs update fetch more data to Buffer Pool for the same load
- Our data needs more disk space so our backups will be bigger and slower
In this post I’m going to explain the different types of duplicate indexes and how to find and remove them.
Duplicate keys on the same column
This is the easiest one. You can create multiple indexes on the same column and MySQL won’t complain. Let’s see this example:
mysql> alter table t add index(name);
mysql> alter table t add index(name);
mysql> alter table t add index(name);mysql> show create table t\G
[...]
KEYname
(name
),
KEYname_2
(name
),
KEYname_3
(name
)
[...]
MySQL detects that the ‘name’ index already exists so it creates the new one appending a number at the end of the name. This type of indexes are easy to find out and to avoid them. How? Just specify the index name and MySQL will avoid to create duplicate indexes:
mysql> alter table t add index key_for_first_name(name);
Query OK, 0 rows affected (0.01 sec)
mysql> alter table t add index key_for_first_name(name);
ERROR 1061 (42000): Duplicate key name 'key_for_first_name'
Using custom names with indexes is a good practice because they can avoid duplicates and help you to identify them with more meaningful names.
Redundant keys on composite indexes
Let’s start with an example:
mysql> show create table t;
[...]
KEYkey_name
(name
),
KEYkey_name_2
(name
,age
)
The redundant index is on the ‘name‘ column. To take benefit from a composite index MySQL doesn’t need to use all the columns of that index, the leftmost prefix is enough. For example an index on columns (A,B,C) can be used to satisfy the combinations (A), (A,B), (A,B,C) but not (B) or (B,C).
Therefore in the previous example the index ‘key_name‘ is redundant and ‘key_name_2‘ is enough in most of the cases. Sometimes the redundant keys make sense, for example if full index is a lot longer when both long and short index might be good for the query execution.
Redundant suffixes on clustered index
InnoDB uses a clustered index and that means that the secondary keys contains the primary key column. Let’s see the following example:
mysql> show create table t;
[...]
PRIMARY KEY (i
),
KEYkey_name
(name
,i
)
In this example the index ‘key_name‘ includes the primary key so that last column ‘i‘ is usually not necessary on the secondary key because it’s redundant. I said “usually” because there are some cases where the redundant key can be useful.
SELECT * FROM t WHERE name='kahxailo' AND i > 100000;
With a index on (name) the execution plan is Using intersect(name,PRIMARY); Using where and with an index on (name,id) the execution plan changes to Using where. There is also a big difference in the number of Handler_read_next requests, so less data needs to be read.
It’s also worth to mention that some people tends to create a UNIQUE(i) key on the primary key because they think even if you defined something as PRIMARY KEY would also want to define it as UNIQUE to ensure there is no duplicated. That’s not necessary because the PRIMARY KEY is indeed UNIQUE.
How can I find all those keys?
There is a tool in Percona Toolkit that can help you to find all those keys in your schema, its name is pt-duplicate-key-checker and it can find the three different types of keys explained before. Lets try it:
mysql> show create table t;
[...]
PRIMARY KEY (i
),
KEYname
(name
,i
),
KEYname_2
(name
),
KEYname_3
(name
),
KEYname_4
(name
,age
),
KEYage
(age
,name
,i
)
I run the tool on the ‘test’ database:
root@debian:~# pt-duplicate-key-checker --database=test
# ########################################################################
# test.t
# ######################################################################### name_2 is a left-prefix of name_4
[...]
# To remove this duplicate index, execute:
ALTER TABLEtest
.t
DROP INDEXname_2
;# name_3 is a left-prefix of name_4
[...]
# To remove this duplicate index, execute:
ALTER TABLEtest
.t
DROP INDEXname_3
;# Key name ends with a prefix of the clustered index
[...]
# To shorten this duplicate clustered index, execute:
ALTER TABLEtest
.t
DROP INDEXname
, ADD INDEXname
(name
);# Key age ends with a prefix of the clustered index
[...]
# To shorten this duplicate clustered index, execute:
ALTER TABLEtest
.t
DROP INDEXage
, ADD INDEXage
(age
,name
);
The tool gives us the reason why the keys are duplicated and provides us the SQL command to solve the problem.
Conclusion
Indexes are good for our queries but too many indexes or duplicate or redundant indexes can hurt the performance. Checking the schemas periodically to catch this duplicates can help us to maintain a good overall performance. Before removing indexes on production test it on a testing environment and check the performance of your queries with tools like pt-upgrade. As we saw in previous sections sometimes redundant keys can help us to improve the execution time and decrease the size of the data read.
Question on the “Duplicate keys on clustered index” section:
Consider a chat room app, we have the messages table, schema being (id, room_id, text, created_at) with index of (room_id) and (room_id, id). id is an autoinc primary key.
In this case (room_id, id) seems just redundant, but in fact dramatically improves performance of the following query. (Confirmed on MySQL 5.1 and 5.5)
SELECT * FROM messages WHERE room_id = 10 ORDER BY id DESC LIMIT 100
If you have millions of messages in a room, you could stop scanning index just reading 100 entries, otherwise you’d have to read the entire index of (room_id) to find 100 most recent messages.
In the following scenario,sometimes,we ceate a secondary keys contains the primary key column
(id is an autoinc primary key ) :
select * from t where a=x and b=x;
select * from t where a=x order by id;
alter table t add index ind_a_b(a,b);
alter table t add index ind_a(a);
so,in this scene,duplicate keys on clustered index need the customer make a proper judgment. ^_^
In the following scenario,sometimes,duplicate keys on composite indexes need the customer make a proper judgment. ^_^
select * from t where a=x and b=x;
select * from t where a=x order by id;
alter table t add index ind_a_b(a,b);
alter table t add index ind_a(a);
There are more cases when you would need to have redundant indexes. I would suggest never to use pt-duplicate-key-checker output to apply to production but rather apply it to testing and run benchmarks with your current workload. Most likely you will need to add 1-2 indexes back for every 100 which are being dropped when it is a special case.
@Kenn Ejima ,I meet with you the same question。
@Miguel
Question on the “Duplicate keys on composite indexes” section:
i meet with inner(left) join on only one join condition(int or varchar data type),i found only use single-column index but not leftmost prefix index from explain output sometimes
With regard to “Duplicate keys on composite indexes”,
Mathematically this is correct. “Physically” one may have added the shorter index to make for a faster covering index. This is typically not the case — usually two such indexes make for an unintentional mistake. In most cases Math wins over Physics in terms of intentions.
There are plenty other scenarios for duplicate indexes, such as a UNIQUE KEY (a) and KEY (a,b). There is a redundant index here. Which?
See more examples on common_schema’s redundant_keys documentation.
@kenn and @xuancan thank you. Yes, there are some edge cases where a redundant key on PK is usefull. I’ve modified the post to add another example and to point out that no always is “redundant” 🙂
@rouchen that can be for several reasons. for example, if the composite index and the single column are both useful for the query, the optimizer may decide to use the “small” one. I’ve added a paragraph to mention that. Thank you.
One source of duplicate keys I encountered is when a duplicate foreign key was created. This was due to some bug in phpMyAdmin.
for some db error is coming as
Use of uninitialized value $ddl in pattern match (m//) at /usr/bin/pt-duplicate-key-checker line 552.
How to rectify this error could you please anyone tel me the answer for this.
Executing SQL script in server
ERROR: Error 1022: Can’t write; duplicate key in table ‘warehouse’
SQL Code:
CREATE TABLE IF NOT EXISTS
newproject
.warehouse
(internalId
INT(11) NOT NULL COMMENT ”,warehouseId
INT(11) NOT NULL COMMENT ”,name
VARCHAR(45) NULL DEFAULT NULL COMMENT ”,addressInternalId
INT(11) NOT NULL COMMENT ”,contactInternalId
INT(11) NOT NULL COMMENT ”,INDEX
contactInternalId_idx
(contactInternalId
ASC) COMMENT ”,PRIMARY KEY (
internalId
) COMMENT ”,UNIQUE INDEX
internalId_UNIQUE
(internalId
ASC) COMMENT ”,UNIQUE INDEX
warehouseId_UNIQUE
(warehouseId
ASC) COMMENT ”,UNIQUE INDEX
addressInternalId_UNIQUE
(addressInternalId
ASC) COMMENT ”,UNIQUE INDEX
contactInternalId_UNIQUE
(contactInternalId
ASC) COMMENT ”,CONSTRAINT
contactInternalId
FOREIGN KEY (
contactInternalId
)REFERENCES
newproject
.Contact
(internalId
)ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT
addressInternalId
FOREIGN KEY (
addressInternalId
)REFERENCES
newproject
.Address
(internalId
)ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci
SQL script execution finished: statements: 3 succeeded, 1 failed
Please read this entire article from the beginning and try to figure out by yourself. I am sure you will get your answers. Also read about Primary Key and Unique keys and see what all indexes you really need.
Thanks
@Kenn Ejima the redundant index in your case is (room_id), not (room_id, id)