Redundant index is not always bad

About year ago Peter wrote about redundant indexes and mentioned sometimes it is good to leave two indexes, even one is first part of another. I’m speaking about BTREE indexes, for example, KEY (A), and KEY (A,B). From SQL point of view KEY(A) is not needed, as for queries like WHERE A=5 the index (A,B) also can be used.

But there is case when for performance it would be good to have both

Let we have the table

CREATE TABLE `userinfo` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(64) NOT NULL DEFAULT '',
  `email` varchar(64) NOT NULL DEFAULT '',
  `password` varchar(64) NOT NULL DEFAULT '',
  `dob` date DEFAULT NULL,
  `address` varchar(255) NOT NULL DEFAULT '',
  `city` varchar(64) NOT NULL DEFAULT '',
  `state_id` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `zip` varchar(8) NOT NULL DEFAULT '',
  `country_id` smallint(5) unsigned NOT NULL DEFAULT '0',
  `gender` enum('M','F') NOT NULL DEFAULT 'M',
  `account_type` varchar(32) NOT NULL DEFAULT '',
  `verified` tinyint(4) NOT NULL DEFAULT '0',
  `allow_mail` tinyint(3) unsigned NOT NULL DEFAULT '0',
  `parrent_account` int(10) unsigned NOT NULL DEFAULT '0',
  `closest_airport` varchar(3) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `email` (`email`),
  KEY `country_id` (`country_id`),
  KEY `state_id` (`state_id`),
) ENGINE=MyISAM

CREATE TABLE `userinfo` (

`id` int(10) unsigned NOT NULL AUTO_INCREMENT,

`name` varchar(64) NOT NULL DEFAULT '',

`email` varchar(64) NOT NULL DEFAULT '',

`password` varchar(64) NOT NULL DEFAULT '',

`dob` date DEFAULT NULL,

`address` varchar(255) NOT NULL DEFAULT '',

`city` varchar(64) NOT NULL DEFAULT '',

`state_id` tinyint(3) unsigned NOT NULL DEFAULT '0',

`zip` varchar(8) NOT NULL DEFAULT '',

`country_id` smallint(5) unsigned NOT NULL DEFAULT '0',

`gender` enum('M','F') NOT NULL DEFAULT 'M',

`account_type` varchar(32) NOT NULL DEFAULT '',

`verified` tinyint(4) NOT NULL DEFAULT '0',

`allow_mail` tinyint(3) unsigned NOT NULL DEFAULT '0',

`parrent_account` int(10) unsigned NOT NULL DEFAULT '0',

`closest_airport` varchar(3) NOT NULL DEFAULT '',

PRIMARY KEY (`id`),

UNIQUE KEY `email` (`email`),

KEY `country_id` (`country_id`),

KEY `state_id` (`state_id`),

) ENGINE=MyISAM

with 1,000,000 rows, and for each state_id there are about 20,000 records.

In the benchmark the query (Q1)

SELECT count(*) FROM userinfo WHERE state_id=5

1	SELECT count(*) FROM userinfo WHERE state_id=5

which uses KEY (state_id), shows the result 115 queries/sec

And now we have the task to execute the query (Q2)

SELECT state_id,city,address FROM userinfo WHERE state_id=5

1	SELECT state_id,city,address FROM userinfo WHERE state_id=5

which extracts additional info – city and address, both long strings columns

In the benchmark for this query we have 10 queries per sec.

Usuall technique we use for such queries is covering index, which store all needed columns in the index, and there is no need to read row data from the table.

So – we can extend index state_id_idx (state_id) by two columns:

ALTER TABLE userinfo DROP KEY state_id, ADD KEY `state_id_2` (state_id, city, address)

1	ALTER TABLE userinfo DROP KEY state_id, ADD KEY `state_id_2` (state_id, city, address)

And now for Q2 we have 16.34 q/s, that is almost +100% comparing with non-convering index.

But if run the test for Q1 we have: 25.40 q/s, that in 4 times slower than in first run.
(This is for MyISAM, for InnoDB difference will be less. MyISAM uses key-compression for VARCHAR columns, so makes things worse).

So to have good performance for both case we should leave both indexes:
KEY state_id (state_id) and
KEY state_id_2 (state_id, city, address)

The drawback is worse INSERT performance, here is time to INSERT 1million records:
only with state_id KEY – 72 sec
with state_id and state_id_2 KEYs – 470sec
(Here I limited available memory to fit only one index, that shows how can degrade performance if we balance on memory limits)

If you are interested in, the numbers for InnoDB in the same tests:
only state_id KEY:
Q1 – 108 q/s
Q2 – 12.12 q/s
only state_id_2 KEY:
Q1 – 100 q/s
Q2 – 24.08 q/s

And INSERT of 1mil records:
only with state_id KEY – 80 sec
with state_id and state_id_2 KEYs – 136sec
(but in this case the memory was enough for both indexes)

7 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Xaprb

16 years ago

For those who want to find duplicate indexes, check out MySQL Toolkit: http://mysqltoolkit.sourceforge.net/

Vadim Tkachenko

Author

16 years ago

Xaprb,

What about checking selectivity of indexes ? 😉
http://www.mysqlperformanceblog.com/2007/08/28/do-you-always-need-index-on-where-column/

Ryan

16 years ago

The main reason I keep around duplicate indexes is because the merge intersect will not use just the first key part of an index. But this also has the disadvantage that it will often use both indexes in the intersect which is a bug in my opinion.

Peter Zaitsev

Admin

16 years ago

Vadim,

Actually speaking about selectivity I’d like to see distribution rather than pure selectivity.
If index has 2 values and one of them is in 99% and other in 1% it well may be index is a good idea to look up for second value – status field is typical type of such field.

Peter Zaitsev

Admin

16 years ago

Ryan,

Are you speaking about duplicate indexes or redundant (as vadim is speaking) ?
What problem do you mean ?

Ryan

16 years ago

I mean redundant indexes. Say one with ‘cola’ and another with ‘(cola, colb)’. The merge intersect will not work in a query that doesn’t use ‘colb’ so you need to add both indexes if you want to use the merge intersect. Also I don’t want to optimize by adding muti-key indexes to every possible combinations of filters a user might have and would rather use the merge intersect for many things. Then this gets the the secondary problem that if you have two redundant indexes and you are filtering on ‘cola’ and ‘colb’ it will often choose to use both indexes with merge intersect which is completly a waste of time as only the muti-key index should be used and not the single ‘cola’ index.

bob

16 years ago

I would be very upset if someone wanted to index on (state,city,address). Address is too big to be useful in an index (it will limit the # records/page and just doesn’t make sense.

It would be interesting to repeat the tests with just (state,city).

This of course ignores all international issues which should impose an index of (country_id,state,city).

note that this is also messy since it makes for a very deep index as opposed to a wide index (i.e. you have to traverse more btree nodes to get to your data).

Also the inserts should take longer, you’re building a much larger index in the 2nd case and causing many more page balances just due to the fact that each page will hold fewer index items.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Redundant index is not always bad

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Not-So-Good Practices

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Valkey/Redis: Configuration Best Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Redundant index is not always bad

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Not-So-Good Practices

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Valkey/Redis: Configuration Best Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation