Using index for ORDER BY vs restricting number of rows.

One interesting problem with the MySQL optimizer that I frequently run into is a poor decision when it has to choose between using an index for ORDER BY and using an index for restriction (filtering).

Consider that we're running a website that sells goods. Goods may come from different categories, sellers and locations, all of which can be filtered on, and there are also a bunch of fields that results can be sorted on, such as seller, price, date added, etc.

Such a configuration often makes choosing a proper set of indexes a serious challenge, as it is hard to add indexes that fully cover every combination of restrictions and ORDER BY.
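To put a rough number on how quickly full coverage gets out of hand, here is a back-of-the-envelope count in Python. It assumes, as a simplification of ours, that every filter is an equality condition, so an index serving a given set of filters plus one ORDER BY column needs exactly those filter columns as a prefix (in any order) followed by the sort column. The helper name is hypothetical; the 10 filters and 5 sort fields are taken from the example in the comments below.

```python
def indexes_for_full_coverage(n_filters: int, n_sort_columns: int) -> int:
    """Hypothetical helper: number of distinct (filter prefix, sort column)
    indexes needed so that every combination of equality filters, together
    with any single ORDER BY column, is served entirely by an index.

    Assumes equality-only filters, so only the *set* of filtered columns
    matters (any order works as the index prefix). Each of the 2**n_filters
    subsets, including "no filter", needs one index per sort column.
    """
    return 2 ** n_filters * n_sort_columns


# 10 filters and 5 sortable fields, as in the comment discussion.
print(indexes_for_full_coverage(10, 5))  # 5120

# The goods table in this post: cat_id and seller_id as filters, price as
# the only sort column -> (price), (cat_id, price), (seller_id, price),
# (cat_id, seller_id, price).
print(indexes_for_full_coverage(2, 1))  # 4
```

The count ignores range filters and multi-column sorts, so treat it as an illustration of the growth rate rather than a sizing rule.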

An extra problem comes from the fact that, whenever it can, MySQL prefers to use an index for further restriction and then use filesort, rather than use an index for sorting and apply the remaining restrictions without an index. Here is an example:

CREATE TABLE `goods` (
  `cat_id` int(10) unsigned NOT NULL,
  `seller_id` int(10) unsigned NOT NULL,
  `price` decimal(10,2) NOT NULL,
  KEY `cat_id` (`cat_id`,`price`),        -- can serve ORDER BY price within a category
  KEY `cat_id_2` (`cat_id`,`seller_id`)   -- can serve the cat_id + seller_id restriction
);

mysql> explain  select * from goods where cat_id=5 and seller_id=1 order by price desc limit 10 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: goods
         type: ref
possible_keys: cat_id,cat_id_2
          key: cat_id_2
      key_len: 8
          ref: const,const
         rows: 296338
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

mysql> explain select * from goods force index(cat_id) where cat_id=5 and seller_id=1 order by price desc limit 10 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: goods
         type: ref
possible_keys: cat_id
          key: cat_id
      key_len: 4
          ref: const
         rows: 989171
        Extra: Using where
1 row in set (0.00 sec)

As you can see, given no hint, MySQL prefers to use the index on (cat_id,seller_id) and sort the whole result set by price. This is a good choice if seller_id is selective; if it is not, as in this case, MySQL has to sort a lot of rows only to display a few.

If we force the index, as in the second query, EXPLAIN looks scary, with an estimated million rows to analyze, but we got rid of the filesort, so MySQL can stop as soon as 10 rows are sent. In this case, with seller_id not being very selective, it will likely need to scan fewer than 100 rows to generate the result.
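The difference between the two plans can be sketched with a small Python simulation. Everything here is hypothetical: the data is synthetic, the function names are ours, and the model assumes seller_id is independent of price (so matching rows are spread evenly through the price order). The 30% share for seller_id=1 roughly matches the ratio of the two EXPLAIN row estimates (296338 / 989171). A sorted Python list stands in for reading the (cat_id, price) index backwards; this is not how MySQL executes the plans, only a count of rows each strategy has to touch.

```python
import random


# Hypothetical data for a single category (cat_id=5): each row is
# (seller_id, price). seller_id=1 is drawn independently of price, so it
# is spread evenly through the price order; that is an assumption.
def make_category_rows(n_rows, seller_share, seed=42):
    rng = random.Random(seed)
    return [
        (1 if rng.random() < seller_share else 2, round(rng.uniform(1, 1000), 2))
        for _ in range(n_rows)
    ]


def plan_restrict_then_filesort(rows, seller_id, limit):
    """Models key cat_id_2 + filesort: read every row matching seller_id,
    sort them all by price DESC, keep the first `limit`."""
    # The comprehension stands in for the (cat_id, seller_id) index lookup,
    # so only the matching rows are counted as examined.
    matching = [r for r in rows if r[0] == seller_id]
    top = sorted(matching, key=lambda r: r[1], reverse=True)[:limit]
    return top, len(matching)


def plan_index_order_scan(rows_by_price_desc, seller_id, limit):
    """Models FORCE INDEX(cat_id): walk (cat_id, price) in descending price
    order, check seller_id on each row, stop after `limit` matches."""
    top, examined = [], 0
    for row in rows_by_price_desc:
        examined += 1
        if row[0] == seller_id:
            top.append(row)
            if len(top) == limit:
                break
    return top, examined


rows = make_category_rows(100_000, seller_share=0.3)
# Sorting once stands in for the (cat_id, price) index, read backwards.
by_price_desc = sorted(rows, key=lambda r: r[1], reverse=True)

top_a, examined_a = plan_restrict_then_filesort(rows, seller_id=1, limit=10)
top_b, examined_b = plan_index_order_scan(by_price_desc, seller_id=1, limit=10)
print(top_a == top_b, examined_a, examined_b)
```

Both plans return the same ten rows, but the first one has to examine (and sort) roughly 30,000 rows, while the index-order scan typically stops after a few dozen.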

The speed difference between these two example queries is about 100 times, so the problem can be quite serious.

To fix this issue, MySQL would need to take column selectivity into account together with the LIMIT. If there are only a few rows for a given seller_id (and the distribution may well be skewed), using filesort is better, because otherwise a very large portion of the index may need to be scanned to find 10 matching rows. If there are a lot of rows for a given seller_id, so that it is poorly selective, an index scan is a much better idea.
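The reasoning above can be turned into a rough estimate: if a fraction s of the category's rows match seller_id and the matches are spread evenly through the price order, walking the index in price order reads about LIMIT / s rows before it has enough matches, while the filesort plan reads and sorts every matching row. The Python sketch below is a deliberately crude, hypothetical decision rule built on that estimate; it is not MySQL's cost model and ignores I/O patterns and sort cost. The function names are ours.

```python
def expected_index_scan_rows(limit, matching_rows, category_rows):
    """Rows we expect to read walking (cat_id, price) in order before
    `limit` rows match seller_id, assuming the matches are spread evenly
    through the price order (no correlation between seller and price)."""
    if matching_rows == 0:
        return category_rows  # nothing matches: the whole cat_id range is read
    selectivity = matching_rows / category_rows
    return min(category_rows, limit / selectivity)


def prefer_index_order(limit, matching_rows, category_rows):
    """Crude, hypothetical rule: scan in index order when that is expected to
    read fewer rows than the filesort plan, which reads and sorts every
    matching row. Ignores I/O patterns and the cost of the sort itself."""
    return expected_index_scan_rows(limit, matching_rows, category_rows) < matching_rows


# Row estimates from the two EXPLAIN outputs above.
print(round(expected_index_scan_rows(10, 296338, 989171), 1))  # 33.4
print(prefer_index_order(10, 296338, 989171))                  # True

# A skewed, hypothetical seller with only 50 rows in the category.
print(round(expected_index_scan_rows(10, 50, 989171), 1))      # 197834.2
print(prefer_index_order(10, 50, 989171))                      # False
```

The same query text flips between the two plans depending on the constant, which is exactly why a single fixed choice (including a hard-coded hint) can be wrong for some sellers.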

Until MySQL is able to handle this, you will have to use the FORCE INDEX hint.

Another problem you may have, however, is calculating the count of matching rows, which can be even trickier to solve for complex searches that generate a lot of rows.

Another interesting technique is to use Sphinx search to accelerate sorting and retrieval, which I should explain in detail some time in the future.

8 Comments

Sheeri

17 years ago

When you force an index because of the character of your data, you have to check and make sure that the character of the data does not change.

Peter Zaitsev

Author

17 years ago

Sheeri,

Yes, that is the bummer with the FORCE INDEX hint. It is helpful if you know it is the best plan for the query with all constants; if it is not, you're in trouble.

Dmitri Mikhailov

17 years ago

Runtime optimization in OLTP systems has always been expensive; I always try to turn the CBO off by all means available. There is nothing wrong with that if (a) the data distribution is well known, (b) it is not going to change, and (c) the database design is solid.

Alexey

17 years ago

Why not change the cat_id index to include price as well, like (cat_id,seller_id,price)?

Also, I've always wondered why DB developers don't implement something like a "partial sort" algorithm, which doesn't sort the whole set but instead picks the top N values. I think such an algorithm is O(N), and it doesn't need any temp files, just one simple scan.

Peter Zaitsev

Author

17 years ago

Alexey,

This is just an example I used. Imagine a real case with, for example, 10 different filters and 5 fields you may sort on… you simply can't build indexes to cover all combinations, and as soon as you skip something you start to get the problem.

The same can happen with sorting by date, for example.

You're right about sorting: a sort based on a priority queue (for example) could be used for many LIMIT queries without changing semantics. It was even discussed by the optimizer team for years but was not done.

This would not exactly help in this case, though, because even with a partial sort you still need to scan the full result set, while with a scan in index order you can stop as soon as the needed number of rows has been delivered.
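Both points in this exchange can be seen in a minimal Python sketch of a heap-based top-N ("partial") sort. This only illustrates the general technique discussed here, not MySQL's implementation; the function names and data are hypothetical. The heap keeps at most n rows in memory and costs O(N log n) time, which is close to linear for a small LIMIT, but it still has to consume every input row, because any unread row might be more expensive than the ones kept so far. Walking rows in index (price) order can stop as soon as n matches are found.

```python
import heapq


def top_n_partial_sort(rows, seller_id, n):
    """Priority-queue ("partial") sort: keep only the n most expensive
    matching rows in a min-heap instead of sorting the whole result set.
    Returns (top rows by price DESC, number of input rows consumed)."""
    heap = []  # (price, seller_id); heap[0] is the cheapest row kept so far
    consumed = 0
    for seller, price in rows:
        consumed += 1
        if seller != seller_id:
            continue
        if len(heap) < n:
            heapq.heappush(heap, (price, seller))
        elif price > heap[0][0]:
            heapq.heapreplace(heap, (price, seller))  # evict the cheapest
    return [(s, p) for p, s in sorted(heap, reverse=True)], consumed


def index_order_rows_read(rows_by_price_desc, seller_id, n):
    """Rows read when walking an index on price DESC and stopping at n matches."""
    read = found = 0
    for seller, _price in rows_by_price_desc:
        read += 1
        if seller == seller_id:
            found += 1
            if found == n:
                break
    return read


# Hypothetical rows (seller_id, price): every third row belongs to seller 1.
rows = [(1 if i % 3 == 0 else 2, float(i)) for i in range(30_000)]

top, consumed = top_n_partial_sort(rows, seller_id=1, n=3)
print(top)       # [(1, 29997.0), (1, 29994.0), (1, 29991.0)]
print(consumed)  # 30000: every row had to be looked at

by_price_desc = sorted(rows, key=lambda r: r[1], reverse=True)
print(index_order_rows_read(by_price_desc, seller_id=1, n=3))  # 9
```

The partial sort avoids a full sort and temporary files, but only the index-order scan avoids reading the whole result set.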

Rob

14 years ago

As far as you're aware, are other RDBMSs better at selecting indexes for ORDER BY queries, or is this a universal limitation? How do the big players (Oracle, MSSQL) and competing open source products (Postgres) compare?

Vlad Fratila

13 years ago

Hi!
I’m also very interested in Rob’s question about other RDBMSes.
Also, have the latest versions of MySQL improved on this issue at all?

heasily

11 years ago

It's funny... but I've got a problem.

The table:
ID  my_id  follow
1   1      16
2   1      15
3   1      14
4   1      14

I just want to find the follow values with the highest counts, ordered by that count.

I wrote:
SELECT follow, count(*) AS NUM FROM wp_fans GROUP BY follow ORDER BY NUM DESC LIMIT 5

The follow column is indexed.

The EXPLAIN shows type: index, Extra: Using index; Using temporary; Using filesort

I am Chinese, so my English is bad... sorry...

THANKS.
