If you were interviewing to work at Percona, and I asked you “what does Using filesort mean in EXPLAIN,” what would you say?

I have asked this question in a bunch of interviews so far, with smart people, and not one person has gotten it right. So I consider it to be a bad interview question, and I’m going to put the answer here. If anyone gets it wrong from now on, I know they don’t read this blog!

Using filesort in MySQL

The usual answer is something like “rows are being placed into a temporary table which is too big to fit in memory, so it gets sorted on disk.” Unfortunately, this is not the same thing. First of all, this is Using temporary. Secondly, temporary tables may go to disk if they are too big, but EXPLAIN doesn’t show that. (If I interview you, I might ask you what “too big” means, or I might ask you the other reason temporary tables go to disk!)

The truth is, filesort is badly named. Anytime a sort can’t be performed from an index, it’s a filesort. It has nothing to do with files. Filesort should be called “sort.” It is quicksort at heart.

If the sort is bigger than the sort buffer, it is performed a bit at a time, and then the chunks are merge-sorted to produce the final sorted output. There is a lot more to it than this. I refer you to Sergey Petrunia’s article on How MySQL executes ORDER BY. You can also read about it in our book, but if you read Sergey’s article you won’t need to.

28 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Milos

You should give tips like this more often. 🙂

p.s. Does me using a calculator to calculate my CAPTCHA question mae me any less smart?
.. Maybe because I have a dedicated button for it on my laptop. 🙂

William Newton

I’m sort of surprised more people didn’t get that right. Its often seen with using temporary ( on a poorly optimized query), but still. Oh wait, File – Sort. So they think its going to a File, file.

George

Creating a temp table still requires there to be a table definition file created in the tmpdir.

Salle

Trivial one ..

My standard answer is “filesort refers to the algorithm MySQL is using and has nothing to do with files on disk” and if I’m in a mood to explain further or obliged to I add “you always get filesort with ORDER BY for example”. Yes it’s one of many bad names you can find in MySQL.

It’s bad question for job interviews though especially the way you present it here.

I will hire at once candidate who says “I’m under impression it means writing temp table to disk, but I might be wrong and will check it” and will show the door to a candidate who says “it’s badly named because your blog says so”.

Matter of style indeed. Cheers.

Ryan Lowe

@Salle Not to be pedantic you will not always get filesort with ORDER BY:

mysql> CREATE TABLE t1 (id int unsigned NOT NULL);
Query OK, 0 rows affected (0.05 sec)

mysql> INSERT INTO t1 VALUES (1),(2);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT * FROM t1 ORDER BY id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2
Extra: Using filesort
1 row in set (0.00 sec)

mysql> ALTER TABLE t1 ADD INDEX (id);
Query OK, 2 rows affected (0.03 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT * FROM t1 ORDER BY id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: NULL
key: id
key_len: 4
ref: NULL
rows: 2
Extra: Using index
1 row in set (0.00 sec)

It is, as Baron noted, when an index cannot be used.

— Ryan Lowe

How much space is required for the filesort? Presumably it needs all the columns in the result plus any used by HAVING?

I am fairly convinced from observing its behaviour that it uses fixed-length records even for VARCHAR etc, which is very bad from the disc usage (and sort_buffer usage) perspective.

Salle

@Ryan

You are right indeed. I missed couple of words in that post. Maybe not the words you expect because there are cases when “you always get filesort with ORDER BY”. There’s example below without the table structure which I leave to you to guess.

The truth is that “always” and “never” are dangerous words in RDBMS realm. The proper answer in almost all cases is “it depends”, but even that is not always true.

Example:

mysql> INSERT INTO fs (id) VALUES (1),(2);
Query OK, 2 rows affected (0.06 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT * FROM t1 ORDER BY id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
1 row in set (0.03 sec)

mysql> ALTER TABLE t1 ADD INDEX (id);
Query OK, 3 rows affected (0.09 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT * FROM t1 ORDER BY id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3
Extra: Using filesort
1 row in set (0.00 sec)

Fernando Ipar

@Salle

If t1 has a compound primary key which doesn’t include id, then the index you’re creating won’t be used for the optimization.
Here’s an example with another schema:

mysql> create table t1 (var int not null, id int not null, data int not null, primary key (var,data));
Query OK, 0 rows affected (0.27 sec)

mysql> insert into t1 values (1,1,1),(1,1,5),(1,2,4),(2,1,3);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0

mysql> explain select * from t1 order by id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4
Extra: Using filesort
1 row in set (0.00 sec)

mysql> alter table t1 add index(id);
Query OK, 4 rows affected (0.01 sec)
Records: 4 Duplicates: 0 Warnings: 0

mysql> explain select * from t1 order by id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4
Extra: Using filesort
1 row in set (0.00 sec)

You can hint the optimizer:

mysql> explain select * from t1 force index (id) order by id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: NULL
key: id
key_len: 4
ref: NULL
rows: 4
Extra:
1 row in set (0.00 sec)

Regards,

sky.jian

ohh,:D

it looks like i my comprehension is right :)

and i had write a blog about “mysql order by” some month ago:

http://www.jianzhaoyang.com/database/mysql_order_by_implement

Explore Oracle

so i think big sorting in mysql is some what same as binary search procedure where searching is done in chunks.

Maurice

Clears up some confusion indeed.

Thanks for posting! 🙂

Dmitry Dulepov

If people do not answer this question incorrectly, it simply means they do not read. The answer is clean and clear at the MySQL web site:

=========================
Using filesort

MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the WHERE clause. The keys then are sorted and the rows are retrieved in sorted order.
=========================

http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

In fact, such quaestions are VERY bad on the interview. It is ok to test skills during the interview but it is bad to ask for very low details. Much better to test if the person can ~find~ the answer to the question. Someone who knows is not necessarily the best. Someone who does not know but can find the answer is typically better.

Salle

Dmitry,

You will be amazed how many MySQL “experts” don’t read the manual. They either “know” or assume.

At the other had digging out such information is difficult. MySQL manual is not easy to navigate and it changes all the time. What’s in there now could be added just couple of hours ago.

So don’t blame people who didn’t know how to find such information.

As for such questions being bad for interviews it depends on what do you want to understand about the candidate and even more what job you are interviewing for.

If it’s about on-site consulting you definitely prefer one being able to answer everything on top of his head.

thomas

Gee, I wonder what it is like to be in an interview and get this question from Baron?

I think I always had the general idea, but reading these posts gave an extra layer.

T

Joey

Thanks for this helpful article and the even more helpful discussion!

rob

@Salle

Reading through this thread what scares me is it appears you also have some misconceptions about MySQL and yet you “show people to the door” who in this case seem to have a good grasp of the answer. Hope I don’t have to cross paths I wouldn’t want to waste my time.

Claudio Nanni

@Salle: Your Post #8

I see you insert two records in the table ‘fs’ then EXPLAIN table ‘t1’ which has three records.
In my opinion your example is not very clear. Moreover I get the behaviour as explained by Baron.
Am I missing something?

>>>
(root@localhost) [test] CREATE TABLE t1 (id int unsigned NOT NULL);
Query OK, 0 rows affected (0.13 sec)

(root@localhost) [test] INSERT INTO t1 (id) VALUES (1),(2);
Query OK, 2 rows affected (0.14 sec)
Records: 2 Duplicates: 0 Warnings: 0

(root@localhost) [test] EXPLAIN SELECT * FROM t1 ORDER BY id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2
Extra: Using filesort
1 row in set (0.00 sec)

(root@localhost) [test] ALTER TABLE t1 ADD INDEX (id);
Query OK, 2 rows affected (0.33 sec)
Records: 2 Duplicates: 0 Warnings: 0

(root@localhost) [test] EXPLAIN SELECT * FROM t1 ORDER BY id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: index
possible_keys: NULL
key: id
key_len: 4
ref: NULL
rows: 2
Extra: Using index
1 row in set (0.00 sec)

(root@localhost) [test]
<<<

Michael Baker

This was very interesting to find, and it lead me to Percona for a consult. I knew what the filesort meant, but am having a hard time composing the index to reduce the number of rows searched.

I would very much like the chance to interview to become a freelance consultant for Percona. Although my regular business keeps me very busy, I’m always willing to help other people when I can, and I have a son in college – so I could use the extra income!

Jeremy

The MySQL manual has a full description of what happens during a filesort in http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html.

Karthikeyan

This EXPLAIN could be the right fit to the above note. Even though its a 25 record, it takes prolonged time.

query result
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sol ALL (NULL) (NULL) (NULL) (NULL) 3 Using temporary; Using filesort
1 SIMPLE dom ALL (NULL) (NULL) (NULL) (NULL) 36
1 SIMPLE str ALL (NULL) (NULL) (NULL) (NULL) 45
1 SIMPLE ctd ALL (NULL) (NULL) (NULL) (NULL) 61
1 SIMPLE ctf eq_ref PRIMARY PRIMARY 4 lds.ctd.ctd_cou_id 1
1 SIMPLE cli eq_ref PRIMARY PRIMARY 4 lds.ctf.ctf_cli_id 1
1 SIMPLE cog eq_ref PRIMARY PRIMARY 4 lds.ctf.ctf_cog_id 1
1 SIMPLE vis eq_ref PRIMARY PRIMARY 4 lds.ctf.ctf_vis_id 1
1 SIMPLE ind eq_ref PRIMARY PRIMARY 4 lds.ctf.ctf_ind_id 1
1 SIMPLE cts ALL (NULL) (NULL) (NULL) (NULL) 99 Using where
1 SIMPLE cb eq_ref PRIMARY,cou_id PRIMARY 4 lds.ctf.ctf_cou_id 1 Using where

Mikhail

Here’s a trivial but interesting example:

EXPLAIN SELECT 20 AS C UNION SELECT 10 AS C ORDER BY C

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY NULL NULL NULL NULL NULL NULL NULL No tables used
2 UNION NULL NULL NULL NULL NULL NULL NULL No tables used
NULL UNION RESULT ALL NULL NULL NULL NULL NULL Using filesort

Obviously there’s no merge-sorting here, and no hard drive files are in use.

Salman

I looked at the “ORDER BY OPTIMIZATION” section in MySQL manual but could not find a reason why MySQL does not use index in the following scenario:

CREATE TABLE table1 (
id bigint(20) NOT NULL AUTO_INCREMENT,
a varchar(100) DEFAULT NULL,
b varchar(100) DEFAULT NULL,
PRIMARY KEY (id),
KEY (a),
KEY (b)
) ENGINE=InnoDB

Table 1 has 512000 randomly generated rows. This query uses index (type: index, key: a, key_len: 103, Extra: ”):

EXPLAIN SELECT * FROM table1 ORDER BY a LIMIT 1698, 1

This one uses filesort (Extra: Using filesort):

EXPLAIN SELECT * FROM table1 ORDER BY a LIMIT 1699, 1

Can someone EXPLAIN if there is a rule that MySQL uses to determine whether to use index or filesort? On mysql manual I notice that the above query satisfies all conditions required for MySQL to use an index but it does not. I just see this note:

With EXPLAIN SELECT … ORDER BY, you can check whether MySQL can use indexes to resolve the query. It cannot if you see Using filesort in the Extra column.

human

Salman: Your experience is probably because your index does not fix in memory – read the manual with respect to innodb buffer and check your config.