July 24, 2006

Extended EXPLAIN

Posted by peter |

One nice feature added for EXPLAIN statement in MySQL 4.1 is EXTENDED keyword which provides you with some helpful additional information on query optimization. It should be used together with SHOW WARNINGS to get information about how query looks after transformation as well as what other notes optimizer may wish to tell us. It is best seen by example:

SQL:
  1. mysql> EXPLAIN extended SELECT * FROM sbtest WHERE id>5 AND id>6 AND c="a" AND pad=c;
  2. +----+-------------+--------+-------+---------------+---------+---------+------+--------+-------------+
  3. | id | select_type | TABLE  | type  | possible_keys | KEY     | key_len | ref  | rows   | Extra       |
  4. +----+-------------+--------+-------+---------------+---------+---------+------+--------+-------------+
  5. 1 | SIMPLE      | sbtest | range | PRIMARY       | PRIMARY | 4       | NULL | 422510 | USING WHERE |
  6. +----+-------------+--------+-------+---------------+---------+---------+------+--------+-------------+
  7. 1 row IN SET, 1 warning (0.02 sec)
  8.  
  9. mysql> SHOW warnings \G
  10. *************************** 1. row ***************************
  11.   Level: Note
  12.    Code: 1003
  13. Message: SELECT `test`.`sbtest`.`id` AS `id`,`test`.`sbtest`.`k` AS `k`,`test`.`sbtest`.`c` AS `c`,`test`.`sbtest`.`pad` AS `pad` FROM `test`.`sbtest` WHERE ((`test`.`sbtest`.`id`> 5) AND (`test`.`sbtest`.`id`> 6) AND (`test`.`sbtest`.`c` = _utf8'a') AND (`test`.`sbtest`.`pad` = _utf8'a'))
  14. 1 row IN SET (0.00 sec)

As you see after query transformation "*" was expanded to full column list, each column was fully quantified plus MySQL optimizer converted
c="a" and pad=c; to c="a" and pad="a". MySQL optimizer could not however catch id>5 and id>6 can be converted just to id>6.

Lets see another example:

SQL:
  1. mysql> EXPLAIN extended SELECT t1.id,t2.pad FROM sbtest t1, sbtest t2 WHERE t1.id=5 AND t2.k=t1.k;
  2. +----+-------------+-------+-------+---------------+---------+---------+-------+-------+-------+
  3. | id | select_type | TABLE | type  | possible_keys | KEY     | key_len | ref   | rows  | Extra |
  4. +----+-------------+-------+-------+---------------+---------+---------+-------+-------+-------+
  5. 1 | SIMPLE      | t1    | const | PRIMARY,k     | PRIMARY | 4       | const |     1 |       |
  6. 1 | SIMPLE      | t2    | ref   | k             | k       | 4       | const | 55561 |       |
  7. +----+-------------+-------+-------+---------------+---------+---------+-------+-------+-------+
  8. 2 rows IN SET, 1 warning (0.00 sec)
  9.  
  10. mysql> SHOW warnings \G
  11. *************************** 1. row ***************************
  12.   Level: Note
  13.    Code: 1003
  14. Message: SELECT `test`.`t1`.`id` AS `id`,`test`.`t2`.`pad` AS `pad` FROM `test`.`sbtest` `t1` JOIN `test`.`sbtest` `t2` WHERE ((`test`.`t2`.`k` = `test`.`t1`.`k`) AND (`test`.`t1`.`id` = 5))
  15. 1 row IN SET (0.00 sec)

In this case we can see table t1 is accessed using "const" access type, which means it should pre-read value for "t1.k" and place it into the query. This transformation however is not shown by EXTENDED EXPLAIN for some reason, while it would be quite helpful.

Finally lets look at third example:

SQL:
  1. mysql> EXPLAIN extended SELECT * FROM sbtest WHERE id IN (SELECT id FROM sbtest WHERE id BETWEEN 1 AND 10);
  2. +----+--------------------+--------+-----------------+---------------+---------+---------+------+---------+--------------------------+
  3. | id | select_type        | TABLE  | type            | possible_keys | KEY     | key_len | ref  | rows    | Extra                    |
  4. +----+--------------------+--------+-----------------+---------------+---------+---------+------+---------+--------------------------+
  5. 1 | PRIMARY            | sbtest | ALL             | NULL          | NULL    | NULL    | NULL | 1000109 | USING WHERE              |
  6. 2 | DEPENDENT SUBQUERY | sbtest | unique_subquery | PRIMARY       | PRIMARY | 4       | func |       1 | USING INDEX; USING WHERE |
  7. +----+--------------------+--------+-----------------+---------------+---------+---------+------+---------+--------------------------+
  8. 2 rows IN SET, 1 warning (0.03 sec)
  9.  
  10. mysql> SHOW warnings \G
  11. *************************** 1. row ***************************
  12.   Level: Note
  13.    Code: 1003
  14. Message: SELECT `test`.`sbtest`.`id` AS `id`,`test`.`sbtest`.`k` AS `k`,`test`.`sbtest`.`c` AS `c`,`test`.`sbtest`.`pad` AS `pad` FROM `test`.`sbtest` WHERE <in_optimizer>(`test`.`sbtest`.`id`,<exists>(<primary_index_lookup>(<cache>(`test`.`sbtest`.`id`) IN sbtest ON PRIMARY WHERE (`test`.`sbtest`.`id` BETWEEN 1 AND 10))))
  15. 1 row IN SET (0.00 sec)

As you see in this case there is planty of extra information in the message. We can see query is using , it is converted to using . And there is some in being done.

Unfortnuately there is yet not too much does available about what each of these means so you should guess from names or take a look at the sources.

To note about last query - even if it looks like there is a whole big deal of optimization being done query is executed extremely poor, completing in 16 seconds, while if you just run this subquery manually and use IN (1,2,3,...10) it completes in tiny fraction of the second.

Anyway EXPLAIN EXTENDED is very valuable addition to EXPLAIN for MySQL Performance troubleshooting.

MySQL EXPLAIN limits and errors.

Posted by peter |

Running EXPLAIN for problematic queries is very powerful tool for MySQL Performance optimization. If you've been using this tool a lot you probably noticed it is not always provide adequate information. Here is list of things you may wish to watch out.

EXPLAIN can be wrong - this does not happen very often but it does. EXPLAIN may not be telling you the truth, or full truth about query execution. The reason is - MySQL does not really have the special "plan generating" step, instead plan is generated as a part of query execution, sometimes being dynamic based on data. With EXPLAIN MySQL kind of simulates execution but it obviously does not access the data so does not have access to this dynamic component. If you suspect EXPLAIN is lieing you you can use SHOW STATUS "Handler" statistics to see if number of operations match.

EXPLAIN works for SELECT only This is in works to be fixed in some of next version but it is so as of MySQL 5.0 The workaround is to modify statement to become SELECT with appropriate where clause. This however would not tell you full story, especially now as MySQL has triggers which can make updates very different. Also there is no way to tell you how ALTER table would be executed - will in add index live as it can now do on Cluster ? Is it expected to rebuild index by sort of key_cache as it can for MyISAM.

EXPLAIN may take long time In MySQL 5.0 with addition of greedy join the problem on looking at too many table join combinations in joins with very large tables is elevated, however there are still cases when it may take too long. Most practically important one is if Derived tables (selects in FROM clause) are used. MySQL will execute these subselects on explain to provide plan for top most statement, so if you made an error in select in from clause EXPLAIN might never complete. Solution is to run EXPLAIN for statements in FROM clause separately before looking at main query.

Estimated number of rows may be very inaccurate It is well known this number is aproximate but I guess not everyone knows how much wrong it could be. If number of rows is estimated performing btree dive with certainly structured tree you still can get number of rows off 3-10 times. It gets even worse if cardinality is used. In this case if data distribution is skewed and you statement looks at portion of data with different distribution results can be off by 10-100 times and even more.

Computing rows to be examined is more complicated It is frequent approach to take number of rows estimated from each row in a join and multiply them to see how many rows will be examined in total. This can be rather wrong. One thing to consider is number of rows analyzed for given table might not be the same as passed to the next step - there could be some non-index based filtering applied. The other reason for difference is join_buffer which reduces number of passes MySQL had to perform and so number of row combinations really examined.

LIMIT is not taken into account while estimating number of rows Even if you have LIMIT which restricts how many rows will be examined MySQL will still print full number. Here is example:

SQL:
  1. mysql> EXPLAIN SELECT * FROM lt LIMIT 10;
  2. +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
  3. | id | select_type | TABLE | type  | possible_keys | KEY     | key_len | ref  | rows | Extra       |
  4. +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
  5. 1 | SIMPLE      | lt    | INDEX | NULL          | PRIMARY |       4 | NULL | 1600 | USING INDEX |
  6. +----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
  7. 1 row IN SET (0.00 sec)

This statement obviously will not scan more than 10 rows but EXPLAIN shows it will scan over 1000. This is not the problem if you examine your queries manually but if you have automatic tool which catches possibly complex queries it can give you head ache.

filesort, using temporary are always in the first line So it is impossible to tell at which stage MySQL actually perform the sort or creates temporary table and so how much rows will be stored to temporary table. Sometimes it is enough to perform sorting or use temporary table with partial result set (partial rows) and than compose full rows by reading data from other tables.

No information about using disk for sort or temporary tables MySQL always calls it "filesort" even if it is done in memory. Also it would not tell you if temporary table will be required on disk. In many cases it is impossible to tell as we do not know how much data we'll need to sort or accumulate. This information is not available even if it is well known - for example BLOB fields can't work with in memory temporary table. The other thing - since MySQL 4.1 sorting can be done with storing row data in the sort file, not only key values and row pointers. If this is going to be used in particular sort is not seen in explain.

Limited cost information Yes, number of rows to be scanned is shown but it does not tell much about cost of the operation - full table scan of table in memory may be done at speed of millions of rows per second, while retrieval of large rows from uncached table by random index values (or even range) may be perfoming at some 100 rows/sec.

No info about functions cost Since MySQL 5.0 you may use user functions in your query which may also perform queries themselves. These actions are totally hidden in EXPLAIN which may seriously affect the outcome. The cost of functions is also not used in optimization which may produce suboptimal plans if stored functions are used in queries.

In general in my opinion EXPLAIN needs serious overhaul so it can be used with GUI tool to provide tree like structure of opertions applied to the data, methods used, costs etc. Simple table like output format is good but with growing MySQL Server complexity it is getting impossible to present all relevant information this way, so we need at least an alternative view.

If you want to find more information about understanding EXPLAIN output, MySQL manual EXPLAIN page is good place to start.