September 18, 2007

Possible optimization for sort_merge and UNION ORDER BY LIMIT

Posted by peter

Every so often you need to perform sort results retrieved from MySQL when your WHERE clause goes beyound col=const values which would allow MySQL to still use second portion of the index for the order by. Ranges as well as IN lists make this optimization impossible, not even speaking about index merge optimization. Lets look at this example:

SQL:
  1. CREATE TABLE `utest` (
  2.   `c1` int(10) UNSIGNED NOT NULL,
  3.   `c2` int(10) UNSIGNED NOT NULL,
  4.   `ord` int(10) UNSIGNED NOT NULL,
  5.   KEY `c1` (`c1`,`ord`),
  6.   KEY `c2` (`c2`,`ord`)
  7. ) ENGINE=MyISAM DEFAULT CHARSET=latin1
  8.  
  9. mysql> EXPLAIN SELECT * FROM utest WHERE c1=5 OR c2=5 ORDER BY ord DESC LIMIT 10 \G
  10. *************************** 1. row ***************************
  11.            id: 1
  12.   select_type: SIMPLE
  13.         TABLE: utest
  14.          type: index_merge
  15. possible_keys: c1,c2
  16.           KEY: c1,c2
  17.       key_len: 4,4
  18.           ref: NULL
  19.          rows: 162380
  20.         Extra: USING sort_union(c1,c2); USING WHERE; USING filesort
  21. 1 row IN SET (0.00 sec)

As you can see MySQL 5.1.21 uses sort_merge to access the rows but it can't use it together with order by efficiently.
I'd say this is limitation as for this case you well could be retrieving sorted streams by the indexes and doing merge sort to get resulted rows in sorted order. Once this is implemented similar approach could be used for optimizing ORDER BY with IN.

This original query (in memory data) takes 1sec. Let's see how classic pre MySQL 5.0 solution - using UNION instead of single query works in this case:

SQL:
  1. mysql> EXPLAIN (SELECT * FROM utest WHERE c1=5) union (SELECT * FROM utest WHERE c2=5) ORDER BY ord DESC LIMIT 10 \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: PRIMARY
  5.         TABLE: utest
  6.          type: ref
  7. possible_keys: c1
  8.           KEY: c1
  9.       key_len: 4
  10.           ref: const
  11.          rows: 108112
  12.         Extra:
  13. *************************** 2. row ***************************
  14.            id: 2
  15.   select_type: UNION
  16.         TABLE: utest
  17.          type: ref
  18. possible_keys: c2
  19.           KEY: c2
  20.       key_len: 4
  21.           ref: const
  22.          rows: 54268
  23.         Extra:
  24. *************************** 3. row ***************************
  25.            id: NULL
  26.   select_type: UNION RESULT
  27.         TABLE: <union1,2>
  28.          type: ALL
  29. possible_keys: NULL
  30.           KEY: NULL
  31.       key_len: NULL
  32.           ref: NULL
  33.          rows: NULL
  34.         Extra: USING filesort
  35. 3 rows IN SET (0.00 sec)

The query looks scare but in fact in completes in 1.05 sec not significantly faster than sort index merge in this case.

As the query time implies MySQL is not smart enough in this case to "dive into" the union and add ORDER BY ORD LIMIT 10 to individual queries.
What if we do int manually ?

SQL:
  1. mysql> EXPLAIN (SELECT * FROM utest WHERE c1=5 ORDER BY ord DESC LIMIT 10) union (SELECT * FROM utest WHERE c2=5 ORDER BY ord DESC LIMIT 10) ORDER BY ord DESC LIMIT 10 \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: PRIMARY
  5.         TABLE: utest
  6.          type: ref
  7. possible_keys: c1
  8.           KEY: c1
  9.       key_len: 4
  10.           ref: const
  11.          rows: 108112
  12.         Extra: USING WHERE
  13. *************************** 2. row ***************************
  14.            id: 2
  15.   select_type: UNION
  16.         TABLE: utest
  17.          type: ref
  18. possible_keys: c2
  19.           KEY: c2
  20.       key_len: 4
  21.           ref: const
  22.          rows: 54268
  23.         Extra: USING WHERE
  24. *************************** 3. row ***************************
  25.            id: NULL
  26.   select_type: UNION RESULT
  27.         TABLE: <union1,2>
  28.          type: ALL
  29. possible_keys: NULL
  30.           KEY: NULL
  31.       key_len: NULL
  32.           ref: NULL
  33.          rows: NULL
  34.         Extra: USING filesort
  35. 3 rows IN SET (0.00 sec)

As you can see explain does not really change (it does not show if index is used for finding rows or sorting as well) but query speed goes to less than
0.01 sec.

So it is great optimization which you need to do manuall for the time being.

What is also interesting is the fact MySQL is unable to handle even basic UNION with limit (without order by) optimally - in creates result set for the union fully and when only takes 10 rows from it:

SQL:
  1. mysql> EXPLAIN (SELECT * FROM utest WHERE c1=5 ) union (SELECT * FROM utest WHERE c2=5 ) LIMIT 10 \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: PRIMARY
  5.         TABLE: utest
  6.          type: ref
  7. possible_keys: c1
  8.           KEY: c1
  9.       key_len: 4
  10.           ref: const
  11.          rows: 108112
  12.         Extra:
  13. *************************** 2. row ***************************
  14.            id: 2
  15.   select_type: UNION
  16.         TABLE: utest
  17.          type: ref
  18. possible_keys: c2
  19.           KEY: c2
  20.       key_len: 4
  21.           ref: const
  22.          rows: 54268
  23.         Extra:
  24. *************************** 3. row ***************************
  25.            id: NULL
  26.   select_type: UNION RESULT
  27.         TABLE: <union1,2>
  28.          type: ALL
  29. possible_keys: NULL
  30.           KEY: NULL
  31.       key_len: NULL
  32.           ref: NULL
  33.          rows: NULL
  34.         Extra:
  35. 3 rows IN SET (0.00 sec)

Such query takes about 0.9 sec on the test data. Another possible optimizer improvement to do.

P.S This post is inspired by Does MySQL Optimize UNION with LIMIT clause topic on our MySQL Forums.

Related posts: :UNION vs UNION ALL Performance::MySQL: Followup on UNION for query optimization, Query profiling::ORDER BY … LIMIT Performance Optimization:
 

9 Comments »

  1. Peter,

    Sorry, but you made a very small mistake …. It should be UNION ALL and not just UNION . As you know, UNION without attribute is UNION DISTINCT.

    Comment :: September 18, 2007 @ 10:00 am

  2. Yes, Sinisa I know

    But this is closer to what A=5 or B=5 does. If you have UNION ALL rows which have A=5 and B=5 will be included twice in result set :)

    Comment :: September 18, 2007 @ 11:22 am

  3. 3. David

    testing on my notebook and utest with more than 1,000,000 rows, with mysql 5.0.45

    (SELECT * FROM utest WHERE c1=5) union (SELECT * FROM utest WHERE c2=5) ORDER BY ord DESC LIMIT 10;
    10 rows in set (0.31 sec)

    (SELECT * FROM utest WHERE c1=5 ORDER BY ord DESC LIMIT 10) union (SELECT * FROM utest WHERE c2=5 ORDER BY ord DESC LIMIT 10) ORDER BY ord DESC LIMIT 10
    10 rows in set (0.25 sec)

    the speed is not so big differenet than you did

    Comment :: September 18, 2007 @ 8:19 pm

  4. David,

    Can you do EXPLAIN on your queries ?

    0.25 sec is just very wrong time for such query as all what needs to be done in second query is fetching 10 rows by one index when 10 rows by another index and them sorting them.

    Comment :: September 19, 2007 @ 2:35 am

  5. Peter,

    You are right about rows with A=5 and B=5 being included twice, but you can add … AND B != 5 .. etc… It would still be significantly faster then with UNION DISTINCT.

    Comment :: September 19, 2007 @ 6:58 am

  6. 6. David

    Peter,
    Sorry I made a mistake. Finally the query
    (SELECT * FROM utest WHERE c1=5 ORDER BY ord DESC LIMIT 10) union (SELECT * FROM utest WHERE c2=5 ORDER BY ord DESC LIMIT 10) ORDER BY ord DESC LIMIT 10,takes only 0.03

    Comment :: September 19, 2007 @ 6:11 pm

  7. [...] I was comparing performance of UNION vs MySQL 5.0 index merge algorithm Sinisa pointed out I should be using UNION ALL instead of simple UNION in my benchmarks, and he was [...]

    Pingback :: October 5, 2007 @ 2:01 pm

  8. 8. Deepali

    thanks a lot i was searching for one of my query, i would love to specially thanks to Devid. because i just wanted query which he has posted here.
    many thanks.

    Comment :: February 7, 2008 @ 11:37 pm

  9. 9. Ken

    I’m creating a shopping cart type system where users can add a bunch of products. It’s important that users can change the display order of products. This way they can showcase certain items on top.

    All of this is stored in the database obviously.

    There is a problem though, and it’s with the sort order storage in the database. For instance, if I have 5 products, I can set it like this:

    01 A
    10 B
    20 C
    30 D
    40 E

    The number is the “sort” field, and the letter is the “product” field. I’d then set an index for the “sort” field and the listings would be fast.

    Now, suppose I wanted to re-order E to position number two (right now it’s position number 5). I could do this:

    01 A
    05 E
    10 B
    20 C
    30 D

    The problem here is, if I keep re-sorting, I’d run out of numbers in between and will have to re-order everything. This can be a potential nightmare if I have thousands of products.

    Is there a better way of doing this?

    Comment :: June 27, 2008 @ 12:45 am

 

Subscribe without commenting


This page was found by: order by and union i... possible use of unio... sql union order by l... mysql 5 select_type mysql optimizing uni...