April 16, 2014

Optimizing repeated subexpressions in MySQL

How smart is the MySQL optimizer? If it sees an expression repeated many times, does it realize they’re all the same and not calculate the result for each of them?

I had a specific case where I needed to find out for sure, so I made a little benchmark. The query looks something like this:

I wrote a bash script to execute the query 500 times and redirect the results to /dev/null. Results: 31s wall-clock time.

The following is an alternative query that returns the same result:

This query runs 500 iterations in 12s. So it looks like the optimizer was computing the SUM() 18 times in the first query. (A response on the internals mailing list agreed with this finding. Thanks Konstantin.)

Along the same lines, I benchmarked POW() and found that squaring a number is more than twice as fast to do with ordinary multiplication as with POW(). Does this matter? Sometimes… we had a client who was doing vector math in a many-dimensional space, and the POW() seemed to be taking a pretty significant amount of time; changing to ordinary multiplication made the query faster. (It was not the ultimate suggestion though.)

About Baron Schwartz

Baron is the lead author of High Performance MySQL. He maintains a personal blog at Xaprb. Follow him at @xaprb or connect with him on LinkedIn.

Comments

  1. Lukas says:

    Its always worthwhile to look at expensive expressions like that if they are repeated. In some cases I have turned these into subqueries in the SELECT clause. Even though MySQL is very badly optimized for these kinds of subqueries, usually the result sets are very small for these cases and so it does not hurt that much and is radically offset by not having to do the same computation multiple times.

  2. Sergei Golubchik says:

    Just to say that your workaround isn’t guaranteed to work. The server is not required to evaluate expressions in the SELECT clause in the order they are listed. The fact that it does is an implementation detail that may change any time.

  3. Lukas says:

    @Sergei: So the subquery solution is safer?

  4. Sergei, indeed: in cases where it matters, such as in mk-table-checksum, I run test queries to find out which place the “real” expression needs to go and where the user variables can be substituted. The result is often that the real expression goes LAST, not first! It looks odd but it’s true — you end up using @variable apparently before it’s assigned to. In this case it needed to be first.

  5. Sergei Golubchik says:

    sorry, bad timing :) we were commenting at the same time, apparently, at least, your comment wasn’t here when I was writing mine.

    I mean that Baron’s workaround relies on the order in which SELECT expressions are evaluated. And this order is undefined, afaik.

  6. Hi,

    To solve the issue of whether @sum is calculated first, the query can be rewritten as:
    SELECT sql_no_cache
    pow(@sum, 1),
    pow(@sum, 2),
    pow(@sum, 3),
    pow(@sum, 4),
    pow(@sum, 5),
    pow(@sum, 6),
    pow(@sum, 7),
    pow(@sum, 8),
    pow(@sum, 10),
    pow(@sum, 11),
    pow(@sum, 12),
    pow(@sum, 13),
    pow(@sum, 14),
    pow(@sum, 15),
    pow(@sum, 16),
    pow(@sum, 17),
    pow(@sum, 18)
    FROM
    (SELECT @sum := (SELECT sum(rental_id) FROM sakila.rental)) a

    In this query @sum is guaranteed to be calculated before all the pow(…) take place.

  7. Sergei Golubchik says:

    You don’t need a user variable in your solution:

    SELECT
    pow(sum, 1),
    pow(sum, 2),
    pow(sum, 3),
    pow(sum, 4),
    pow(sum, 5),
    pow(sum, 6),
    pow(sum, 7),
    pow(sum, 8),
    pow(sum, 10),
    pow(sum, 11),
    pow(sum, 12),
    pow(sum, 13),
    pow(sum, 14),
    pow(sum, 15),
    pow(sum, 16),
    pow(sum, 17),
    pow(sum, 18)
    FROM (SELECT sum(rental_id) as sum FROM sakila.rental) a

    you can even write ”


    SELECT sum(rental_id) as sum FROM sakila.rental GROUP BY customer_id

    which isn’t possible with user variables.

  8. Ms. Anthrope says:

    Notably, Postgres does not have this particular deficiency. I often run into optimizer deficiencies in MySQL with my clients (this being one of the more minor sorts) and find myself longing for my younger days with the PGSQL.

  9. Lukas says:

    @Sergei: thats exactly the approach I meant in my original approach.

    @Ms. Anthrope: right .. then again in some cases its nicer for the eyes to still do this subquery if the expression is very long.

  10. peter says:

    Sergei,

    Do you know if this is on the roadmap to be fixed natively by the way ?

  11. Sergei Golubchik says:

    To recognize common subexpressions ? No, I don’t know.

Speak Your Mind

*