July 28, 2014

How expensive is a WHERE clause in MySQL?

This is a fun question I’ve been wanting to test for some time.  How much overhead does a trivial WHERE clause add to a MySQL query?  To find out, I set my InnoDB buffer pool to 256MB and created a table that’s large enough to test, but small enough to fit wholly in memory:

I repeated the last statement until it got slow enough that I thought I could do a reasonable test; at this point there were 8388608 rows. I then optimized the table and restarted MySQL. The table ended up at 237MB.

Now let’s see how long a table scan with no WHERE clause takes:

I repeated this ten times and averaged the time: 5.01 seconds, ranging from 4.83 to 5.17 seconds. Next I ran it with a trivial WHERE clause:

(Pop quiz: do I need SQL_NO_CACHE on this query?) The average execution time for this query is 7.96, ranging from 7.65 to 8.15 seconds. So apparently the overhead of the WHERE clause is about 2.95 seconds, or 351 nanoseconds per row. It doesn’t seem like much per row, but it adds up to about an extra 60% cost for the query. If I add another WHERE clause,

The average time is 9.39 seconds, or an added 87% overhead.

It would be interesting, in an academic kind of way, to test different data types and various complexities of WHERE clauses, but I’m not really interested enough to spend much time on it. I just wondered whether the WHERE clause would even be noticeable. This test doesn’t really reveal anything you can do to make your queries faster — you usually can’t optimize the WHERE clause itself, only the way that the MySQL optimizer chooses to apply the clause to indexes. (However, as I found out some time ago, some date functions are faster than others. That would be interesting to look into more.)

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. Martin says:

    re Pop quiz: do I need SQL_NO_CACHE on this query?
    I’m still new to this but I’m going to say ‘NO’ as you are using a variable (current_date) in the query instead of a fixed date (ie ’2008-10-29′). Is that correct.

    I am amazed at that overhead! Would using MyIsam and/or adding an INDEX on the date field make any difference?

    Cheers
    Martin

  2. Martin, you’re right about current_date.

    The WHERE is implemented at the server level so it ought not to depend on the engine or indexes.

  3. Mark says:

    I just ran some tests, and on my system the performance is improved by an index:
    CREATE INDEX a_idx ON t(a);

    Rows: 8388608
    Time without WHERE: 4.8 seconds
    Time with WHERE, without index: 5.9 seconds
    Time with WHERE, with index: 5.2 seconds

    My results are that the index reduces the additional overhead from the WHERE clause by half.

  4. And with the index, but without WHERE?

  5. Mark says:

    That’s unchanged – 4.8 seconds.

  6. Of course — because it’s using the primary key.

    My point in asking those questions was to get you thinking about the query plan. You can’t compare the primary key to the secondary key and say that the difference is due to where clauses or not having where clauses. I think you’re misunderstanding how the storage engine api and InnoDB indexes work.

  7. Martin says:

    I’m confused. Where did the Primary Key come from in:
    CREATE TABLE t (a date NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    ?

  8. InnoDB has a hidden 6-byte primary key.

  9. Martin says:

    Well that is three things I have learned in one post – awesome!
    Thx
    Marty

  10. MC.Spring says:

    I think you should use none function in the where case for the last statement, while that maybe took a little time down.

  11. How much of this is actually overhead because of the comparison versus the where?

    ie, can you compare with

    WHERE TRUE
    and
    WHERE 1=1
    and
    WHERE ‘a’=’a’? That would be much more interesting, because the 60% or 87% are likely because of the access and comparison costs. But how much is the actual WHERE costing?

    There’s also explain plan overhead.

  12. Pep Pla says:

    Sheri,

    MySQL Optimizer does something called: Constant condition removal.
    This means that a condition like this TRUE or 5=5 is going to be
    removed by the optimizer prior to query execution.

    Mark,
    Is it possible that using a index is faster because counting index records is
    faster than counting table records? It is important to realize that all the
    index keys are the same so probably the table is larger than the index and
    executing the query takes less io -> faster. (I’ll check table size vs. index size)

    Pep

  13. Pep Pla says:

    Show table status gives half the size of the table to the index.

    Also explain gives half (???) the rows if using the index.

    I can understand that the index takes less space than the table as all the keys are the same but why the optimizer believes that using the index requieres half the rows? because is a b-tree?

    Pep

  14. Pep, yes, the index is half the physical size. The row estimate is only an estimate. The stats are inexact.

  15. gizg says:

    All well and good being a student with plenty of time on your hands to do such trivial (read: useless) tests!
    HOWEVER, try and write a query in a real world web app that can do without a WHERE clause!?!?!?!?!

  16. Cpt. Contrarian says:

    @gizg
    >HOWEVER, try and write a query in a real world web app that can do without a WHERE clause!?!?!?!?!

    I’ve got one. I found this page because I was looking for an explanation for how my JOIN operation could get faster when I took out the WHERE clause even though that greatly multiplied the JOIN size.

    Because the other side of the JOIN already limited the records as needed, the WHERE clause on the first subquery was unneeded. I assumed that having a smaller resultset would increase performance because less memory was copied and used; however, instead, having the WHERE clause on that one table tripled the overall runtime of the query. Upon reflection, I presume that’s because the first subquery requires the database engine to set up a temporary table to store the resultset, while a SELECT without a WHERE means that it can read directly from the source database (likely cached) when it creates the JOINed resultset.

    So, this is interesting research; and I am keen enough to find out more that I’m going to do tests of my own.

  17. Michael says:

    If you use functions on the ‘left hand side’ of a where clause, you ignore any indexes. “and left(a, 10) = ’2008-10-29′” should be “and ’2008-10-29′ = left(a, 10)”

Speak Your Mind

*