This is a fun question I’ve been wanting to test for some time.  How much overhead does a trivial WHERE clause add to a MySQL query?  To find out, I set my InnoDB buffer pool to 256MB and created a table that’s large enough to test, but small enough to fit wholly in memory:

I repeated the last statement until it got slow enough that I thought I could do a reasonable test; at this point there were 8388608 rows. I then optimized the table and restarted MySQL. The table ended up at 237MB.

Now let’s see how long a table scan with no WHERE clause takes:

I repeated this ten times and averaged the time: 5.01 seconds, ranging from 4.83 to 5.17 seconds. Next I ran it with a trivial WHERE clause:

(Pop quiz: do I need SQL_NO_CACHE on this query?) The average execution time for this query is 7.96, ranging from 7.65 to 8.15 seconds. So apparently the overhead of the WHERE clause is about 2.95 seconds, or 351 nanoseconds per row. It doesn’t seem like much per row, but it adds up to about an extra 60% cost for the query. If I add another WHERE clause,

The average time is 9.39 seconds, or an added 87% overhead.

It would be interesting, in an academic kind of way, to test different data types and various complexities of WHERE clauses, but I’m not really interested enough to spend much time on it. I just wondered whether the WHERE clause would even be noticeable. This test doesn’t really reveal anything you can do to make your queries faster — you usually can’t optimize the WHERE clause itself, only the way that the MySQL optimizer chooses to apply the clause to indexes. (However, as I found out some time ago, some date functions are faster than others. That would be interesting to look into more.)

17 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Martin

re Pop quiz: do I need SQL_NO_CACHE on this query?
I’m still new to this but I’m going to say ‘NO’ as you are using a variable (current_date) in the query instead of a fixed date (ie ‘2008-10-29’). Is that correct.

I am amazed at that overhead! Would using MyIsam and/or adding an INDEX on the date field make any difference?

Cheers
Martin

Mark

I just ran some tests, and on my system the performance is improved by an index:
CREATE INDEX a_idx ON t(a);

Rows: 8388608
Time without WHERE: 4.8 seconds
Time with WHERE, without index: 5.9 seconds
Time with WHERE, with index: 5.2 seconds

My results are that the index reduces the additional overhead from the WHERE clause by half.

Mark

That’s unchanged – 4.8 seconds.

Martin

I’m confused. Where did the Primary Key come from in:
CREATE TABLE t (a date NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1;
?

Martin

Well that is three things I have learned in one post – awesome!
Thx
Marty

MC.Spring

I think you should use none function in the where case for the last statement, while that maybe took a little time down.

Sheeri K. Cabral

How much of this is actually overhead because of the comparison versus the where?

ie, can you compare with

WHERE TRUE
and
WHERE 1=1
and
WHERE ‘a’=’a’? That would be much more interesting, because the 60% or 87% are likely because of the access and comparison costs. But how much is the actual WHERE costing?

There’s also explain plan overhead.

Pep Pla

Sheri,

MySQL Optimizer does something called: Constant condition removal.
This means that a condition like this TRUE or 5=5 is going to be
removed by the optimizer prior to query execution.

Mark,
Is it possible that using a index is faster because counting index records is
faster than counting table records? It is important to realize that all the
index keys are the same so probably the table is larger than the index and
executing the query takes less io -> faster. (I’ll check table size vs. index size)

Pep

Pep Pla

Show table status gives half the size of the table to the index.

Also explain gives half (???) the rows if using the index.

I can understand that the index takes less space than the table as all the keys are the same but why the optimizer believes that using the index requieres half the rows? because is a b-tree?

Pep

gizg

All well and good being a student with plenty of time on your hands to do such trivial (read: useless) tests!
HOWEVER, try and write a query in a real world web app that can do without a WHERE clause!?!?!?!?!

Cpt. Contrarian

@gizg
>HOWEVER, try and write a query in a real world web app that can do without a WHERE clause!?!?!?!?!

I’ve got one. I found this page because I was looking for an explanation for how my JOIN operation could get faster when I took out the WHERE clause even though that greatly multiplied the JOIN size.

Because the other side of the JOIN already limited the records as needed, the WHERE clause on the first subquery was unneeded. I assumed that having a smaller resultset would increase performance because less memory was copied and used; however, instead, having the WHERE clause on that one table tripled the overall runtime of the query. Upon reflection, I presume that’s because the first subquery requires the database engine to set up a temporary table to store the resultset, while a SELECT without a WHERE means that it can read directly from the source database (likely cached) when it creates the JOINed resultset.

So, this is interesting research; and I am keen enough to find out more that I’m going to do tests of my own.

Michael

If you use functions on the ‘left hand side’ of a where clause, you ignore any indexes. “and left(a, 10) = ‘2008-10-29′” should be “and ‘2008-10-29’ = left(a, 10)”