This is a fun question I’ve been wanting to test for some time. How much overhead does a trivial WHERE clause add to a MySQL query? To find out, I set my InnoDB buffer pool to 256MB and created a table that’s large enough to test, but small enough to fit wholly in memory:
1 2 3 4 5 6 | CREATE TABLE `t` ( `a` date NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1; insert into t(a) values(current_date); insert into t select * from t; |
I repeated the last statement until it got slow enough that I thought I could do a reasonable test; at this point there were 8388608 rows. I then optimized the table and restarted MySQL. The table ended up at 237MB.
Now let’s see how long a table scan with no WHERE clause takes:
1 2 3 4 5 6 7 | mysql> select sql_no_cache count(*) from t; +----------+ | count(*) | +----------+ | 8388608 | +----------+ 1 row in set (5.23 sec) |
I repeated this ten times and averaged the time: 5.01 seconds, ranging from 4.83 to 5.17 seconds. Next I ran it with a trivial WHERE clause:
1 | mysql> select count(*) from t where a = current_date; |
(Pop quiz: do I need SQL_NO_CACHE on this query?) The average execution time for this query is 7.96, ranging from 7.65 to 8.15 seconds. So apparently the overhead of the WHERE clause is about 2.95 seconds, or 351 nanoseconds per row. It doesn’t seem like much per row, but it adds up to about an extra 60% cost for the query. If I add another WHERE clause,
1 | mysql> select count(*) from t where a = current_date and left(a, 10) = '2008-10-29'; |
The average time is 9.39 seconds, or an added 87% overhead.
It would be interesting, in an academic kind of way, to test different data types and various complexities of WHERE clauses, but I’m not really interested enough to spend much time on it. I just wondered whether the WHERE clause would even be noticeable. This test doesn’t really reveal anything you can do to make your queries faster — you usually can’t optimize the WHERE clause itself, only the way that the MySQL optimizer chooses to apply the clause to indexes. (However, as I found out some time ago, some date functions are faster than others. That would be interesting to look into more.)
re Pop quiz: do I need SQL_NO_CACHE on this query?
I’m still new to this but I’m going to say ‘NO’ as you are using a variable (current_date) in the query instead of a fixed date (ie ‘2008-10-29’). Is that correct.
I am amazed at that overhead! Would using MyIsam and/or adding an INDEX on the date field make any difference?
Cheers
Martin
Martin, you’re right about current_date.
The WHERE is implemented at the server level so it ought not to depend on the engine or indexes.
I just ran some tests, and on my system the performance is improved by an index:
CREATE INDEX a_idx ON t(a);
Rows: 8388608
Time without WHERE: 4.8 seconds
Time with WHERE, without index: 5.9 seconds
Time with WHERE, with index: 5.2 seconds
My results are that the index reduces the additional overhead from the WHERE clause by half.
And with the index, but without WHERE?
That’s unchanged – 4.8 seconds.
Of course — because it’s using the primary key.
My point in asking those questions was to get you thinking about the query plan. You can’t compare the primary key to the secondary key and say that the difference is due to where clauses or not having where clauses. I think you’re misunderstanding how the storage engine api and InnoDB indexes work.
I’m confused. Where did the Primary Key come from in:
CREATE TABLE
t
(a
date NOT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1;?
InnoDB has a hidden 6-byte primary key.
Well that is three things I have learned in one post – awesome!
Thx
Marty
I think you should use none function in the where case for the last statement, while that maybe took a little time down.
How much of this is actually overhead because of the comparison versus the where?
ie, can you compare with
WHERE TRUE
and
WHERE 1=1
and
WHERE ‘a’=’a’? That would be much more interesting, because the 60% or 87% are likely because of the access and comparison costs. But how much is the actual WHERE costing?
There’s also explain plan overhead.
Sheri,
MySQL Optimizer does something called: Constant condition removal.
This means that a condition like this TRUE or 5=5 is going to be
removed by the optimizer prior to query execution.
Mark,
Is it possible that using a index is faster because counting index records is
faster than counting table records? It is important to realize that all the
index keys are the same so probably the table is larger than the index and
executing the query takes less io -> faster. (I’ll check table size vs. index size)
Pep
Show table status gives half the size of the table to the index.
Also explain gives half (???) the rows if using the index.
I can understand that the index takes less space than the table as all the keys are the same but why the optimizer believes that using the index requieres half the rows? because is a b-tree?
Pep
Pep, yes, the index is half the physical size. The row estimate is only an estimate. The stats are inexact.
All well and good being a student with plenty of time on your hands to do such trivial (read: useless) tests!
HOWEVER, try and write a query in a real world web app that can do without a WHERE clause!?!?!?!?!
@gizg
>HOWEVER, try and write a query in a real world web app that can do without a WHERE clause!?!?!?!?!
I’ve got one. I found this page because I was looking for an explanation for how my JOIN operation could get faster when I took out the WHERE clause even though that greatly multiplied the JOIN size.
Because the other side of the JOIN already limited the records as needed, the WHERE clause on the first subquery was unneeded. I assumed that having a smaller resultset would increase performance because less memory was copied and used; however, instead, having the WHERE clause on that one table tripled the overall runtime of the query. Upon reflection, I presume that’s because the first subquery requires the database engine to set up a temporary table to store the resultset, while a SELECT without a WHERE means that it can read directly from the source database (likely cached) when it creates the JOINed resultset.
So, this is interesting research; and I am keen enough to find out more that I’m going to do tests of my own.
If you use functions on the ‘left hand side’ of a where clause, you ignore any indexes. “and left(a, 10) = ‘2008-10-29′” should be “and ‘2008-10-29’ = left(a, 10)”