August 28, 2007

To SQL_CALC_FOUND_ROWS or not to SQL_CALC_FOUND_ROWS?

Posted by Alexey Kovyrin

When we optimize clients' SQL queries I pretty often see a queries with SQL_CALC_FOUND_ROWS option used. Many people think, that it is faster to use this option than run two separate queries: one - to get a result set, another - to count total number of rows. In this post I'll try to check, is this true or not and when it is better to run two separate queries.

For my tests I've created following simple table:

SQL:
  1. CREATE TABLE `count_test` (
  2.   `a` int(10) NOT NULL AUTO_INCREMENT,
  3.   `b` int(10) NOT NULL,
  4.   `c` int(10) NOT NULL,
  5.   `d` varchar(32) NOT NULL,
  6.   PRIMARY KEY  (`a`),
  7.   KEY `bc` (`b`,`c`)
  8. ) ENGINE=MyISAM

Test data has been created with following script (which creates 10M records):

PHP:
  1. mysql_connect("127.0.0.1", "root");
  2.  
  3. for ($i = 0; $i <10000000; $i++) {
  4.     $b = $i % 1000;
  5.     mysql_query("INSERT INTO count_test SET b=$b, c=ROUND(RAND()*10), d=MD5($i)");
  6. }

First of all, let's try to perform some query on this table using indexed column b in where clause:

SQL:
  1. mysql> SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 555 ORDER BY c LIMIT 5;

Results with SQL_CALC_FOUND_ROWS are following: for each b value it takes 20-100 sec to execute uncached and 2-5 sec after warmup. Such difference could be explained by the I/O which required for this query - mysql accesses all 10k rows this query could produce without LIMIT clause.

Let's check, how long it'd take if we'll try to use two separate queries:

SQL:
  1. mysql> SELECT SQL_NO_CACHE * FROM count_test WHERE b = 666 ORDER BY c LIMIT 5;

The results are following: it takes 0.01-0.11 sec to run this query first time and 0.00-0.02 sec for all consecutive runs.

And now - we need too check how long our COUNT query would take:

SQL:
  1. mysql> SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;

Result is really impressive here: 0.00-0.04 sec for all runs.

So, as we can see, total time for SELECT+COUNT (0.00-0.15 sec) is much less than execution time for original query (2-100 sec). Let's take a look at EXPLAINs:

SQL:
  1. mysql> EXPLAIN SELECT SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 999 ORDER BY c LIMIT 5;
  2. +----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
  3. | id | select_type | TABLE      | type | possible_keys | KEY  | key_len | ref   | rows  | Extra       |
  4. +----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
  5. 1 | SIMPLE      | count_test | ref  | bc            | bc   | 4       | const | 75327 | USING WHERE |
  6. +----+-------------+------------+------+---------------+------+---------+-------+-------+-------------+
  7. 1 row IN SET (0.00 sec)
  8.  
  9. mysql> EXPLAIN SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;
  10. +----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
  11. | id | select_type | TABLE      | type | possible_keys | KEY  | key_len | ref   | rows | Extra       |
  12. +----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
  13. 1 | SIMPLE      | count_test | ref  | bc            | bc   | 4       | const | 5479 | USING INDEX |
  14. +----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
  15. 1 row IN SET (0.00 sec)

Here is why our count was much faster - MySQL accessed our table data when calculated result set size even when this was not needed (after the first 5 rows specified in LIMIT clause). With count(*) it used index scan inly which is much faster here.

Just to be objective I've tried to perform this test without indexes (full scan) and with index on b column. Results were following:

  1. Full-scan:
    • 7 seconds for SQL_CALC_FOUND_ROWS.
    • 7+7 seconds in case when two queries used.
  2. Filesort:
    • 1.8 seconds for SQL_CALC_FOUND_ROWS.
    • 1.8+0.05 seconds in case when two queries used.

So, obvious conclusion from this simple test is: when we have appropriate indexes for WHERE/ORDER clause in our query, it is much faster to use two separate queries instead of one with SQL_CALC_FOUND_ROWS.

Related posts: :Four ways to optimize paginated displays::Bug fix of InnoDB scalability problem::Are you designing IO bound or CPU bound application ?:
 

35 Comments »

  1. Have you tried this same test under high concurrency. Say 12 active threads? I did a benchmark a while back (2 years ago using INNODB of course) under concurrency the trend of using SQL_CALC_FOUND_ROWS was a bit faster, but then again this finding may have been tainted by the INNODB scalability bug you found.

    Comment :: August 29, 2007 @ 12:20 am

  2. 2. Nima

    IMHO comparison is not true, in “EXPLAIN SELECT SQL_NO_CACHE count(*) FROM count_test WHERE b = 666;”, count is performed on index without access physical data. But in “EXPLAIN SELECT SQL_CALC_FOUND_ROWS * FROM count_test WHERE b = 999 ORDER BY c LIMIT 5;”, mysql uses index to determine number of rows to examine (through compound index on b and c) and returns a result set by picking physical data which takes some time (because columns are not included in index). I think use SQL_CALC_FOUND_ROWS in queries that use complex where (for example search queries with many parameters and paging result set) is much faster than sending two queries. I use a stored procedure that accept a dynamic sql query, an offset and a limit then use prepared statement to execute query and assigns FOUND_ROW() to a output parameter then give it to paging class to build paging links. I tested that my approach is much faster than issuing two queries. because second query which accepts limit and offset do the same thing but one with SQL_CALC_FOUND_ROWS just stores number of records before restricting rows. If I am wrong, please clarify it for me.
    thanks.

    Comment :: August 29, 2007 @ 1:38 am

  3. Which version of mysql server did you use? Does this apply to all current versions or is the SQL_CALC_FOUND_ROWS option more optimized in mysql 5 ?

    Depending on the availability of indexes, mysql should be able to optimize a SQL_CALC_FOUND_ROWS query in such a way that users do not need to think about using a separate count(*) query anymore.

    Comment :: August 29, 2007 @ 6:19 am

  4. Dathan,

    The things can be well different for different situations and different storage engines. In this case there is simply much more work which needs to be done if sql_calc_found_rows is used. And it is frequently so. In some cases you may have the fact you need only one real query instead of 2 (select found_rows() does not touch data) may affect things ie if innodb had to spend a lot of time in the queue.

    Comment :: August 29, 2007 @ 6:37 am

  5. Nima,

    There are two very different things - using index to restict amount of rows examined and using index ONLY to get all data query needs.
    For count(*) the data is not even touched, for original select it has to be accessed and accessed for each row rather than only few 5 rows which is where the difference comes from.

    There are different cases - filesort and full table scan are mentioned examples.

    use of Innodb primary key is another case worth to mention - it effectively has all columns so touching data does not add much.

    Belive me SQL_CALC_FOUND_ROWS can make things slower in many cases - if this is your case I do not know, check it what makes sense. Just do not assume it is ALWAYS faster as I’ve seen people doing.

    Comment :: August 29, 2007 @ 6:42 am

  6. Marten,

    This issues of being unable to switch to traversing index only exists in MySQL 5.0 and 5.1 In MySQL 5.2 there are some optimizations in similar area so I have not tested if it is the case.

    Comment :: August 29, 2007 @ 6:43 am

  7. Marten,

    All tests were performed on MySQL 5.0.45 on MacOS X 10.4.8.

    Comment :: August 29, 2007 @ 7:25 am

  8. 8. Ryan

    take a look at http://bugs.mysql.com/bug.php?id=18454. This was submitted off a support case I opened with mysql last year. It’s currectly marked as an enhancement

    Comment :: August 29, 2007 @ 9:10 pm

  9. 9. Perrin Harkins

    I tested this on an application a few months ago and found the same: SQL_CALC_FOUND_ROWS was always much slower than two queries. The reason appears to be that MySQL does a good job of optimizing LIMIT queries by stopping them when the desired number of rows has been found. Using SQL_CALC_FOUND_ROWS defeats this optimization.

    Comment :: August 30, 2007 @ 9:31 am

  10. I happen to be the one who has introduced this feature in the first place. Even for MyISAM there are cases when using this option is faster then two queries. For InnoDB this option is even more usefull. But in any case, every query should be checked out for few different ranges. Largest problem that I have percieved with some customers is that they forget to remove this option when (due to application change) number of rows is no longer required.

    Comment :: August 30, 2007 @ 10:36 am

  11. Sure Sinisa, in some cases this feature makes sense - when filesort is done or temporary table is used for count(*) query as well as select query and few others - but it is too frequently assumed it always makes sense to use instead of 2 queries, which is how it logically should be. Unfortunately this is not the case.

    And you’re right. Forgetting sql_calc_found_rows when you do not need it any more is a bad mistake to do.

    Comment :: August 31, 2007 @ 3:28 am

  12. [...] Apparently using FOUND_ROWS() can be slower than running a second, non LIMIT-ed SELECT when you are trying to [...]

    Pingback :: December 6, 2007 @ 3:58 pm

  13. Hello,

    You seem to compare 2 different queries (I’m not talking about the ORDER), as you filter on 2 different values. One is reading 75327 rows (the slower of course, and that’s your SQL_CALC_FOUND_ROWS), the other has only 5479 rows. So it’s obviously faster for the second one, even if you’re using indexes.
    Can you try again the same test with the very same amount of rows limited by the WHERE clause?

    Comment :: December 10, 2007 @ 2:39 am

  14. [...] чтобы понять в чём причина, пока не наткнулся на эту статью. Пересказывать её не буду, сами прочитаете, но мораль [...]

    Pingback :: January 30, 2008 @ 7:45 am

  15. [...] how you do it. As an example I take the queries from this mysql performance blog article article. Because I’d like to learn what excactly SQL_CALC_FOUND_ROWS [...]

    Pingback :: February 5, 2008 @ 9:08 am

  16. The significance of attentive job search in the human resources market.

    Nowadays this problem is very widespread. The agreement at present can be oral or written, and the most important conditions in them can be marked in an oral, written or mixed form.

    Trackback :: February 15, 2008 @ 4:44 am

  17. [...] 另外从此贴中也可以分析得到,存在适当的索引在查询条件语句中,使用2个单独的查询语句要比单独使用SQL_CALC_FOUND_ROWS快的多 [...]

    Pingback :: March 28, 2008 @ 10:40 am

  18. [...] Un test dettagliato è visionabile a questo indirizzo su mysqlperformanceblog [...]

    Pingback :: April 15, 2008 @ 6:56 pm

  19. [...] in two seperate queries while the second is just one query, however it is not that obvious which one is better for performance. For our case we’ll use the first method, duplicating the original query while stripping out [...]

    Pingback :: June 10, 2008 @ 6:02 pm

  20. [...] through a post on SQL_CALC_FOUND_ROWS actually being slower that running one query followed by a count query I decided I needed to do a little benchmarking on [...]

    Pingback :: June 12, 2008 @ 7:42 am

  21. Yeap, I’ve tested and found that “2 query mode” (0,015 sec*) are about 10x times faster than “one with SQL_CALC_FOUND_ROWS” (0,153 sec*).

    The application that I’ve tested has 164.520 rows (42MB table).

    * generation time of the page where the queries were executed. That time is the minimal value which was gained after some page refreshes.

    Comment :: July 22, 2008 @ 4:40 pm

  22. 22. Sunfox

    I have to say that COUNT(*) is faster so far.

    Query with LIMIT and SQL_CALC_FOUND_ROWS: 0.80 - 1.00 seconds
    Query without SQL_CALC_FOUND_ROWS: 0.06 seconds
    Added COUNT(*) query: 0.09-0.12 seconds

    Comment :: August 23, 2008 @ 12:22 am

  23. In first sql you are using “ORDER BY c”, this column is not indexed and mysql must pass all rows to order result

    Comment :: August 26, 2008 @ 2:05 pm

  24. [...] a different 20 are retrieved. And to show the list of pages and the total count, you either a) use SQL_CALC_FOUND_ROWS (see our post on this) or b) execute a separate SELECT to count the [...]

    Pingback :: September 24, 2008 @ 7:34 am

  25. [...] a different 20 are retrieved. And to show the list of pages and the total count, you either a) use SQL_CALC_FOUND_ROWS (see our post on this) or b) execute a separate SELECT to count the [...]

    Pingback :: September 24, 2008 @ 10:58 am

  26. [...] $sql = “SELECT FOUND_ROWS()”; $rows_count = $this->fetchOne($sql); ?> 后记:这种方法刚开始我也考虑过查询并发的问题,但后来被否定了,因为一个WEB程序执行的时候会新建一个MYSQL的连接,程序结束的时候会断开这个连接。它是一个单独的线程,所以应该不会受其它线程SQL语句的影响。但在使用连接池或者长连接的时候就不太清楚了,还没有做测试。   :( 经过我在实际使用中的测试,发现它并不像MYSQL官方说的那样效率更高,反而从两次查询的效率更低,查了相关资料发现,这是MYSQL的一个BUG。详见:http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ Mysql [...]

    Pingback :: November 27, 2008 @ 2:29 am

  27. [...] post last week, after discovering what I thought was a nifty trick. Shortly afterwards, I found contradicting evidence that suggested the traditional route of using a separate query of COUNT(*) wa…, and/or it depends on your individual database design. Instead of deleting the post, I figured I [...]

    Pingback :: December 10, 2008 @ 2:27 pm

  28. [...] http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ [...]

    Pingback :: January 6, 2009 @ 8:45 pm

  29. I guess the post is quite old, but the topic will stay alive…

    @Nirma, it depends. I got a live example from a new service of mine, which applies tables with a few millions of rows. Something was killing it and I recognized it was SQL_CALC_FOUND_ROWS. After some analysis I found the following:

    I’m retrieving paginated results from the quite large tables (WHERE column like pattern ORDER BY column LIMIT small OFFSET huge). When using SQL_CALC_FOUND_ROWS, the query will always need to calculate the complete result set for “column linke pattern ORDER BY column”. But actually the tables are updated in regular intervals, and the number of results would stay constant in the mean time. So I decided to drop SQL_CALC_FOUND_ROWS. Instead I will do a second query on a (stored) function, which delivers a count(*) for the pattern or, if the pattern was used before, a cached value of the count. Queries are now simply: quick.

    In short: when using large tables, which stay constant for a while, avoid SQL_CALC_FOUND_ROWS; cache the counts of the result sets instead.

    Comment :: February 6, 2009 @ 6:49 am

  30. [...] up to date reliably.So I did what any self-respecting programmer does and turned to Google. I found some interesting info on the subject of getting row counts, including some optimizations that I felt [...]

    Pingback :: February 19, 2009 @ 3:54 am

  31. Could you please repeat this test with a proper case featuring 2x JOIN, a WHERE clause, a GROUP_BY, and an ORDER_BY, please?
    my point is, your test is most probably misleading.

    Comment :: February 26, 2009 @ 5:58 am

  32. 32. Brian

    In my case COUNT(*) was the hog that’s been driving my search slow, and SQL_CALC_FOUND_ROWS saved me, and like george suggests I’ve been using ORDER BY, JOIN, not some childish case like the one presented here.

    Comment :: March 19, 2009 @ 7:50 pm

  33. [...] To SQL_CALC_FOUND_ROWS or not to SQL_CALC_FOUND_ROWS? 事实证明,当查询语句中的WHERE和ORDER的条件有合适的索引,选择两次查询而不是SQL_CALC_FOUND_ROWS [...]

    Pingback :: May 3, 2009 @ 7:31 am

  34. I found that when doing complex multitable selects, SQL_CALC_FOUND_ROWS adds virtually no extra execution time to the query.
    Like Brian said, the example mentioned in this blog post is about as far from a real world scenario as possible.

    Comment :: June 30, 2009 @ 11:43 am

  35. I think the comments on this blog can be answered with “your mileage may vary.” In the “real world” I have seen severe performance degradation many times. I’m sure there are cases where it doesn’t degrade performance, and I don’t notice because I’m looking at what IS the performance problem on a server.

    Comment :: June 30, 2009 @ 6:06 pm

 



Subscribe without commenting