August 21, 2014

COUNT(*) vs COUNT(col)

Looking at how people are using COUNT(*) and COUNT(col) it looks like most of them think they are synonyms and just using what they happen to like, while there is substantial difference in performance and even query result.

Lets look at the following series of examples:

As this is MYISAM table MySQL has cached number of rows in this table. This is why it is able to instantly answer COUNT(*) and
COUNT(val2) queries, but not COUNT(val). Why ? Because val column is not defined as NOT NULL there can be some NULL values in it and so MySQL have to perform table scan to find out. This is also why result is different for the second query.

So COUNT(*) and COUNT(col) queries not only could have substantial performance performance differences but also ask different question.

MySQL Optimizer does good job in this case doing full table scan only if it is needed because column can be NULL.

Now lets try few more queries:

As you can see even if you have where clause performance for count(*) and count(col) can be significantly different. In fact this example shows just 3 times performance difference because all data fits in memory, for IO bound workloads you frequently can see 10 and even 100 times performance difference in this case.

The thing is count(*) query can use covering index even while count(col) can’t. Of course you can extend index to be (i,val) and get query to be index covered again but I would use this workaround only if you can’t change the query (ie it is third party application) or in case column name is in the query for reason, and you really need count of non-NULL values.

It is worth to note in this case MySQL Optimizer does not do too good job optimizing the query. One could notice (val2) column is not null so count(val2) is same as count(*) and so the query could be run as index covered query. It does not and both queries have to perform row reads in this case.

As you can see extending index helps in this case but it makes query about 2 times slower compared to count(*) one. This is probably because index becomes about two times longer in this case.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. deepa says:

    1. SELECT 1 FROM table_name WHERE condition LIMIT 1
    2. SELECT count(*) FROM table_name WHERE condition

    In Innodb which query will be faster.

  2. Peter Zaitsev says:

    Hi,

    You’re looking at full index scan or full table scan in this case. Innodb has some protection from full table scan polluting cache, though it is done differently than using same page

  3. Foobarista says:

    In InnoDB, does a simple count(*) do anything evil with the database cache? I ask because some database engine’s I’ve seen don’t do caching carefully and tablescans (or count(*) that does a tablescan) can end up pulling the whole table into the cache (and pushing more useful pages out). Smarter db engines will just reuse the same buffer page to do table scans (after making sure it isn’t in-cache already), but many don’t.

  4. And what about “SELECT count(1) FROM fact;” – is that treated the same way as count(*) ?
    I was taught (15 yrs ago) that count(1) was faster

  5. Doki says:

    What about innodb ? innodb doesnt have row count cached or ?

  6. peter says:

    Right. Innodb does not have cached row count. so count(*) without where clause is slow with Innodb tables.

  7. peter says:

    Peter,

    I can’t see any reason why count(1) would be faster. May be it was specific to the database you was using.

    SELECT 1 FROM TABLE WHERE ID=123 is faster way to check if row exists than SELECT * but it is other story.

  8. Ken Jacobs says:

    The statement that “COUNT(*) with InnoDB is slow without a WHERE clause because InnoDB doesn’t cache the row count” only tells part of the story. No, it doesn’t cache the row count. Wish it were that simple. The whole situation is more complex. The points Peter makes about use of indexes, covering indexes and NOT NULL are true, and apply to InnoDB as well as MyISAM. But there is more to the story.

    It is easy for MyISAM to maintain the count of rows since only one transaction executes at any time, because MyISAM does table-level locking. After a transaction commits, it is easy for MyISAM to update the row count for a table (taking into account inserts and deletes done by the transaction) before the next transaction starts. Of course, the downside to table-level locking is a limit on throughput in the presence of updates (including deletes and inserts).

    However, with InnoDB, which uses row-level locking, there are multiple concurrently executing (and as-yet uncommitted) transactions. This obviously improves throughput when transactions are making changes in the tables. To ensure consistency, each transaction has to see the table (including the number of rows in the table) as of the beginning of the transaction, plus its own changes. Thus the number of rows (and indeed the data) in the table is obviously potentially different for each concurrent transaction. Therefore there is no single “correct” total number of rows at any given time (unless there are no update transactions running). With multiple concurrent transactions, it is not really possible to cache the total number of rows in a table, without creating another point of serialization (making it so transactions execute one at a time).

    Now one must also ask the question whether or not it is a valuable thing to know the accurate number of rows in an entire table at any time. What is the business/application reason for needing to know this? Is it more often the case that you WILL have a WHERE clause, to count the number of rows for a specific key (e.g., number of posts in a blog by a certain author)? That information isn’t cached, of course, and as the discussion above makes clear, in most cases you need to have the proper indexes to ensure good performance of queries that use a WHERE clause.

    Further, there may be usable application-level design changes (e.g., triggers) to maintain a total count of rows in a “side table” if the number of inserts and delete transactions is limited, and you can accept the serialization of such transactions.

    So this all implies that it is not best, in most situations, to choose a storage engine because of differences in the performance of counting all the rows in a table. This discussion is a good example of Peter’s other comments today … simple answers are fine as far as they go. But it always is a good idea to have a more complete understanding of the technology, so you can make the best decisions on how to apply it to your particular problem.

  9. peter says:

    Ken,

    Than you for commenting I know this is your favorite topic.
    In fact I already wrote a lot on this matter so I simply put it to “does not store row count” without too much explanations of why it is not as trivial as it seems.

    http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/

    Interesting enough with Solid, PBXT, Falcon transactional storage engines with multi versioning, will any of them implement fast count(*) without where clause ? We shall see.

  10. peter says:

    Oh also about triggers and serialization,

    There is nice trick which helps you to get accurate count(*) via triggers but avoid serialization. Instead of 1 counter row have for example 10, update random of them and run select sum(count) to get the value.

  11. danielj says:

    so what is the answer to the first question, is COUNT(1) similar to COUNT(*)?

  12. klaus says:

    count(*) is similar to count(0) as you cas se here:

    mysql> explain extended select count(*) from mytable;
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    1 row in set, 1 warning (0.00 sec)

    mysql> show warnings;
    +——-+——+———————————————————+
    | Level | Code | Message |
    +——-+——+———————————————————+
    | Note | 1003 | select count(0) AS count(*) from database.mytable |
    +——-+——+———————————————————+
    1 row in set (0.00 sec)

    count(1) is similar to count(1) as you can se here:

    mysql> explain extended select count(1) from mytable;
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    | 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
    +—-+————-+——-+——+—————+——+———+——+——+——————————+
    1 row in set, 1 warning (0.00 sec)

    mysql> show warnings;
    +——-+——+———————————————————+
    | Level | Code | Message |
    +——-+——+———————————————————+
    | Note | 1003 | select count(1) AS count(1) from database.mytable |
    +——-+——+———————————————————+
    1 row in set (0.00 sec)

    So in conclusion count(*) == count(0) and count(*) != count(1)

    Hope it helps.

  13. There is a post with similar problem but in russian
    http://voituk.kiev.ua/2006/07/27/mysql-right-count-statement/

  14. Raj says:

    Just saw this thread and started wondering about how the fact that innodb uses clustered indexes would play into the equation. I find that, with a judicious choice of (col), count(col) may actually be a lot faster than count(*).

    Consider this example (restarting mysql each time to flush the cache, and using O_DIRECT to bypass OS caching)

    mysql> explain select count(*) from wp_comments;
    +—-+————-+————-+——-+—————+———+———+——+——–+————-+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+————-+——-+—————+———+———+——+——–+————-+
    | 1 | SIMPLE | wp_comments | index | NULL | PRIMARY | 8 | NULL | 424672 | Using index |
    +—-+————-+————-+——-+—————+———+———+——+——–+————-+
    1 row in set (0.11 sec)

    mysql> explain select count(rajid) from wp_comments;
    +—-+————-+————-+——-+—————+——–+———+——+——–+————-+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+————-+——-+—————+——–+———+——+——–+————-+
    | 1 | SIMPLE | wp_comments | index | NULL | rajidx | 9 | NULL | 424672 | Using index |
    +—-+————-+————-+——-+—————+——–+———+——+——–+————-+
    1 row in set (0.00 sec)

    mysql> select count(*) from wp_comments;
    +———-+
    | count(*) |
    +———-+
    | 230120 |
    +———-+
    1 row in set (32.57 sec)

    and if you look at the buffer pool used for the count(*):

    Total memory allocated 589194920; in additional pool allocated 714496
    Buffer pool size 32768
    Free buffers 14104
    Database pages 18664
    Modified db pages 0

    restart mysql and check the count of (rajid):

    mysql> select count(rajid) from wp_comments;
    +————–+
    | count(rajid) |
    +————–+
    | 230120 |
    +————–+
    1 row in set (0.28 sec)

    Much faster. Why? Because it only reads the index data, not the row data for the table when you use a secondary index. Check the buffer pool usage:

    Total memory allocated 589194920; in additional pool allocated 714496
    Buffer pool size 32768
    Free buffers 32332
    Database pages 436
    Modified db pages 0

    The column rajid btw, is just a copy of the PK column comment_ID with a unique index:

    mysql> show create table wp_comments\G
    *************************** 1. row ***************************
    Table: wp_comments
    Create Table: CREATE TABLE wp_comments (
    comment_ID bigint(20) unsigned NOT NULL auto_increment,
    comment_post_ID int(11) NOT NULL default ’0′,
    comment_author tinytext NOT NULL,
    comment_author_email varchar(100) NOT NULL default ”,
    comment_author_url varchar(200) NOT NULL default ”,
    comment_author_IP varchar(100) NOT NULL default ”,
    comment_date datetime NOT NULL default ’0000-00-00 00:00:00′,
    comment_date_gmt datetime NOT NULL default ’0000-00-00 00:00:00′,
    comment_content text NOT NULL,
    comment_karma int(11) NOT NULL default ’0′,
    comment_approved enum(’0′,’1′,’spam’) NOT NULL default ’1′,
    comment_agent varchar(255) NOT NULL default ”,
    comment_type varchar(20) NOT NULL default ”,
    comment_parent bigint(20) NOT NULL default ’0′,
    user_id bigint(20) NOT NULL default ’0′,
    rajid bigint(20) unsigned default NULL,
    PRIMARY KEY (comment_ID),
    UNIQUE KEY rajidx (rajid)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)

    mysql>

    btw, if there is a smaller secondary index on the table, innodb is smart enough to use that for a count, even if it is not a unique index. For instance:
    mysql> explain select count(*) from wp_comments;
    +—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
    | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
    +—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
    | 1 | SIMPLE | wp_comments | index | NULL | comment_approved_idx | 1 | NULL | 228484 | Using index |
    +—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
    1 row in set (0.23 sec)

    mysql> select count(*) from wp_comments;
    ERROR 2006 (HY000): MySQL server has gone away
    No connection. Trying to reconnect…
    Connection id: 2
    Current database: test

    +———-+
    | count(*) |
    +———-+
    | 230120 |
    +———-+
    1 row in set (1.19 sec)

    mysql>
    and the buffer pool usage:

    Total memory allocated 589194920; in additional pool allocated 721920
    Buffer pool size 32768
    Free buffers 32449
    Database pages 319
    Modified db pages 0

    there’s only 3 distinct values in the table for this column, so its not actually counting the values but the pointers from this index:
    mysql> select count( distinct (comment_approved) ) from wp_comments;
    +————————————–+
    | count( distinct (comment_approved) ) |
    +————————————–+
    | 3 |
    +————————————–+
    1 row in set (0.32 sec)

    mysql>

    cheers!
    –Raj.

  15. peter says:

    Raj,

    Good point. It looks like a bug to me as Innodb automatically picks up “shorter” indexes while in practice even first index you tried is shorter.

    The other interesting point I should note – smaller index may not be faster for huge tables, because if you insert data in order primary key tree is often sequential while secondary indexes may be scattered having few sequential pages.

  16. Pavel says:

    What about count(id) (id is primary key)?
    Is count(id) slower than count(*)?

    Using innodb + where and myisam + where;

  17. peter says:

    Pavel,

    For primary key it should be the same as primary key can’t have null values.
    It is same as count(val2) in my example.

  18. howard says:

    hello, can i say:

    the rule of thumb is…

    1. use count(primary_key)
    2. use count(indexed_field_and_not_null)
    3. use count(indexed_field)
    4. use count(*)

    how abt the length of the index, does it matter?

    thanks.

  19. dalin says:

    It should be noted that often when people use COUNT they really just want to know if there’s anything there and aren’t particularly interested in a accurate count. In this case a far more performant approach is
    SELECT 1 FROM foo WHERE bar=”baz”
    The query completes as soon as the first row is found where bar=”baz”.

  20. dalin says:

    And to correct my own typo I meant
    SELECT 1 FROM foo WHERE bar=”baz” LIMIT 1

    the limit being very important

  21. kene says:

    mysql> SELECT count(val2) FROM fact; it is true ?

  22. kene says:

    54.mysql> EXPLAIN SELECT count(val2) FROM fact WHERE i<10000 \G
    55.*************************** 1. row ***************************
    56. id: 1
    57. select_type: SIMPLE
    58. TABLE: fact
    59. type: range
    60.possible_keys: i
    61. KEY: i
    62. key_len: 4
    63. ref: NULL
    64. rows: 691619
    65. Extra: USING WHERE
    66.1 row IN SET (0.00 sec)

    it is true

  23. TerryC says:

    So can I say in this way: Always use count(*)?

  24. TerryC, if you are trying to count the number of rows, then YES.

  25. ibrahim oguz says:

    you say count(*) fast because it founds if there are indexes. for auto increment primary keys also is index. then if we use primary keys select count(id) from users; then it has to be fast too.
    how about group by statement. nobody asked about it. select count(*) from users where country=’USA’ group by CITY;
    how can we use it in innodb and myisam.
    thanks.

  26. ricky says:

    How about select count(1) vs select count(id)

    Which one is better?

  27. Ilan Hazan says:

    If you need to count num results but up to a specific limit (to improve performance), you can use the Limited-Count solution.
    You can read more about it here: http://www.mysqldiary.com/limited-select-count/

  28. Rishabh Jain says:

    can I use count(*)>3

  29. Foobarista says:

    In InnoDB, does a simple count(*) do anything evil with the database cache? I ask because some database engine’s I’ve seen don’t do caching carefully and tablescans (or count(*) that does a tablescan) can end up pulling the whole table into the cache (and pushing more useful pages out). Smarter db engines will just reuse the same buffer page to do table scans (after making sure it isn’t in-cache already), but many don’t.

  30. Hi,

    You’re looking at full index scan or full table scan in this case. Innodb has some protection from full table scan polluting cache, though it is done differently than using same page

  31. OnLine says:

    Hy,
    I have 2 tables.
    Can I make a single Select query to return the total number of rows in the first table and the rows (their data) of the seccond table?
    And if this is posible, how does this query look like?

  32. muhammed sekertekin says:

    very nice article i used to think that you wont notice a a difference if you use count by row i guess to have to recode a lot of my codes…. Thank you from muhammed sekertekin

  33. saaj says:

    MySQL 5.5 seems to be optimized in concern of COUNT(*) queries. At the same machine, for the table of ~120K records I’ve ~2.4 seconds for 5.1 and 0.08 seconds for 5.5 for “SELECT COUNT(*) FROM table” query.

  34. Hi Peter,

    For pagination stuff in high traffic websites having huge tables, we can do it in two ways:-

    select count(*) from table where condition1 and condition2;
    select col1,col2 from table where condition1 and condition2 limit 10,20;

    OR

    Just fire a single query and get all the data and then get the count at PHP end like:-
    select col1,col2 from table where condition1 and condition2;

    Please suggest which one is better idea in both INNODB and MyIsam

  35. rashid says:

    hi what will be the command to count field value who digits less then 8 like your command >select count(val2) from fact where i<10000; in my case i want to count those values who length is equal to 8 or less then 8 digit.

  36. Emad says:

    Good tip, Thank You.

  37. Can you help me with a simple thing?

    i have one table named dealviewer and i want to count values in column message_sent if they are set as true. If they are null leave.

    A need to set this values to a php variable

    Iam newbie, so please post kind a ready code if possible.
    Thanks in advance.

  38. Very interesting, the problem is when we have a request about interacting is with several patterns search inside, example, table1.field, table2.field from table1, table2 where table1.field = table2.field

  39. Hujšanje says:

    Kind of big pile to understand, but I always had problems with understanding of why count(val) does not work in some cases. Many thanks!

Speak Your Mind

*