COUNT(*) vs COUNT(col)
Looking at how people are using COUNT(*) and COUNT(col) it looks like most of them think they are synonyms and just using what they happen to like, while there is substantial difference in performance and even query result.
Lets look at the following series of examples:
-
CREATE TABLE `fact` (
-
`i` int(10) UNSIGNED NOT NULL,
-
`val` int(11) DEFAULT NULL,
-
`val2` int(10) UNSIGNED NOT NULL,
-
KEY `i` (`i`)
-
) ENGINE=MyISAM DEFAULT CHARSET=latin1
-
-
-
mysql> SELECT count(*) FROM fact;
-
+----------+
-
| count(*) |
-
+----------+
-
| 7340032 |
-
+----------+
-
1 row IN SET (0.00 sec)
-
-
-
mysql> SELECT count(val) FROM fact;
-
+------------+
-
| count(val) |
-
+------------+
-
| 7216582 |
-
+------------+
-
1 row IN SET (1.17 sec)
-
-
-
mysql> SELECT count(val2) FROM fact;
-
+-------------+
-
| count(val2) |
-
+-------------+
-
| 7340032 |
-
+-------------+
-
1 row IN SET (0.00 sec)
As this is MYISAM table MySQL has cached number of rows in this table. This is why it is able to instantly answer COUNT(*) and
COUNT(val2) queries, but not COUNT(val). Why ? Because val column is not defined as NOT NULL there can be some NULL values in it and so MySQL have to perform table scan to find out. This is also why result is different for the second query.
So COUNT(*) and COUNT(col) queries not only could have substantial performance performance differences but also ask different question.
MySQL Optimizer does good job in this case doing full table scan only if it is needed because column can be NULL.
Now lets try few more queries:
-
mysql> SELECT count(*) FROM fact WHERE i<10000;
-
+----------+
-
| count(*) |
-
+----------+
-
| 733444 |
-
+----------+
-
1 row IN SET (0.40 sec)
-
-
mysql> EXPLAIN SELECT count(*) FROM fact WHERE i<10000 \G
-
*************************** 1. row ***************************
-
id: 1
-
select_type: SIMPLE
-
TABLE: fact
-
type: range
-
possible_keys: i
-
KEY: i
-
key_len: 4
-
ref: NULL
-
rows: 691619
-
Extra: USING WHERE; USING INDEX
-
1 row IN SET (0.00 sec)
-
-
-
mysql> SELECT count(val) FROM fact WHERE i<10000;
-
+------------+
-
| count(val) |
-
+------------+
-
| 720934 |
-
+------------+
-
1 row IN SET (1.29 sec)
-
-
mysql> EXPLAIN SELECT count(val) FROM fact WHERE i<10000 \G
-
*************************** 1. row ***************************
-
id: 1
-
select_type: SIMPLE
-
TABLE: fact
-
type: range
-
possible_keys: i
-
KEY: i
-
key_len: 4
-
ref: NULL
-
rows: 691619
-
Extra: USING WHERE
-
1 row IN SET (0.00 sec)
-
-
mysql> SELECT count(val2) FROM fact WHERE i<10000;
-
+-------------+
-
| count(val2) |
-
+-------------+
-
| 733444 |
-
+-------------+
-
1 row IN SET (1.30 sec)
-
-
mysql> EXPLAIN SELECT count(val2) FROM fact WHERE i<10000 \G
-
*************************** 1. row ***************************
-
id: 1
-
select_type: SIMPLE
-
TABLE: fact
-
type: range
-
possible_keys: i
-
KEY: i
-
key_len: 4
-
ref: NULL
-
rows: 691619
-
Extra: USING WHERE
-
1 row IN SET (0.00 sec)
As you can see even if you have where clause performance for count(*) and count(col) can be significantly different. In fact this example shows just 3 times performance difference because all data fits in memory, for IO bound workloads you frequently can see 10 and even 100 times performance difference in this case.
The thing is count(*) query can use covering index even while count(col) can't. Of course you can extend index to be (i,val) and get query to be index covered again but I would use this workaround only if you can't change the query (ie it is third party application) or in case column name is in the query for reason, and you really need count of non-NULL values.
It is worth to note in this case MySQL Optimizer does not do too good job optimizing the query. One could notice (val2) column is not null so count(val2) is same as count(*) and so the query could be run as index covered query. It does not and both queries have to perform row reads in this case.
-
mysql> ALTER TABLE fact DROP KEY i, ADD KEY(i,val);
-
Query OK, 7340032 rows affected (37.15 sec)
-
Records: 7340032 Duplicates: 0 Warnings: 0
-
-
-
mysql> SELECT count(val) FROM fact WHERE i<10000;
-
+------------+
-
| count(val) |
-
+------------+
-
| 720934 |
-
+------------+
-
1 row IN SET (0.78 sec)
As you can see extending index helps in this case but it makes query about 2 times slower compared to count(*) one. This is probably because index becomes about two times longer in this case.
19 Comments











del.icio.us
digg
And what about “SELECT count(1) FROM fact;” - is that treated the same way as count(*) ?
I was taught (15 yrs ago) that count(1) was faster
Comment :: April 10, 2007 @ 2:02 pm
What about innodb ? innodb doesnt have row count cached or ?
Comment :: April 10, 2007 @ 3:16 pm
Right. Innodb does not have cached row count. so count(*) without where clause is slow with Innodb tables.
Comment :: April 11, 2007 @ 2:01 am
Peter,
I can’t see any reason why count(1) would be faster. May be it was specific to the database you was using.
SELECT 1 FROM TABLE WHERE ID=123 is faster way to check if row exists than SELECT * but it is other story.
Comment :: April 11, 2007 @ 2:37 am
The statement that “COUNT(*) with InnoDB is slow without a WHERE clause because InnoDB doesn’t cache the row count” only tells part of the story. No, it doesn’t cache the row count. Wish it were that simple. The whole situation is more complex. The points Peter makes about use of indexes, covering indexes and NOT NULL are true, and apply to InnoDB as well as MyISAM. But there is more to the story.
It is easy for MyISAM to maintain the count of rows since only one transaction executes at any time, because MyISAM does table-level locking. After a transaction commits, it is easy for MyISAM to update the row count for a table (taking into account inserts and deletes done by the transaction) before the next transaction starts. Of course, the downside to table-level locking is a limit on throughput in the presence of updates (including deletes and inserts).
However, with InnoDB, which uses row-level locking, there are multiple concurrently executing (and as-yet uncommitted) transactions. This obviously improves throughput when transactions are making changes in the tables. To ensure consistency, each transaction has to see the table (including the number of rows in the table) as of the beginning of the transaction, plus its own changes. Thus the number of rows (and indeed the data) in the table is obviously potentially different for each concurrent transaction. Therefore there is no single “correct” total number of rows at any given time (unless there are no update transactions running). With multiple concurrent transactions, it is not really possible to cache the total number of rows in a table, without creating another point of serialization (making it so transactions execute one at a time).
Now one must also ask the question whether or not it is a valuable thing to know the accurate number of rows in an entire table at any time. What is the business/application reason for needing to know this? Is it more often the case that you WILL have a WHERE clause, to count the number of rows for a specific key (e.g., number of posts in a blog by a certain author)? That information isn’t cached, of course, and as the discussion above makes clear, in most cases you need to have the proper indexes to ensure good performance of queries that use a WHERE clause.
Further, there may be usable application-level design changes (e.g., triggers) to maintain a total count of rows in a “side table” if the number of inserts and delete transactions is limited, and you can accept the serialization of such transactions.
So this all implies that it is not best, in most situations, to choose a storage engine because of differences in the performance of counting all the rows in a table. This discussion is a good example of Peter’s other comments today … simple answers are fine as far as they go. But it always is a good idea to have a more complete understanding of the technology, so you can make the best decisions on how to apply it to your particular problem.
Comment :: April 11, 2007 @ 10:40 am
Ken,
Than you for commenting I know this is your favorite topic.
In fact I already wrote a lot on this matter so I simply put it to “does not store row count” without too much explanations of why it is not as trivial as it seems.
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/
Interesting enough with Solid, PBXT, Falcon transactional storage engines with multi versioning, will any of them implement fast count(*) without where clause ? We shall see.
Comment :: April 11, 2007 @ 2:16 pm
Oh also about triggers and serialization,
There is nice trick which helps you to get accurate count(*) via triggers but avoid serialization. Instead of 1 counter row have for example 10, update random of them and run select sum(count) to get the value.
Comment :: April 11, 2007 @ 2:18 pm
so what is the answer to the first question, is COUNT(1) similar to COUNT(*)?
Comment :: April 13, 2007 @ 6:37 am
[...] und COUNT(foo) gibt. Antwort auf diese Frage findet man, wie könnte es anders sein, im MySQL Performance Blog. Da mir gerade die Zeit fehlt das ganze auf deutsch nieder zu schreiben müsst ihr euch mit der [...]
Pingback :: April 14, 2007 @ 2:07 pm
[...] Als Argument nimmt diese Funktion entweder ein ‘*’ oder eine bestimmte Spalte. Im MySQL Performance Blog wird verglichen, ob man dabei COUNT(*) oder COUNT(spalte) verwenden [...]
Pingback :: April 15, 2007 @ 11:38 am
count(*) is similar to count(0) as you cas se here:
mysql> explain extended select count(*) from mytable;
+—-+————-+——-+——+—————+——+———+——+——+——————————+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——+—————+——+———+——+——+——————————+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+—-+————-+——-+——+—————+——+———+——+——+——————————+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+——-+——+———————————————————+
| Level | Code | Message |
+——-+——+———————————————————+
| Note | 1003 | select count(0) AS `count(*)` from `database`.`mytable` |
+——-+——+———————————————————+
1 row in set (0.00 sec)
count(1) is similar to count(1) as you can se here:
mysql> explain extended select count(1) from mytable;
+—-+————-+——-+——+—————+——+———+——+——+——————————+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——+—————+——+———+——+——+——————————+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+—-+————-+——-+——+—————+——+———+——+——+——————————+
1 row in set, 1 warning (0.00 sec)
mysql> show warnings;
+——-+——+———————————————————+
| Level | Code | Message |
+——-+——+———————————————————+
| Note | 1003 | select count(1) AS `count(1)` from `database`.`mytable` |
+——-+——+———————————————————+
1 row in set (0.00 sec)
So in conclusion count(*) == count(0) and count(*) != count(1)
Hope it helps.
Comment :: April 19, 2007 @ 8:29 pm
There is a post with similar problem but in russian
http://voituk.kiev.ua/2006/07/27/mysql-right-count-statement/
Comment :: April 24, 2007 @ 4:33 am
Just saw this thread and started wondering about how the fact that innodb uses clustered indexes would play into the equation. I find that, with a judicious choice of (col), count(col) may actually be a lot faster than count(*).
Consider this example (restarting mysql each time to flush the cache, and using O_DIRECT to bypass OS caching)
mysql> explain select count(*) from wp_comments;
+—-+————-+————-+——-+—————+———+———+——+——–+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+————-+——-+—————+———+———+——+——–+————-+
| 1 | SIMPLE | wp_comments | index | NULL | PRIMARY | 8 | NULL | 424672 | Using index |
+—-+————-+————-+——-+—————+———+———+——+——–+————-+
1 row in set (0.11 sec)
mysql> explain select count(rajid) from wp_comments;
+—-+————-+————-+——-+—————+——–+———+——+——–+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+————-+——-+—————+——–+———+——+——–+————-+
| 1 | SIMPLE | wp_comments | index | NULL | rajidx | 9 | NULL | 424672 | Using index |
+—-+————-+————-+——-+—————+——–+———+——+——–+————-+
1 row in set (0.00 sec)
mysql> select count(*) from wp_comments;
+———-+
| count(*) |
+———-+
| 230120 |
+———-+
1 row in set (32.57 sec)
and if you look at the buffer pool used for the count(*):
Total memory allocated 589194920; in additional pool allocated 714496
Buffer pool size 32768
Free buffers 14104
Database pages 18664
Modified db pages 0
restart mysql and check the count of (rajid):
mysql> select count(rajid) from wp_comments;
+————–+
| count(rajid) |
+————–+
| 230120 |
+————–+
1 row in set (0.28 sec)
Much faster. Why? Because it only reads the index data, not the row data for the table when you use a secondary index. Check the buffer pool usage:
Total memory allocated 589194920; in additional pool allocated 714496
Buffer pool size 32768
Free buffers 32332
Database pages 436
Modified db pages 0
The column rajid btw, is just a copy of the PK column comment_ID with a unique index:
mysql> show create table wp_comments\G
*************************** 1. row ***************************
Table: wp_comments
Create Table: CREATE TABLE `wp_comments` (
`comment_ID` bigint(20) unsigned NOT NULL auto_increment,
`comment_post_ID` int(11) NOT NULL default ‘0′,
`comment_author` tinytext NOT NULL,
`comment_author_email` varchar(100) NOT NULL default ”,
`comment_author_url` varchar(200) NOT NULL default ”,
`comment_author_IP` varchar(100) NOT NULL default ”,
`comment_date` datetime NOT NULL default ‘0000-00-00 00:00:00′,
`comment_date_gmt` datetime NOT NULL default ‘0000-00-00 00:00:00′,
`comment_content` text NOT NULL,
`comment_karma` int(11) NOT NULL default ‘0′,
`comment_approved` enum(’0′,’1′,’spam’) NOT NULL default ‘1′,
`comment_agent` varchar(255) NOT NULL default ”,
`comment_type` varchar(20) NOT NULL default ”,
`comment_parent` bigint(20) NOT NULL default ‘0′,
`user_id` bigint(20) NOT NULL default ‘0′,
`rajid` bigint(20) unsigned default NULL,
PRIMARY KEY (`comment_ID`),
UNIQUE KEY `rajidx` (`rajid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
mysql>
btw, if there is a smaller secondary index on the table, innodb is smart enough to use that for a count, even if it is not a unique index. For instance:
mysql> explain select count(*) from wp_comments;
+—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
| 1 | SIMPLE | wp_comments | index | NULL | comment_approved_idx | 1 | NULL | 228484 | Using index |
+—-+————-+————-+——-+—————+———————-+———+——+——–+————-+
1 row in set (0.23 sec)
mysql> select count(*) from wp_comments;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect…
Connection id: 2
Current database: test
+———-+
| count(*) |
+———-+
| 230120 |
+———-+
1 row in set (1.19 sec)
mysql>
and the buffer pool usage:
Total memory allocated 589194920; in additional pool allocated 721920
Buffer pool size 32768
Free buffers 32449
Database pages 319
Modified db pages 0
there’s only 3 distinct values in the table for this column, so its not actually counting the values but the pointers from this index:
mysql> select count( distinct (comment_approved) ) from wp_comments;
+————————————–+
| count( distinct (comment_approved) ) |
+————————————–+
| 3 |
+————————————–+
1 row in set (0.32 sec)
mysql>
cheers!
–Raj.
Comment :: April 25, 2007 @ 6:22 pm
Raj,
Good point. It looks like a bug to me as Innodb automatically picks up “shorter” indexes while in practice even first index you tried is shorter.
The other interesting point I should note - smaller index may not be faster for huge tables, because if you insert data in order primary key tree is often sequential while secondary indexes may be scattered having few sequential pages.
Comment :: April 26, 2007 @ 9:34 am
What about count(id) (id is primary key)?
Is count(id) slower than count(*)?
Using innodb + where and myisam + where;
Comment :: April 28, 2007 @ 4:02 pm
Pavel,
For primary key it should be the same as primary key can’t have null values.
It is same as count(val2) in my example.
Comment :: April 29, 2007 @ 7:05 pm
[...] Never do a COUNT(*) (or anything *, says Zach). Instead, replace the * with the name of the column you’re searching against (and is hopefully indexed). That way some queries can execute entirely in the keycache (while * forces MySQL to read every matching row from the table). [...]
Pingback :: June 6, 2007 @ 5:12 am
hello, can i say:
the rule of thumb is…
1. use count(primary_key)
2. use count(indexed_field_and_not_null)
3. use count(indexed_field)
4. use count(*)
how abt the length of the index, does it matter?
thanks.
Comment :: June 23, 2007 @ 3:33 am
[...] هذا الموضوع بالنسبة ل count أنصح بمتابعة هذه المدونه COUNT(*) vs COUNT(col) | MySQL Performance Blog __________________ I Love PHP (d4d@hotmail.com) [...]
Pingback :: September 12, 2008 @ 4:31 am