How to find MySQL queries worth optimizing?

One question I often get is how to find queries that should be optimized. A pt-query-digest report makes it easy to find slow queries, or queries that cause a large portion of the load on the system, but how do we know whether there is any possibility of making a given query run better? The full answer requires complex analysis, as there are many possible ways a query can be optimized. There is, however, one extremely helpful metric you can use: the ratio between rows sent and rows examined. Let's look at this example:

# Time: 120911 17:09:44
# User@Host: root[root] @ localhost []
# Thread_id: 64914  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 9.031233  Lock_time: 0.000086  Rows_sent: 0  Rows_examined: 10000000  Rows_affected: 0  Rows_read: 0
# Bytes_sent: 213  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F03
use sbtest;
SET timestamp=1347397784;
select * from sbtest where pad='abc';

The query in this case sent zero rows (as there are no matches), but it had to examine 10 million rows to produce the result. What would a good scenario be? A query that examines the same number of rows as it ends up sending. In this case, if I index the table, I get the following record in the slow query log:

# Time: 120911 17:18:05
# User@Host: root[root] @ localhost []
# Thread_id: 65005  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000323  Lock_time: 0.000095  Rows_sent: 0  Rows_examined: 0  Rows_affected: 0  Rows_read: 0
# Bytes_sent: 213  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F14
SET timestamp=1347398285;
select * from sbtest where pad='abc';

Rows_examined=0, the same as Rows_sent, means this query is optimized quite well. You may be thinking that in this case there is no database access happening at all. You would be wrong. The index lookup is still performed, but only rows that are actually found and returned to the upper MySQL layer for processing are counted, so Rows_examined remains zero.
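
If you want to pull these two counters out of slow query log entries yourself, a minimal sketch in Python could look like the following. It assumes the log header format shown above; the function name `row_counters` and the variable names are made up for illustration and are not part of any Percona tool.

```python
import re

# Matches the "Rows_sent: N" and "Rows_examined: N" counters in a slow-log header.
COUNTER = re.compile(r"\b(Rows_sent|Rows_examined):\s*(\d+)")

def row_counters(entry: str) -> dict:
    """Return the Rows_sent and Rows_examined values found in one log entry."""
    return {name: int(value) for name, value in COUNTER.findall(entry)}

# Header lines copied from the two log entries above.
before_index = ("# Query_time: 9.031233  Lock_time: 0.000086  Rows_sent: 0  "
                "Rows_examined: 10000000  Rows_affected: 0  Rows_read: 0")
after_index = ("# Query_time: 0.000323  Lock_time: 0.000095  Rows_sent: 0  "
               "Rows_examined: 0  Rows_affected: 0  Rows_read: 0")

print(row_counters(before_index))
print(row_counters(after_index))
```

The first call reports 10,000,000 rows examined for zero rows sent, the second reports zero for both, which is the difference the index made.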

It looks simple so far, but it is also a huge oversimplification. You can apply such simple math only to queries without aggregate functions or GROUP BY, and only to queries that examine a single table. What about queries that access more than one table?

# Time: 120911 17:25:22
# User@Host: root[root] @ localhost []
# Thread_id: 65098  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000234  Lock_time: 0.000063  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0  Rows_read: 1
# Bytes_sent: 719  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F1D
SET timestamp=1347398722;
select * from sbtest a,sbtest b where a.id=5 and b.id=a.k;

mysql> explain select * from sbtest a,sbtest b where a.id=5 and b.id=a.k;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
|  1 | SIMPLE      | a     | const | PRIMARY,k     | PRIMARY | 4       | const |    1 |       |
|  1 | SIMPLE      | b     | const | PRIMARY       | PRIMARY | 4       | const |    1 |       |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+-------+
2 rows in set (0.00 sec)

In this case we actually join 2 tables, but because the access type for both tables is “const”, MySQL does not count it as access to two tables. In the case of “real” access to the data, it will:

# Time: 120911 17:28:12
# User@Host: root[root] @ localhost []
# Thread_id: 65099  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 0.000273  Lock_time: 0.000052  Rows_sent: 1  Rows_examined: 2  Rows_affected: 0  Rows_read: 1
# Bytes_sent: 719  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F23
SET timestamp=1347398892;
select * from sbtest a,sbtest b where a.k=2 and b.id=a.id;

+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref         | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
|  1 | SIMPLE      | a     | ref    | PRIMARY,k     | k       | 4       | const       |    1 |       |
|  1 | SIMPLE      | b     | eq_ref | PRIMARY       | PRIMARY | 4       | sbtest.a.id |    1 |       |
+----+-------------+-------+--------+---------------+---------+---------+-------------+------+-------+
2 rows in set (0.00 sec)

In this case we have 2 rows examined for each row sent, which is expected, as there are 2 (logical) tables used in the query.

This rule also does not work if you have any group by in the query:

# Time: 120911 17:31:48
# User@Host: root[root] @ localhost []
# Thread_id: 65144  Schema: sbtest  Last_errno: 0  Killed: 0
# Query_time: 5.391612  Lock_time: 0.000121  Rows_sent: 2  Rows_examined: 10000000  Rows_affected: 0  Rows_read: 2
# Bytes_sent: 75  Tmp_tables: 0  Tmp_disk_tables: 0  Tmp_table_sizes: 0
# InnoDB_trx_id: 12F24
SET timestamp=1347399108;
select count(*) from sbtest group by k;

This query sends only 2 rows while scanning 10 million, yet we can’t really optimize it in a simple way, because scanning all those rows is actually needed to produce the GROUP BY result.
What you can do in this case is think about the query with the GROUP BY and aggregate functions removed. The query would then become “select * from sbtest”, which would send all 10M rows, and hence there is no simple way to optimize it.

This method does not just give you a “yes or no” answer; it also helps you understand how much optimization is possible. For example, I might have a query that uses an index, scans 1000 rows, and sends 10. I might still have an opportunity to reduce the number of rows it scans 100x, for example by adding a combined index.

So what is the easy way to see if a query is worth optimizing?
– see how many rows the query sends after GROUP BY, DISTINCT and aggregate functions are removed (A)
– look at the number of rows examined divided by the number of tables in the join (B)
– if B is less than or equal to A, your query is “perfect”
– if B/A is 10x or more, this query is a very serious candidate for optimization.
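
The checklist above can be sketched as a small Python function. This is only an illustration under a few assumptions of mine: the function name `optimization_verdict` is hypothetical, the "some room" label for ratios between 1x and 10x is not from the rules above, and treating A = 0 with rows examined as a serious candidate is my reading of the first example rather than a stated rule.

```python
def optimization_verdict(rows_examined: int,
                         tables_in_join: int,
                         rows_sent_unaggregated: int) -> str:
    """Apply the A/B rule of thumb to one query."""
    # A: rows the query would send with GROUP BY, DISTINCT and aggregates removed.
    a = rows_sent_unaggregated
    # B: rows examined divided by the number of tables in the join.
    b = rows_examined / tables_in_join
    if b <= a:
        return "perfect"
    # Assumption: if nothing is sent but rows were examined, all of that work
    # was wasted, so the query is flagged without computing B/A.
    if a == 0 or b / a >= 10:
        return "serious candidate"
    # Assumption: the 1x..10x range is not named in the checklist.
    return "some room"

# Full scan on pad='abc' before the index was added.
print(optimization_verdict(10_000_000, 1, 0))
# Two-table join: 2 rows examined, 1 row sent.
print(optimization_verdict(2, 2, 1))
# GROUP BY query: without aggregation it would send all 10M rows.
print(optimization_verdict(10_000_000, 1, 10_000_000))
# Index scan of 1000 rows that sends 10.
print(optimization_verdict(1000, 1, 10))
```

Note that A has to be estimated by hand for queries with GROUP BY, DISTINCT or aggregates, since the slow query log only records the rows actually sent.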

This is a simple method, and it works very well with pt-query-digest, as it reports not only average numbers but also the outliers.
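
To see why outliers matter here, consider a rough Python sketch with made-up numbers (the `runs` data and the helper name `examined_per_sent` are hypothetical, not pt-query-digest output). A query that usually behaves well can hide one very bad execution behind a harmless-looking average.

```python
from statistics import mean

def examined_per_sent(rows_examined: int, rows_sent: int) -> float:
    # Assumption: a run that sends nothing is compared against one row
    # to avoid dividing by zero.
    return rows_examined / max(rows_sent, 1)

# Hypothetical (rows_examined, rows_sent) pairs for 100 runs of one query:
# 99 runs examine exactly what they send, one run examines 500x more.
runs = [(10, 10)] * 99 + [(5000, 10)]

ratios = [examined_per_sent(examined, sent) for examined, sent in runs]
print(f"average ratio: {mean(ratios):.2f}")
print(f"worst ratio:   {max(ratios):.2f}")
```

In this sketch the average stays below the 10x threshold while the worst run is far above it, so looking only at averages would miss it.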

7 Comments

Justin Swanhart

11 years ago

The count(*) group by k could be optimized slightly by adding an index to k. That will at least give you a full index scan instead of a full table scan which can reduce io if the table is wide.

Peter Zaitsev

Author

11 years ago

Justin… yes. There are many things, including sorting, covering indexes, etc., which it does not cover. The statement here is perhaps: queries benefit from reducing the number of rows they examine, everything else being equal. And also that in the perfect situation you can have 1 row examined per row sent.

khan

11 years ago

How do you see the following query that doesn’t examine or send any rows but still takes longer?

hron84

11 years ago

@khan I think delete is a little special, it will not send or examine rows.

What comes to my mind is the eventtime itself. As I see it, it is a string. However, I think it could be better if that column became a DATETIME or INTEGER (unix timestamp) value, because scanning these rows can be faster.

khan

11 years ago

@hron84, eventtime is already stored as INTEGER.

MatteoSp

11 years ago

Cool,
can you give a little advice on how to analyze an AWS RDS instance logs?

thanks
m.

Gleb Deykalo

10 years ago

Hi Peter,

Could you please help me with some query execution plan? It looks pretty simple at first glance, but…

The table is very simple:

CREATE TABLE test_idx (
a int(11) DEFAULT NULL,
b int(11) DEFAULT NULL,
c int(11) DEFAULT NULL,
KEY idx_cover_all (a,b,c)
) ENGINE=InnoDB;

Test data:
insert into test_idx values (1, 1, 1), (2, 2, 2), (3, 3, 3), (1, 2, 3), (3, 2, 1), (2, 3, 1), (2, 1, 3), (1, 3, 2), (1, 2, 2);

When I ask for an exact row, it uses index and join type is REF:

mysql> explain select c from test_idx where a = 2 and b = 2 and c = 2\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test_idx
type: ref
possible_keys: idx_cover_all
key: idx_cover_all
key_len: 15
ref: const,const,const
rows: 1
Extra: Using where; Using index

(1) But why do I see “Using where”? I can not understand why engine (InnoDB) can not filter row inside.

When I ask for a rows with little more complicated query, it confuses me even more:
mysql> explain select c from test_idx where a = 2 and b = 2 and c IN (1, 2, 3)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test_idx
type: range
possible_keys: idx_cover_all
key: idx_cover_all
key_len: 15
ref: NULL
rows: 3
Extra: Using where; Using index

Join type “range”, but I know that MySQL uses it for IN queries even if it is not actually range, so it is OK. But…

(2) No reference fields — why?
(3) Still can see “Using where” — why?

Why InnoDB can not be sure it found exactly correct row? Or is it MySQL server does not believe InnoDB and want to check data on its side?

I can reproduce it both on Percona Server 5.1 and Percona Server 5.5
5.1.54-rel12.5-log Percona Server with XtraDB (GPL), Release 12.5, Revision 188
5.5.21-55 Percona Server (GPL), Release 25.1
