MySQL supports two different algorithms for views: the MERGE algorithm and the TEMPTABLE algorithm. These two algorithms differ greatly. A view which uses the MERGE algorithm can merge filter conditions into the view query itself. This has significant performance advantages over TEMPTABLE views. A view which uses the TEMPTABLE algorithm will have to compute the entire contents of the view before any filter is applied. Computing the entire contents requires a temporary table and many more rows may be accessed than otherwise would had the filter been merged into the view query.
As long as a view avoids aggregation, DISTINCT, GROUP BY, ORDER BY, etc, then it can use the MERGE algorithm. Unfortunately, this means that views of any significant complexity will almost always use the TEMPTABLE algorithm.
This blog post demonstrates a workaround that allows the convenience of complex views, including those with aggregation (that is views which use the TEMPTABLE algorithm) with some of the performance advantages of MERGE algorithm views.
As a demonstration, consider the following table which contains a combination of integer values. There are enough values in the table such that a scan of all rows takes a second or two.
1 2 3 4 5 6 7 8 9 10 | mysql> show create table t2\G *************************** 1. row *************************** Table: t2 Create Table: CREATE TABLE `t2` ( `c1` int(11) DEFAULT NULL, `c2` int(11) NOT NULL AUTO_INCREMENT, PRIMARY KEY (`c2`), KEY `c1` (`c1`) ) ENGINE=InnoDB AUTO_INCREMENT=2949071 DEFAULT CHARSET=latin1 1 row in set (0.00 sec) |
If we define a view which uses no aggregation, then MySQL will let us use the MERGE algorithm. Notice that there are no warnings:
1 2 | mysql> create algorithm=merge view v1 as select c1, c2 from t2; Query OK, 0 rows affected (0.11 sec) |
If we then write a SQL query to aggregate over the rows of the view and we add a filter condition on c1, the view is accessed quicky:
1 2 3 4 5 6 7 | mysql> select c1, count(*) from v1 where c1 = 10 group by c1; +------+----------+ | c1 | count(*) | +------+----------+ | 10 | 130 | +------+----------+ 1 row in set (0.01 sec) |
We can use EXPLAIN EXTENDED to see that MySQL rewrites the view query to include the restriction:
1 2 3 4 5 6 7 | mysql> explain extended select c1, count(*) from v1 where c1 = 10 group by c1; +----+-------------+-------+------+---------------+------+---------+-------+------+----------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------+---------------+------+---------+-------+------+----------+--------------------------+ | 1 | SIMPLE | t2 | ref | c1 | c1 | 5 | const | 130 | 100.00 | Using where; Using index | +----+-------------+-------+------+---------------+------+---------+-------+------+----------+--------------------------+ 1 row in set, 1 warning (0.00 sec) |
Notice the rewritten SQL query:
1 2 3 4 5 6 7 | mysql> show warnings; +-------+------+---------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+---------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t2`.`c1` AS `c1`,count(0) AS `count(*)` from `test`.`t2` where (`test`.`t2`.`c1` = 10) group by `test`.`t2`.`c1` | +-------+------+---------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) |
Now, lets try to use the MERGE algorithm with an aggregation query:
1 2 | mysql> create algorithm=MERGE view v2 as select c1, count(*) from v1 group by c1; Query OK, 0 rows affected, 1 warning (0.01 sec) |
Notice that MySQL sets the ALGORITHM=UNDEFINED, and will chose to use TEMPTABLE when the view is actually used:
1 2 3 4 5 6 7 | mysql> show warnings; +---------+------+-------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------------------------------------+ | Warning | 1354 | View merge algorithm can't be used here for now (assumed undefined algorithm) | +---------+------+-------------------------------------------------------------------------------+ 1 row in set (0.00 sec) |
When you try to restrict the second view using a filter on c1, performance is not good. MySQL is scanning the entire table to produce the result:
1 2 3 4 5 6 7 | mysql> select * from v2 where c1 = 10; +------+----------+ | c1 | count(*) | +------+----------+ | 10 | 130 | +------+----------+ 1 row in set (1.66 sec) |
Even a query with an impossible where clause takes a long time to process:
1 2 | mysql> select * from v2 where c1 = 100; Empty set (1.64 sec) |
You can see that MySQL is accessing millions of rows:
1 2 3 4 5 6 7 8 | mysql> explain select * from v2 where c1 = 100; +----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1310720 | Using where | | 2 | DERIVED | t2 | index | NULL | c1 | 5 | NULL | 2785914 | Using index | +----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+ 2 rows in set (1.76 sec) |
Now to the actual workaround. MySQL won’t merge query filters into the view definition, but we can modify the physical definition of the view in a way which provides a different means of providing filter conditions to the query.
Lets start by creating a parameter table. This table might be view specific or you could construct it in such a way that it can be shared between views. This example is somewhere in between those two extremes:
1 2 3 4 5 6 7 | mysql> create table params ( view_name varchar(250), param1_val int, connection_id bigint, primary key(connection_id, view_name) ) engine=innodb; Query OK, 0 rows affected (0.10 sec) |
Notice that MySQL still won’t let us use the MERGE algorithm (1 warning):
1 2 3 4 5 6 7 8 | create alorithm=merge view v2 as select c1, count(*) from v1 join params p on p.connection_id = connection_id() and p.view_name = 'test.v2' and p.param1_val = v1.c1 group by c1; Query OK, 0 rows affected, 1 warning (0.01 sec) |
There are two important things to consider here. First, notice the use of connection_id() in the view definition. This makes sure that the only rows examined in the params table are those which belong to this connection. When the table is inserted too (see below) the connection_id() function will be used to generate the value for the connection_id column. Also notice the restriction that v1.c1 = p.param1_val. This restricts the query to only those rows where v1.c1 matches the value in the params table.
Place parameters into the params table using the REPLACE statement:
1 2 | mysql> replace into params (`view_name`, `param1_val`, `connection_id`) values ('test.v2', 10, connection_id()); Query OK, 1 row affected (0.00 sec) |
When MySQL executes the new view it will be able to restrict the rows examined based on the param1_val value in the params table:
1 2 3 4 5 6 7 | mysql> select * from v2; +------+----------+ | c1 | count(*) | +------+----------+ | 10 | 130 | +------+----------+ 1 row in set (0.01 sec) |
Notice the vastly different plan over the old one. We no longer examine millions of rows:
1 2 3 4 5 6 7 8 9 | mysql> explain select * from v2 where c1 = 10; +----+-------------+------------+--------+---------------+---------+---------+-------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+---------------+---------+---------+-------+------+--------------------------+ | 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 1 | | | 2 | DERIVED | p | const | PRIMARY | PRIMARY | 260 | | 1 | | | 2 | DERIVED | t2 | ref | c1 | c1 | 5 | const | 130 | Using where; Using index | +----+-------------+------------+--------+---------------+---------+---------+-------+------+--------------------------+ 3 rows in set (0.01 sec) |
In review, I like views for their convenient encapsulation of SQL logic that would usually have to be repeated in SQL statements throughout the application otherwise. The convenience sometimes comes with high cost though, particularly when the TEMPTABLE algorithm is used. The technique outlined in this post can be used to make a trade off between increased query complexity for improved view performance.
It’s unbelievable how bad the mysql query execution engine is. MS SQL, Oracle and even Postgres do not have problems like this at all. Of course the can optimize across views. And they have merge and hash join so you are not forced to join manually in PHP like some very big sites are forced to… I cannot believe that MySql still does not have a better query execution engine.
Justin,
Nice and creative workaround. I never thought about it. Though I’m not sure I would use it – VIEWs can simplify how things look and how developers can access data, in this case it however complicate things a lot. Why not to use direct query in this case ?
Hi Peter,
Mostly it was just a cool trick that I thought up and I thought I would share it. There are lots of limitations on versatility such as it wouldn’t really work if you had read_only=true on a slave for example.
One place where I think it might be useful is in ETL or ELT processes. Views can be very convenient for ETL purposes as they abstract the SQL for sometimes complex relationships. Adding a parameter table for such an ETL process would be pretty straightforward as the ETL flow could simply insert parameters into the parameter table. Such parameters might be used to get the total order amount for a particular order number that has changed for example.
Justin,
Nice one. I’ve encountered a similar solution, please see Views: better performance with condition pushdown.
Towards the end I present with a use case I’ve seen.
Shlomi,
Wow, nice post Shlomi. I didn’t think about the fact that stored functions or UDF could be called in the query and provide session state information.
I have a UDF around somewhere called get_session_var(). I need to dig that up.
Hi, good idea but what to you suggest in tables larger than 2 GB. In our case Mysql creates a temp table on the hard drive (we defined 1GB temp table size) so the query takes 7 sec.
Any idea?
Olaf