MySQL VIEW as performance troublemaker
I start to see applications being built utilizing VIEWs functionality which appeared in MySQL 5.0 and quite frequently VIEWs are used to help in writing the queries - to keep queries simple without really thinking how it affects server performance.
Even worse than that - looking at the short table which just gets single row from the table by the key we think this is simple query, while can be real monster instead with complexity hidden away in VIEW definition.
Just another day I worked on optimizing application which uses VIEWs and was looking at the long running query which just joined 2 tables... I ran EXPLAIN for it and got 200 of rows in the result set just for explain due to several layers of cascaded views built on top of one another so it is easy to write the queries, some of them it turn used subqueries subselects and derived tables.
It is also very dangerous if you assume MySQL would optimize your VIEWs same way as more advanced database systems would. Same as with subqueries and derived tables MySQL 5.0 will fail and perform very inefficiently in many counts.
MySQL has two ways of handling the VIEWS - query merge, in which case VIEW is simply expanded as a macro or Temporary Table in which case VIEW is materialized to temporary tables (without indexes !) which is later used further in query execution.
There does not seems to be any optimizations applied to the query used for temporary table creation from the outer query and plus if you use more then one Temporary Tables views which you join together you may have serious issues because such tables do not get any indexes.
Let me now show couple of examples.
Assume we have the comments table which holds users comments to the blog, naturally containing user_id which left comment, comment_id and comment text:
-
CREATE TABLE `comments` (
-
`user_id` int(10) UNSIGNED NOT NULL,
-
`comment_id` int(10) UNSIGNED NOT NULL,
-
`message` text NOT NULL,
-
PRIMARY KEY (`user_id`,`comment_id`)
-
) ENGINE=MyISAM DEFAULT CHARSET=latin1
So how would you get number of comments left by the given user ?
-
mysql> SELECT count(*) FROM comments WHERE user_id=5;
-
+----------+
-
| count(*) |
-
+----------+
-
| 1818 |
-
+----------+
-
1 row IN SET (0.00 sec)
So how would we solve the same problem having things more modular and using MySQL VIEWs ?
-
mysql> CREATE VIEW user_counts AS SELECT user_id,count(*) cnt FROM comments GROUP BY user_id;
-
Query OK, 0 rows affected (0.00 sec)
-
-
mysql> SELECT * FROM user_counts WHERE user_id=5;
-
+---------+------+
-
| user_id | cnt |
-
+---------+------+
-
| 5 | 1818 |
-
+---------+------+
-
1 row IN SET (0.95 sec)
So we create the view which gives us back counts for each user and can simply query from that table restricting by user_id.
If this would be handled properly inside MySQL there would be even good reason to do that - so later you can change your application and convert user_count to summary table avoid changing any queries directly. Unfortunately it does not work.
It is interesting to see EXPLAIN for such query and time for the query which fetches everything from the VIEW - it is almost the same as getting only one row, and note even EXPLAIN takes same amount of time:
-
mysql> EXPLAIN SELECT * FROM user_counts WHERE user_id=5 \G
-
*************************** 1. row ***************************
-
id: 1
-
select_type: PRIMARY
-
TABLE: <derived2>
-
type: ALL
-
possible_keys: NULL
-
KEY: NULL
-
key_len: NULL
-
ref: NULL
-
rows: 1001
-
Extra: USING WHERE
-
*************************** 2. row ***************************
-
id: 2
-
select_type: DERIVED
-
TABLE: comments
-
type: INDEX
-
possible_keys: NULL
-
KEY: PRIMARY
-
key_len: 8
-
ref: NULL
-
rows: 1792695
-
Extra: USING INDEX
-
2 rows IN SET (0.96 sec)
-
-
-
mysql> SELECT * FROM user_counts;
-
+---------+------+
-
| user_id | cnt |
-
+---------+------+
-
| 0 | 850 |
-
| 1 | 1790 |
-
| 2 | 1777 |
-
| 3 | 1762 |
-
| 4 | 1784 |
-
....
-
-
| 999 | 1808 |
-
| 1000 | 898 |
-
+---------+------+
-
1001 rows IN SET (0.96 sec)
So now lets create a very artificial query which will JOIN 2 views just to see how indexes are used:
-
mysql> EXPLAIN SELECT uc.cnt+uc2.cnt FROM user_counts uc, user_counts uc2 WHERE uc.user_id=uc2.user_id AND uc.user_id=5 \G
-
*************************** 1. row ***************************
-
id: 1
-
select_type: PRIMARY
-
TABLE: <derived2>
-
type: ALL
-
possible_keys: NULL
-
KEY: NULL
-
key_len: NULL
-
ref: NULL
-
rows: 1001
-
Extra: USING WHERE; USING JOIN cache
-
*************************** 2. row ***************************
-
id: 1
-
select_type: PRIMARY
-
TABLE: <derived3>
-
type: ALL
-
possible_keys: NULL
-
KEY: NULL
-
key_len: NULL
-
ref: NULL
-
rows: 1001
-
Extra: USING WHERE
-
*************************** 3. row ***************************
-
id: 3
-
select_type: DERIVED
-
TABLE: comments
-
type: INDEX
-
possible_keys: NULL
-
KEY: PRIMARY
-
key_len: 8
-
ref: NULL
-
rows: 1792695
-
Extra: USING INDEX
-
*************************** 4. row ***************************
-
id: 2
-
select_type: DERIVED
-
TABLE: comments
-
type: INDEX
-
possible_keys: NULL
-
KEY: PRIMARY
-
key_len: 8
-
ref: NULL
-
rows: 1792695
-
Extra: USING INDEX
-
4 rows IN SET (1.91 sec)
As you can see we get 2 derived tables in which case which are fully populated and "full join" used to to join between them.
In this particular case it is not that bad because "join cache" is used to perform it relatively efficient, however for large derived tables it will become nightmare.
So be very careful implementing MySQL VIEWs in your application, especially ones which require temporary table execution method. VIEWs can be used with very small performance overhead but only in case they are used with caution.
MySQL has long way to go getting queries with VIEWs properly optimized.
20 Comments
Trackbacks/Pingbacks
- Jay Pipes
Log Buffer #57: A Carnival of the Vanities for DBAs... Dave Edwards once again has given me the privilege of writing this week's Log Buffer, and oooooh it's a jam-packed one. I think there's something for everyone in this week's issue. Buzz in the MySQL Blogosphere... The MySQL blogosphere was... - TSCHITSCHEREENGREEN live » Blog Archive » MySQL - Performance mit Views
[...] bin gerade über einen sehr spannenden Blogeintrag bzgl. MySQL Views [...] - Dancing Mammoth :: Blog Archive :: Derived Attributes with UNION
[...] MySQL 5.0 or above, you won’t be able to mitigate this problem by using a VIEW. MySQL is not very good at optimizing views. If there is not a one-to-one relationship between the rows of your view and the rows of the [...] - View vs. Join - php.de
[...] @dsmcg: Meinst Du den "Sonderfall", dass Du immer (oder zumindest sehr oft) das exakt selbe JOIN ohne veränderliche Parameter ausführst? Dann könnte Dir ein View zwei Vorteile bieten: Du musst Dich nicht auf den Query Cache verlassen, da der Query Plan getrennt und persistent abgespeichert werden könnte. Und Du verbirgst die eigentlichen Datenquellen vor der Applikation. Wenn Du Dich also entschließen solltest, dass die Datenstruktur zu kompliziert ist und Du den Aufbau der Datenbank ändern willst, musst Du die Applikation nicht anpassen. Soviel zur Theorie... in der Praxis sieht das gleich noch mal anders aus. Bei handgepflegten, handoptimierten (Oracle-)Datenbanken kann das richtig Klasse sein. Wenn Du den Datenbankserver einfach machen lässt, kannst Du schnell unerwartete Falschhälse schaffen, siehe zum Beispiel MySQL VIEW as performance troublemaker | MySQL Performance Blog [...] - Enabling Implicit Cast From Integer To Boolean in PostgreSQL | chrisspenblog
[...] complicated queries, and I’m running into a few obnoxious limitations, mostly involving known view performance problems. I’ve decided to start testing the waters in PostgreSQL, so I tried importing my data into PG [...]











del.icio.us
digg
Hi Peter,
Good article as always. Your second code example has some text comments in it that don’t line-wrap and are hard to read.
Comment :: August 12, 2007 @ 4:56 pm
I actually had previously utilized MySQL’s VIEWs in a project when they had just came out. I figured that they would actually optimize how other databases would however what I came to find was that it was one of the worst performance bottlenecks as my data set grew. What I came to find was that MySQL would join all of the data together first and then it would run through the search criteria.
Comment :: August 13, 2007 @ 3:25 pm
We recently converted from Access to MySQL. In this transition, lot of queries were written as views in MySQL. The performance was so bad that I decided to rewrite all of them using the base tables directly. As you aptly said, MySQL does have a long way to go as far as views are concerned.
Comment :: August 15, 2007 @ 1:32 pm
I think the problem is not the VIEWs themselves, but how one attempted to use it.
Putting an aggregate function into a VIEW seemed like a bad choice to begin with (especially one that forced a table scan for each use), and that bad initial choice is compounded if one plans on using it with any frequency. At that cost, you may as well have just written a trigger on the comments table to update a user_counts table (or re-generate it entirely using the query you used for your view) for each insert or delete. You could read the now summarized user_count table (that only regenerates when needed) with the benefits of an index to get a specific user’s posting count.
You do point out a good thing, and that is that people need to really think through what they are trying to ultimately get to and figure out a way to get that data as efficiently as possible. Just using a mechanism (VIEWs) simply because they are there and available to use does not make them necessarily a good choice. Also, don’t put queries into a VIEW that you would not deem efficient to run normally.
Comment :: September 6, 2007 @ 12:37 pm
Chrisk,
It is both. MySQL Views could be optimized better. You can use MySQL Views wisely only in cases when they are optimized well.
Many people do not think about performance until it starts to hurt badly so they just write queries (and put them in the views) in a way it gets them info they want easiest way.
Comment :: September 7, 2007 @ 3:01 pm
“Many people do not think about performance until it starts to hurt badly so they just write queries (and put them in the views) in a way it gets them info they want easiest way.”
That could be construed as a feature, not a bug. You know what they say about premature optimization.
Comment :: November 15, 2007 @ 3:54 pm
Well… Not optimizing beyond the need is one thing. Thinking you never would need to optimize is completely different.
Comment :: November 15, 2007 @ 4:23 pm
If I have a view used to join data (via a UNION select) stored in separate tables, then use a where clause when selecting from the view, is that a case where MySQL would be inefficient? I don’t want it to create a temporary table of ALL the data in ALL the tables, I really was hoping for more of a macro-expand kind of thing.
Comment :: November 20, 2007 @ 9:48 am
i see VIEWS like a good alternative to caching some of the queries. Example: you have a site, with a `categories` menu (parent->childs) that are conditioned to appear in the menu by the number of `products` they reffer to. Every time a user loads a page you load that menu. If you have a large database, you don’t want to execute each time that menu (query), so you only SELECT the view. i think this is a primary use of VIEWS besides caching reports that don’t contain indexes but full values. please correct me if i’m wrong.
Comment :: January 12, 2008 @ 3:12 am
How would you use views to cache queries in MySQL ?
MySQL does not have materialized views so whenever you access view it will be always reevaluated.
Comment :: January 22, 2008 @ 4:47 am
I’ve been working with MySQL Views since release 5 was first made available.
It took me and my team about 2 months of testing to understand that it is just too soon to use Views.
What we did as a solution was to create an extra layer between PHP and MySQL (written in PHP). Hardcoding the complex queries in functions, with our own caching logic. This made it for us, for now.
Comment :: February 9, 2008 @ 1:53 pm
Have MySQL views improved at all – for any version?
Comment :: March 18, 2008 @ 8:25 pm
Not in 5.1 at least. 6.0 have some optimizations which would affect views but I’m not sure if there are general fixes.
Comment :: March 18, 2008 @ 8:44 pm
Ascanio and others: Try using Stored Procedure, for this example.
DELIMITER $$
DROP PROCEDURE IF EXISTS `test2`.`user_count` $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `user_count`(param1 int)
BEGIN
SELECT count(*) FROM comments WHERE user_id=param1;
END $$
DELIMITER ;
Run the above code
then CALL user_count(5); It will run SELECT count(*) FROM comments WHERE user_id=5; and just as fast as the straight select (from my testing less than 0.001 milisecond difference, sometimes the CALL was faster, but thats just due to random flux you might notice a difference if you were making thousands of calls a second..
Comment :: April 10, 2008 @ 6:50 am
Views meant to hide the complexities in the user’s side. This helps you to shorten your query. Optimization is to be done from the MySQL server itself. But have you ever heard of something like indexing a view? Something else? As a developer, performance too matters.
Comment :: April 30, 2008 @ 4:23 am
This post makes it clear that indexes will not be used for views that use temporary tables. For views that do not use temporary tables, will ‘joining’ or ‘whereing’ on a column that is indexed in the underlying table use that index?
Comment :: July 12, 2008 @ 6:15 pm
nice articel… thanks for your backmarking…
Comment :: February 24, 2009 @ 8:32 am
LOL, very useless article!
Author try to compare two uncompirable things!
Look here: Simple select is slower then view-select (ROFL):
==================================================
Simple select:
mysql> select count(*) from (select * from comments where user_id in (select user_id from comments where user_id = 5 group by user_id)) t2;
+———-+
| count(*) |
+———-+
| 100 |
+———-+
1 row in set (3.71 sec)
And from VIEW (CREATE VIEW user_counts AS SELECT * FROM comments):
mysql> select count(*) from user_counts where user_id = 5;
+———-+
| count(*) |
+———-+
| 100 |
+———-+
1 row in set (0.00 sec)
==================================================
Author’s problem NOT IN VIEW at all, but in using a view for non-view-based operations!
VIEW – mechanism to make a pseudo-tables for database users (for example, to make four different client lists based on three tables (account, client, client-address) tables: client’s names list, client’s address list, client’s detail info list. On this lists client-side-software maps grids, combo-boxes and other.
In author’s example, VIEW works slowly ONLY AND ONLY BECOUSE autor use a GROUP BY statement in VIEW!
PS: sorry for my english, i’m not from english-speaking country
Comment :: August 24, 2009 @ 9:19 am
Blakkky, you should learn the execution plan differences between joins and correlated subqueries.
Comment :: August 24, 2009 @ 10:55 am
Hmm, it appears that using views which could be a quicker way to make db calls especially in joins and subqueries… never realized how bad the performance is by using them. I wonder if this performance issue also occurs when creating triggers and stored procedures.
Comment :: September 9, 2009 @ 5:31 am