August 12, 2007

MySQL VIEW as performance troublemaker

Posted by peter

I am starting to see applications built using the VIEW functionality which appeared in MySQL 5.0, and quite frequently VIEWs are used to help in writing queries, to keep the queries simple, without really thinking about how this affects server performance.

Even worse than that - looking at a short query which just gets a single row from a table by key, we think it is a simple query, while it can be a real monster instead, with its complexity hidden away in the VIEW definition.

Just the other day I was optimizing an application which uses VIEWs and was looking at a long-running query which just joined 2 tables... I ran EXPLAIN for it and got some 200 rows in the result set just for the EXPLAIN, due to several layers of cascaded views built on top of one another to make the queries easy to write. Some of them in turn used subqueries and derived tables.

It is also very dangerous to assume MySQL will optimize your VIEWs the same way more advanced database systems would. As with subqueries and derived tables, MySQL 5.0 will fail to do so and perform very inefficiently in many cases.

MySQL has two ways of handling VIEWs: query merge, in which case the VIEW is simply expanded like a macro, or temporary table, in which case the VIEW is materialized into a temporary table (without indexes!) which is then used further in query execution.
There does not seem to be any optimization carried over from the outer query to the query used for temporary table creation, and on top of that, if you join together more than one temporary-table view you may have serious issues, because such tables do not get any indexes.

Let me now show a couple of examples.

Assume we have a comments table which holds users' comments on the blog, naturally containing the user_id of the user who left the comment, a comment_id and the comment text:

SQL:
  CREATE TABLE `comments` (
    `user_id` int(10) UNSIGNED NOT NULL,
    `comment_id` int(10) UNSIGNED NOT NULL,
    `message` text NOT NULL,
    PRIMARY KEY (`user_id`,`comment_id`)
  ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

So how would you get the number of comments left by a given user?

SQL:
  mysql> SELECT count(*) FROM comments WHERE user_id=5;
  +----------+
  | count(*) |
  +----------+
  |     1818 |
  +----------+
  1 row in set (0.00 sec)

So how would we solve the same problem in a more modular way, using MySQL VIEWs?

SQL:
  mysql> CREATE VIEW user_counts AS SELECT user_id,count(*) cnt FROM comments GROUP BY user_id;
  Query OK, 0 rows affected (0.00 sec)

  mysql> SELECT * FROM user_counts WHERE user_id=5;
  +---------+------+
  | user_id | cnt  |
  +---------+------+
  |       5 | 1818 |
  +---------+------+
  1 row in set (0.95 sec)

So we create a view which gives us back the counts for each user and can simply query it like a table, restricting by user_id.
If this were handled properly inside MySQL there would even be a good reason to do it: later you could change your application and convert user_counts to a summary table without changing any queries. Unfortunately it does not work. The same lookup that took 0.00 sec against the base table takes 0.95 sec through the view.
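For reference, here is a minimal sketch of what such a summary table could look like for the comments table above. The table name user_counts_summary and the trigger names are made up for illustration, and this is only one possible way to keep the counts current; in particular it assumes user_id of an existing comment is never updated.

```sql
-- Hypothetical summary table: one row per user, indexed by user_id.
CREATE TABLE user_counts_summary (
  user_id INT UNSIGNED NOT NULL,
  cnt     INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id)
) ENGINE=MyISAM;

-- Initial population: the same aggregate the view runs, but executed once.
INSERT INTO user_counts_summary (user_id, cnt)
  SELECT user_id, count(*) FROM comments GROUP BY user_id;

-- Keep the counts current on insert (hypothetical trigger name).
CREATE TRIGGER comments_after_insert AFTER INSERT ON comments
FOR EACH ROW
  INSERT INTO user_counts_summary (user_id, cnt) VALUES (NEW.user_id, 1)
  ON DUPLICATE KEY UPDATE cnt = cnt + 1;

-- Keep the counts current on delete (hypothetical trigger name).
-- Note: a user whose last comment is deleted stays in the table with cnt = 0,
-- which differs from the view, where such a user would not appear at all.
CREATE TRIGGER comments_after_delete AFTER DELETE ON comments
FOR EACH ROW
  UPDATE user_counts_summary SET cnt = cnt - 1 WHERE user_id = OLD.user_id;

-- The lookup is now a primary key read on a small table.
SELECT * FROM user_counts_summary WHERE user_id=5;
```

If the application already queries user_counts, the view could later be dropped and the table renamed to user_counts so the queries themselves stay unchanged.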

It is interesting to look at the EXPLAIN for this query, and at the time taken by a query which fetches everything from the VIEW. Fetching everything takes almost the same time as getting only one row, and note that even the EXPLAIN takes the same amount of time:

SQL:
  mysql> EXPLAIN SELECT * FROM user_counts WHERE user_id=5 \G
  *************************** 1. row ***************************
             id: 1
    select_type: PRIMARY
          table: <derived2>
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 1001
          Extra: Using where
  *************************** 2. row ***************************
             id: 2
    select_type: DERIVED
          table: comments
           type: index
  possible_keys: NULL
            key: PRIMARY
        key_len: 8
            ref: NULL
           rows: 1792695
          Extra: Using index
  2 rows in set (0.96 sec)


  mysql> SELECT * FROM user_counts;
  +---------+------+
  | user_id | cnt  |
  +---------+------+
  |       0 |  850 |
  |       1 | 1790 |
  |       2 | 1777 |
  |       3 | 1762 |
  |       4 | 1784 |
  ....

  |     999 | 1808 |
  |    1000 |  898 |
  +---------+------+
  1001 rows in set (0.96 sec)
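For contrast, here is a small sketch of a view which can be handled by the merge method. The view names user_comment_ids and user_counts_merge are hypothetical; the ALGORITHM clause is part of the regular CREATE VIEW syntax. Because this view has no GROUP BY or aggregate, the outer WHERE clause can be folded into the view's SELECT, so EXPLAIN should show the comments table itself rather than a derived table.

```sql
-- Hypothetical view: a plain projection, no GROUP BY or aggregate,
-- so the MERGE algorithm can be used.
CREATE ALGORITHM=MERGE VIEW user_comment_ids AS
  SELECT user_id, comment_id FROM comments;

-- The view is expanded like a macro; the user_id condition can use
-- the PRIMARY KEY (user_id, comment_id) of the base table.
EXPLAIN SELECT count(*) FROM user_comment_ids WHERE user_id=5 \G

-- Asking for MERGE on an aggregating view does not help: MySQL accepts
-- the statement but issues a warning and falls back to its own choice,
-- which for a GROUP BY view means the temporary table method.
CREATE ALGORITHM=MERGE VIEW user_counts_merge AS
  SELECT user_id, count(*) cnt FROM comments GROUP BY user_id;
SHOW WARNINGS;
```

The count is done in the outer query here, on top of a mergeable view, instead of inside the view definition.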

So now let's create a very artificial query which JOINs 2 views, just to see how indexes are used:

SQL:
  mysql> EXPLAIN SELECT uc.cnt+uc2.cnt FROM user_counts uc, user_counts uc2 WHERE uc.user_id=uc2.user_id AND uc.user_id=5 \G
  *************************** 1. row ***************************
             id: 1
    select_type: PRIMARY
          table: <derived2>
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 1001
          Extra: Using where; Using join cache
  *************************** 2. row ***************************
             id: 1
    select_type: PRIMARY
          table: <derived3>
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 1001
          Extra: Using where
  *************************** 3. row ***************************
             id: 3
    select_type: DERIVED
          table: comments
           type: index
  possible_keys: NULL
            key: PRIMARY
        key_len: 8
            ref: NULL
           rows: 1792695
          Extra: Using index
  *************************** 4. row ***************************
             id: 2
    select_type: DERIVED
          table: comments
           type: index
  possible_keys: NULL
            key: PRIMARY
        key_len: 8
            ref: NULL
           rows: 1792695
          Extra: Using index
  4 rows in set (1.91 sec)

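As you can see, we get 2 derived tables, both of which are fully populated (each one requires its own full index scan of comments), and a "full join" is used to join them, since neither derived table has an index.
In this particular case it is not that bad, because the "join cache" is used to perform the join relatively efficiently and each derived table holds only 1001 rows. For large derived tables, however, it will become a nightmare.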
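One way to sidestep the problem in a case like this is to skip the view and push the restriction into the derived tables by hand. This is only a sketch of that workaround for the artificial query above, not something the optimizer does for you: each derived table is still materialized without indexes, but it is built from the rows of a single user and holds at most one row.

```sql
-- Same result as the two-view join, with user_id=5 applied
-- inside each derived table instead of after materialization.
SELECT uc.cnt + uc2.cnt
FROM (SELECT user_id, count(*) cnt
        FROM comments
       WHERE user_id=5      -- can use the PRIMARY KEY prefix (user_id)
       GROUP BY user_id) uc,
     (SELECT user_id, count(*) cnt
        FROM comments
       WHERE user_id=5
       GROUP BY user_id) uc2
WHERE uc.user_id = uc2.user_id;
```

The price is exactly what the views were meant to avoid: the aggregate is spelled out in the query, and the user_id value has to be repeated in every derived table.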

So be very careful when implementing MySQL VIEWs in your application, especially ones which require the temporary table execution method. VIEWs can be used with very small performance overhead, but only if they are used with caution.

MySQL has a long way to go in getting queries with VIEWs properly optimized.

 

20 Comments »

  1. Hi Peter,

    Good article as always. Your second code example has some text comments in it that don’t line-wrap and are hard to read.

    Comment :: August 12, 2007 @ 4:56 pm

  2. I had actually used MySQL’s VIEWs in a project when they had just come out. I figured that MySQL would optimize them the way other databases do; what I came to find, however, was that they were one of the worst performance bottlenecks as my data set grew. MySQL would join all of the data together first and only then run through the search criteria.

    Comment :: August 13, 2007 @ 3:25 pm

  3. Priya Raman

    We recently converted from Access to MySQL. In this transition, a lot of queries were written as views in MySQL. The performance was so bad that I decided to rewrite all of them using the base tables directly. As you aptly said, MySQL does have a long way to go as far as views are concerned.

    Comment :: August 15, 2007 @ 1:32 pm

  4. ChrisK

    I think the problem is not the VIEWs themselves, but how one attempted to use them.

    Putting an aggregate function into a VIEW seemed like a bad choice to begin with (especially one that forced a table scan for each use), and that bad initial choice is compounded if one plans on using it with any frequency. At that cost, you may as well have just written a trigger on the comments table to update a user_counts table (or re-generate it entirely using the query you used for your view) for each insert or delete. You could read the now summarized user_count table (that only regenerates when needed) with the benefits of an index to get a specific user’s posting count.

    You do point out a good thing, and that is that people need to really think through what they are trying to ultimately get to and figure out a way to get that data as efficiently as possible. Just using a mechanism (VIEWs) simply because they are there and available to use does not make them necessarily a good choice. Also, don’t put queries into a VIEW that you would not deem efficient to run normally.

    Comment :: September 6, 2007 @ 12:37 pm

  5. Chrisk,

    It is both. MySQL Views could be optimized better. You can use MySQL Views wisely only in cases when they are optimized well.

    Many people do not think about performance until it starts to hurt badly so they just write queries (and put them in the views) in a way it gets them info they want easiest way.

    Comment :: September 7, 2007 @ 3:01 pm

  6. Jim

    “Many people do not think about performance until it starts to hurt badly so they just write queries (and put them in the views) in a way it gets them info they want easiest way.”

    That could be construed as a feature, not a bug. You know what they say about premature optimization.

    Comment :: November 15, 2007 @ 3:54 pm

  7. Well… Not optimizing beyond the need is one thing. Thinking you never would need to optimize is completely different.

    Comment :: November 15, 2007 @ 4:23 pm

  8. Bill Ford

    If I have a view used to join data (via a UNION select) stored in separate tables, then use a where clause when selecting from the view, is that a case where MySQL would be inefficient? I don’t want it to create a temporary table of ALL the data in ALL the tables, I really was hoping for more of a macro-expand kind of thing.

    Comment :: November 20, 2007 @ 9:48 am

  9. I see VIEWs as a good alternative to caching some of the queries. Example: you have a site with a `categories` menu (parent->children), where categories appear in the menu depending on the number of `products` they refer to. Every time a user loads a page you load that menu. If you have a large database, you don’t want to execute that menu query each time, so you only SELECT from the view. I think this is a primary use of VIEWs, besides caching reports that don’t contain indexes but full values. Please correct me if I’m wrong.

    Comment :: January 12, 2008 @ 3:12 am

  10. How would you use views to cache queries in MySQL ?

    MySQL does not have materialized views, so whenever you access a view it is always re-evaluated.

    Comment :: January 22, 2008 @ 4:47 am

  11. I’ve been working with MySQL Views since release 5 was first made available.
    It took me and my team about 2 months of testing to understand that it is just too soon to use Views.

    What we did as a solution was to create an extra layer between PHP and MySQL (written in PHP). Hardcoding the complex queries in functions, with our own caching logic. This made it for us, for now.

    Comment :: February 9, 2008 @ 1:53 pm

  12. Geoff

    Have MySQL views improved at all – for any version?

    Comment :: March 18, 2008 @ 8:25 pm

  13. Not in 5.1 at least. 6.0 has some optimizations which would affect views, but I’m not sure if there are general fixes.

    Comment :: March 18, 2008 @ 8:44 pm

  14. John

    Ascanio and others: Try using Stored Procedure, for this example.

    DELIMITER $$
    DROP PROCEDURE IF EXISTS `test2`.`user_count` $$
    CREATE DEFINER=`root`@`localhost` PROCEDURE `user_count`(param1 int)
    BEGIN
    SELECT count(*) FROM comments WHERE user_id=param1;
    END $$
    DELIMITER ;

    Run the above code, then CALL user_count(5); It will run SELECT count(*) FROM comments WHERE user_id=5; just as fast as the straight select (in my testing there was less than 0.001 millisecond difference; sometimes the CALL was faster, but that’s just random fluctuation). You might notice a difference if you were making thousands of calls a second.

    Comment :: April 10, 2008 @ 6:50 am

  15. Views meant to hide the complexities in the user’s side. This helps you to shorten your query. Optimization is to be done from the MySQL server itself. But have you ever heard of something like indexing a view? Something else? As a developer, performance too matters.

    Comment :: April 30, 2008 @ 4:23 am

  16. Dave

    This post makes it clear that indexes will not be used for views that use temporary tables. For views that do not use temporary tables, will ‘joining’ or ‘whereing’ on a column that is indexed in the underlying table use that index?

    Comment :: July 12, 2008 @ 6:15 pm

  17. nice articel… thanks for your backmarking… :)

    Comment :: February 24, 2009 @ 8:32 am

  18. Blakkky

    LOL, very useless article!

    The author tries to compare two incomparable things!
    Look here: a simple select is slower than a view select (ROFL):
    ==================================================
    Simple select:
    mysql> select count(*) from (select * from comments where user_id in (select user_id from comments where user_id = 5 group by user_id)) t2;
    +----------+
    | count(*) |
    +----------+
    |      100 |
    +----------+
    1 row in set (3.71 sec)

    And from VIEW (CREATE VIEW user_counts AS SELECT * FROM comments):
    mysql> select count(*) from user_counts where user_id = 5;
    +----------+
    | count(*) |
    +----------+
    |      100 |
    +----------+
    1 row in set (0.00 sec)
    ==================================================

    The author’s problem is NOT IN the VIEW at all, but in using a view for non-view-based operations!

    A VIEW is a mechanism for making pseudo-tables for database users. For example, you can make several different client lists based on three tables (account, client, client-address): a list of client names, a list of client addresses, a list of client details. Client-side software then maps grids, combo boxes and so on onto these lists.

    In the author’s example, the VIEW works slowly ONLY AND ONLY BECAUSE the author uses a GROUP BY statement in the VIEW!

    PS: sorry for my english, i’m not from english-speaking country :)

    Comment :: August 24, 2009 @ 9:19 am

  19. Blakkky, you should learn the execution plan differences between joins and correlated subqueries.

    Comment :: August 24, 2009 @ 10:55 am

  20. Hmm, it appears that using views which could be a quicker way to make db calls especially in joins and subqueries… never realized how bad the performance is by using them. I wonder if this performance issue also occurs when creating triggers and stored procedures.

    Comment :: September 9, 2009 @ 5:31 am

 
