August 12, 2007

MySQL VIEW as performance troublemaker

Posted by peter

I am starting to see applications built on the VIEW functionality that appeared in MySQL 5.0. Quite frequently VIEWs are used to make queries easier to write, to keep them simple, without any real thought about how this affects server performance.

Even worse, when we look at a short query that just fetches a single row from a table by key, we assume it is a simple query, while it can be a real monster with its complexity hidden away in the VIEW definition.

Just the other day I was optimizing an application which uses VIEWs, looking at a long-running query which just joined 2 tables... I ran EXPLAIN on it and got about 200 rows of EXPLAIN output alone, due to several layers of cascaded views built on top of one another to make the queries easy to write. Some of them in turn used subqueries and derived tables.

It is also very dangerous to assume MySQL will optimize your VIEWs the same way more advanced database systems would. As with subqueries and derived tables, MySQL 5.0 will fail to do so and perform very inefficiently in many cases.

MySQL has two ways of handling VIEWs: query merge, in which case the VIEW is simply expanded like a macro, or temporary table, in which case the VIEW is materialized into a temporary table (without indexes!) which is then used further in query execution.
There do not seem to be any optimizations applied from the outer query to the query used to create the temporary table. On top of that, if you join several temporary-table views together you may have serious issues, because such tables do not get any indexes.

Let me now show a couple of examples.

Assume we have a comments table which holds users' comments on a blog, naturally containing the user_id of the user who left the comment, the comment_id and the comment text:

SQL:
  CREATE TABLE `comments` (
    `user_id` int(10) UNSIGNED NOT NULL,
    `comment_id` int(10) UNSIGNED NOT NULL,
    `message` text NOT NULL,
    PRIMARY KEY (`user_id`,`comment_id`)
  ) ENGINE=MyISAM DEFAULT CHARSET=latin1

So how would you get the number of comments left by a given user?

SQL:
  mysql> SELECT count(*) FROM comments WHERE user_id=5;
  +----------+
  | count(*) |
  +----------+
  |     1818 |
  +----------+
  1 row in set (0.00 sec)

So how would we solve the same problem in a more modular way, using MySQL VIEWs?

SQL:
  mysql> CREATE VIEW user_counts AS SELECT user_id,count(*) cnt FROM comments GROUP BY user_id;
  Query OK, 0 rows affected (0.00 sec)

  mysql> SELECT * FROM user_counts WHERE user_id=5;
  +---------+------+
  | user_id | cnt  |
  +---------+------+
  |       5 | 1818 |
  +---------+------+
  1 row in set (0.95 sec)

So we create a view which gives us back the counts for each user, and can simply query it like a table, restricting by user_id.
If this were handled properly inside MySQL there would even be a good reason to do it: later you could change your application and convert user_counts to a summary table without changing any queries. Unfortunately it does not work well: the query took 0.95 sec through the view, compared to 0.00 sec against the base table.
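For illustration, here is a minimal sketch of what such a summary table could look like. The table name user_counts_summary and the way it is refreshed are assumptions made for this example; keeping it up to date as comments are added or deleted is left to the application.

```sql
-- Hypothetical summary table (the name is an assumption for this sketch).
CREATE TABLE user_counts_summary (
    user_id INT UNSIGNED NOT NULL,
    cnt     INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id)
) ENGINE=MyISAM;

-- Initial population from the base table: one full scan, paid once
-- instead of on every lookup.
INSERT INTO user_counts_summary (user_id, cnt)
    SELECT user_id, COUNT(*) FROM comments GROUP BY user_id;

-- The lookup is now a single-row read by primary key.
SELECT user_id, cnt FROM user_counts_summary WHERE user_id = 5;
```

Unlike the view, this table is a real, indexed table, so the WHERE clause can use the primary key. The price is that the counts are only as fresh as the last refresh.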

It is interesting to look at EXPLAIN for this query, and at the time taken by a query which fetches everything from the VIEW: it is almost the same as for getting only one row. Note that even EXPLAIN takes the same amount of time:

SQL:
  mysql> EXPLAIN SELECT * FROM user_counts WHERE user_id=5 \G
  *************************** 1. row ***************************
             id: 1
    select_type: PRIMARY
          table: <derived2>
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 1001
          Extra: Using where
  *************************** 2. row ***************************
             id: 2
    select_type: DERIVED
          table: comments
           type: index
  possible_keys: NULL
            key: PRIMARY
        key_len: 8
            ref: NULL
           rows: 1792695
          Extra: Using index
  2 rows in set (0.96 sec)


  mysql> SELECT * FROM user_counts;
  +---------+------+
  | user_id | cnt  |
  +---------+------+
  |       0 |  850 |
  |       1 | 1790 |
  |       2 | 1777 |
  |       3 | 1762 |
  |       4 | 1784 |
  ....

  |     999 | 1808 |
  |    1000 |  898 |
  +---------+------+
  1001 rows in set (0.96 sec)
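
For comparison, this is roughly the query one would hope the optimizer could derive from the view by pushing the outer WHERE condition inside the view definition. It is a hand-written sketch of that rewrite, not something MySQL 5.0 does for this view:

```sql
-- Hand-written equivalent of "SELECT * FROM user_counts WHERE user_id=5"
-- with the user_id condition moved inside the grouped query.
SELECT user_id, COUNT(*) AS cnt
FROM comments
WHERE user_id = 5
GROUP BY user_id;
```

Because user_id is the leading column of the primary key, this form can read only the index entries for one user instead of scanning the entire index and grouping all users first.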

So now let's create a very artificial query which JOINs 2 views, just to see how indexes are used:

SQL:
  mysql> EXPLAIN SELECT uc.cnt+uc2.cnt FROM user_counts uc, user_counts uc2 WHERE uc.user_id=uc2.user_id AND uc.user_id=5 \G
  *************************** 1. row ***************************
             id: 1
    select_type: PRIMARY
          table: <derived2>
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 1001
          Extra: Using where; Using join cache
  *************************** 2. row ***************************
             id: 1
    select_type: PRIMARY
          table: <derived3>
           type: ALL
  possible_keys: NULL
            key: NULL
        key_len: NULL
            ref: NULL
           rows: 1001
          Extra: Using where
  *************************** 3. row ***************************
             id: 3
    select_type: DERIVED
          table: comments
           type: index
  possible_keys: NULL
            key: PRIMARY
        key_len: 8
            ref: NULL
           rows: 1792695
          Extra: Using index
  *************************** 4. row ***************************
             id: 2
    select_type: DERIVED
          table: comments
           type: index
  possible_keys: NULL
            key: PRIMARY
        key_len: 8
            ref: NULL
           rows: 1792695
          Extra: Using index
  4 rows in set (1.91 sec)

As you can see, we get 2 derived tables, both of which are fully populated, and a "full join" (a join with no index available on either side) is used to join them.
In this particular case it is not that bad, because the "join cache" is used to perform the join relatively efficiently; for large derived tables, however, it will become a nightmare.

So be very careful when implementing MySQL VIEWs in your application, especially ones which require the temporary table execution method. VIEWs can be used with very small performance overhead, but only if they are used with caution.
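
One way to check which execution method a view can get is to request the merge algorithm explicitly with the ALGORITHM clause of CREATE VIEW and then look at EXPLAIN. The sketch below uses two hypothetical view names (user_comments and user_counts_merge) on top of the comments table from the examples above:

```sql
-- Hypothetical view: a plain projection of comments, which can be merged.
CREATE ALGORITHM = MERGE VIEW user_comments AS
    SELECT user_id, comment_id, message FROM comments;

-- With a merged view the plan should reference `comments` directly,
-- with no <derived> rows.
EXPLAIN SELECT COUNT(*) FROM user_comments WHERE user_id = 5;

-- A view containing GROUP BY cannot be merged. MySQL still creates it,
-- but generates a warning and sets the algorithm to UNDEFINED.
CREATE ALGORITHM = MERGE VIEW user_counts_merge AS
    SELECT user_id, COUNT(*) AS cnt FROM comments GROUP BY user_id;
SHOW WARNINGS;
```

If the warning shows up, or EXPLAIN shows <derived> tables with access type ALL, the view is being materialized into a temporary table and the caveats above apply.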

MySQL has a long way to go before queries with VIEWs are properly optimized.


31 Comments

  1. Hi Peter,

    Good article as always. Your second code example has some text comments in it that don’t line-wrap and are hard to read.

    Comment :: August 12, 2007 @ 4:56 pm

  2. I actually had previously utilized MySQL’s VIEWs in a project when they had just come out. I figured that they would be optimized the way other databases do it; however, what I came to find was that they were one of the worst performance bottlenecks as my data set grew. MySQL would join all of the data together first and only then run through the search criteria.

    Comment :: August 13, 2007 @ 3:25 pm

  3. Priya Raman

    We recently converted from Access to MySQL. In this transition, a lot of queries were written as views in MySQL. The performance was so bad that I decided to rewrite all of them using the base tables directly. As you aptly said, MySQL does have a long way to go as far as views are concerned.

    Comment :: August 15, 2007 @ 1:32 pm

  4. ChrisK

    I think the problem is not the VIEWs themselves, but how one attempted to use them.

    Putting an aggregate function into a VIEW seemed like a bad choice to begin with (especially one that forced a table scan for each use), and that bad initial choice is compounded if one plans on using it with any frequency. At that cost, you may as well have just written a trigger on the comments table to update a user_counts table (or re-generate it entirely using the query you used for your view) for each insert or delete. You could read the now summarized user_count table (that only regenerates when needed) with the benefits of an index to get a specific user’s posting count.

    You do point out a good thing, and that is that people need to really think through what they are trying to ultimately get to and figure out a way to get that data as efficiently as possible. Just using a mechanism (VIEWs) simply because they are there and available to use does not make them necessarily a good choice. Also, don’t put queries into a VIEW that you would not deem efficient to run normally.

    Comment :: September 6, 2007 @ 12:37 pm

  5. ChrisK,

    It is both. MySQL Views could be optimized better. You can use MySQL Views wisely only in cases when they are optimized well.

    Many people do not think about performance until it starts to hurt badly so they just write queries (and put them in the views) in a way it gets them info they want easiest way.

    Comment :: September 7, 2007 @ 3:01 pm

  6. Jim

    “Many people do not think about performance until it starts to hurt badly so they just write queries (and put them in the views) in a way it gets them info they want easiest way.”

    That could be construed as a feature, not a bug. You know what they say about premature optimization.

    Comment :: November 15, 2007 @ 3:54 pm

  7. Well… Not optimizing beyond the need is one thing. Thinking you never would need to optimize is completely different.

    Comment :: November 15, 2007 @ 4:23 pm

  8. Bill Ford

    If I have a view used to join data (via a UNION select) stored in separate tables, then use a where clause when selecting from the view, is that a case where MySQL would be inefficient? I don’t want it to create a temporary table of ALL the data in ALL the tables, I really was hoping for more of a macro-expand kind of thing.

    Comment :: November 20, 2007 @ 9:48 am

  9. I see VIEWs as a good alternative to caching some of the queries. Example: you have a site with a `categories` menu (parent->children) whose entries appear in the menu depending on the number of `products` they refer to. Every time a user loads a page you load that menu. If you have a large database, you don’t want to execute that menu query each time, so you only SELECT from the view. I think this is a primary use of VIEWs, besides caching reports that don’t contain indexes but full values. Please correct me if I’m wrong.

    Comment :: January 12, 2008 @ 3:12 am

  10. How would you use views to cache queries in MySQL ?

    MySQL does not have materialized views, so whenever you access a view it is always re-evaluated.

    Comment :: January 22, 2008 @ 4:47 am

  11. I’ve been working with MySQL Views since release 5 was first made available.
    It took me and my team about 2 months of testing to understand that it is just too soon to use Views.

    What we did as a solution was to create an extra layer between PHP and MySQL (written in PHP). Hardcoding the complex queries in functions, with our own caching logic. This made it for us, for now.

    Comment :: February 9, 2008 @ 1:53 pm

  12. Geoff

    Have MySQL views improved at all – for any version?

    Comment :: March 18, 2008 @ 8:25 pm

  13. Not in 5.1 at least. 6.0 has some optimizations which would affect views, but I’m not sure if there are general fixes.

    Comment :: March 18, 2008 @ 8:44 pm

  14. John

    Ascanio and others: Try using a stored procedure for this example.

    DELIMITER $$
    DROP PROCEDURE IF EXISTS `test2`.`user_count` $$
    CREATE DEFINER=`root`@`localhost` PROCEDURE `user_count`(param1 int)
    BEGIN
    SELECT count(*) FROM comments WHERE user_id=param1;
    END $$
    DELIMITER ;

    Run the above code, then CALL user_count(5); It will run SELECT count(*) FROM comments WHERE user_id=5; and is just as fast as the straight select (in my testing there was less than 0.001 millisecond difference; sometimes the CALL was faster, but that’s just due to random flux). You might notice a difference if you were making thousands of calls a second.

    Comment :: April 10, 2008 @ 6:50 am

  15. Views are meant to hide complexity on the user’s side. This helps you shorten your queries. Optimization is to be done by the MySQL server itself. But have you ever heard of something like indexing a view? Something else? As a developer, performance matters too.

    Comment :: April 30, 2008 @ 4:23 am

  16. Dave

    This post makes it clear that indexes will not be used for views that use temporary tables. For views that do not use temporary tables, will ‘joining’ or ‘whereing’ on a column that is indexed in the underlying table use that index?

    Comment :: July 12, 2008 @ 6:15 pm

  17. nice article… thanks for your backmarking… :)

    Comment :: February 24, 2009 @ 8:32 am

  18. Blakkky

    LOL, very useless article!

    The author tries to compare two incomparable things!
    Look here: a simple select is slower than a view select (ROFL):
    ==================================================
    Simple select:
    mysql> select count(*) from (select * from comments where user_id in (select user_id from comments where user_id = 5 group by user_id)) t2;
    +----------+
    | count(*) |
    +----------+
    |      100 |
    +----------+
    1 row in set (3.71 sec)

    And from VIEW (CREATE VIEW user_counts AS SELECT * FROM comments):
    mysql> select count(*) from user_counts where user_id = 5;
    +----------+
    | count(*) |
    +----------+
    |      100 |
    +----------+
    1 row in set (0.00 sec)
    ==================================================

    The author’s problem is NOT IN the VIEW at all, but in using a view for non-view-based operations!

    A VIEW is a mechanism for making pseudo-tables for database users (for example, making four different client lists based on three tables (account, client, client-address): a list of client names, a list of client addresses, a list of detailed client info). Client-side software then maps grids, combo boxes and so on onto these lists.

    In the author’s example, the VIEW is slow ONLY AND ONLY BECAUSE the author uses a GROUP BY statement in the VIEW!

    PS: sorry for my english, i’m not from english-speaking country :)

    Comment :: August 24, 2009 @ 9:19 am

  19. Blakkky, you should learn the execution plan differences between joins and correlated subqueries.

    Comment :: August 24, 2009 @ 10:55 am

  20. Hmm, it appears that using views which could be a quicker way to make db calls especially in joins and subqueries… never realized how bad the performance is by using them. I wonder if this performance issue also occurs when creating triggers and stored procedures.

    Comment :: September 9, 2009 @ 5:31 am

  21. Blakkky

    Barton Schwartz, plz read my post MINDFULLY! My example is a joke that shows two incomparable things. The author’s post is the same as mine: he compares a GROUP BY SELECT with a SIMPLE SELECT and says that a VIEW based on a GROUP BY SELECT is slower than a SIMPLE SELECT. Is that a fair comparison?

    Again, the author’s example is the same as
    SELECT cnt FROM (SELECT user_id,count(*) cnt FROM comments GROUP BY user_id) AS user_counts WHERE user_id = 5;
    and he compares it with
    SELECT count(*) FROM comments WHERE user_id=5;

    In this case, the first select (the view) is SLOWER than the simple select (it’s obvious, because “group by” with index filtering is slower than simple index filtering). The problem is not in the VIEW MECHANISM in MySQL at all, but in the author’s approach to using views!

    Correct test is something like this:
    1st experiment:
    mysql> CREATE VIEW user_counts AS SELECT user_id, comment_id, message FROM comments;
    Query OK, 0 rows affected (0.02 sec)

    mysql> SELECT user_id, count(*) FROM user_counts WHERE user_id = 5;
    +---------+----------+
    | user_id | count(*) |
    +---------+----------+
    |       5 |      100 |
    +---------+----------+
    1 row in set (0.00 sec)

    mysql> explain SELECT user_id, count(*) FROM user_counts WHERE user_id = 5;
    +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
    | id | select_type | table    | type | possible_keys | key     | key_len | ref   | rows | Extra       |
    +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
    |  1 | SIMPLE      | comments | ref  | PRIMARY       | PRIMARY | 4       | const |  100 | Using index |
    +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
    1 row in set (0.00 sec)

    2nd experiment:
    mysql> SELECT user_id, count(*) FROM comments WHERE user_id = 5;
    +---------+----------+
    | user_id | count(*) |
    +---------+----------+
    |       5 |      100 |
    +---------+----------+
    1 row in set (0.00 sec)

    mysql> explain SELECT user_id, count(*) FROM comments WHERE user_id = 5;
    +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
    | id | select_type | table    | type | possible_keys | key     | key_len | ref   | rows | Extra       |
    +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
    |  1 | SIMPLE      | comments | ref  | PRIMARY       | PRIMARY | 4       | const |  100 | Using index |
    +----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
    1 row in set (0.00 sec)

    In this case, the queries have EQUAL PERFORMANCE for EQUAL REQUESTS!

    PS: I know that in my test there is no need to use a VIEW, but it’s a TEST, not a production solution!
    Views are really needed to make “interface pseudo-tables”, used by database client software, to make the database structure more flexible and to make client applications independent of the real database structure.

    Comment :: December 21, 2009 @ 7:11 am

  22. Tiberius

    So if we have two or more large views that we would like to join are we pretty much stuck? Or is there such a thing as an indexed view?

    Comment :: January 4, 2010 @ 9:21 am

  23. Tiberius,

    It is possible. The devil is, as usual, in the details: MySQL has different ways of executing queries with views – if it just “merges” the query it may perform well if you get 2 large temporary tables which need to be joined on the final stage it will be rather slow.

    EXPLAIN should tell you that.

    Comment :: January 4, 2010 @ 9:39 am

  24. Tiberius

    @Peter

    Thank you, I just did an “explain” on the query and it revealed some very interesting details!

    Could you perhaps rephrase this? I’m confused …

    “if it just “merges” the query it may perform well if you get 2 large temporary tables which need to be joined on the final stage it will be rather slow.”

    Comment :: January 4, 2010 @ 2:56 pm

  25. Geoff

    I’m curious what steps / info we can refer Tiberius to for optimizing his query. I have a view in a query of my own and its speed is turtle-fast (sarcasm, hehe). The EXPLAIN convinces me that temporary tables are being built and then joined.

    However, since even when I copy my view data into a real table before doing the joins, adding the appropriate indexes, and still get horrible performance – it probably is more MySQL’s join that is failing me, rather than the view.

    Comment :: January 4, 2010 @ 3:01 pm

  26. Tiberius,

    MySQL has 2 ways to deal with views – MERGE and TEMPTABLE, see http://dev.mysql.com/doc/refman/5.0/en/create-view.html

    Looking at EXPLAIN, you should see either all views collapsed and the query looking like a simple join, or a temporary table and probably 2 rows with the “ALL” access type, corresponding to a full join on the temporary tables.

    Note that even if MySQL does “merge” for a view, it does not mean it is able to execute the join efficiently in all cases, but this is another topic.

    Comment :: January 4, 2010 @ 3:23 pm

  27. Tiberius

    First of all, thank you both for working with me on this, and thanks for the heads up on the MERGE and TEMPTABLE algorithms. I’ll try creating the views with both algorithms and let you know how it goes.

    Basically, I’m in the process of importing the content of a pre-existing mysql db into Drupal, but the table and field names change when you do that. I’m looking to essentially create “aliases” from the new (Drupal) table and field names to the old ones, so that our HUGE perl library can continue to function without us rewriting every single query. So I thought views would be a good solution. I have yet to see if 5 or 6 views can be joined without a huge performance hit …

    Comment :: January 4, 2010 @ 3:37 pm

  28. John Larsen

    Tiberius: You’re going to lose the ability to reliably write to the db from whatever application is using the views. MySQL views can be used for updates but its application is limited in many ways, you’d need to test all update scenarios as well.

    Comment :: January 4, 2010 @ 4:39 pm

  29. Tiberius

    SOLVED! It turns out that I just needed to add indices. The column names in the ‘where’ clause of the slow query were not indices, so the query was running really slowly. Once I added the appropriate indices to the tables, THEN created the views everything worked! So it turns out that views were not the culprit at all, it was querying on “un-indexed” columns that was the problem.

    Comment :: January 6, 2010 @ 8:03 am

  30. MySQL views are good, but some care is required –

    1. tables on which you want to create views should have a proper primary key and indexes (make sure you add them before loading data into the tables)

    2. skip any where clause or group by clause

    3. drop any columns which you don’t require, or create multiple views based on the columns required

    4. increase sql cache / memory (hosting providers … ahem ahem!)

    5. do not mix views with outer joins

    Comment :: May 6, 2010 @ 10:26 am

  31. JackW

    Depending on how often you are updating the data, it can be a lot more efficient to use a stored procedure to generate a table – effectively a materialized view – rather than using an actual MySQL view.

    In my application I have several hierarchies – customer, product, division. New products are added several times a day; new customers/divisions, much more rarely. If materialized views were supported, I would use that feature to de-normalize the static data so the frontend doesn’t need to generate a complex query to get the results it needs.

    Using a MySQL view was an option I considered, but quickly rejected on the basis that the server would generally re-evaluate the underlying query each time the view is queried.

    Because the write/query ratio on the static data is very low in my application, I have instead used stored procedures to drop and recreate the tables with the de-normalized data. This process takes a minute or two to perform (the normal tables are quite large), but given the rate of writes this is much more efficient than using even an optimized view.

    Comment :: August 25, 2010 @ 10:06 pm

 
