August 31, 2006

Derived Tables and Views Performance

Posted by peter

Starting MySQL 4.1, MySQL had support for what is called derived tables, inline views or basically subselects in the from clause.
In MySQL 5.0 support for views was added.

These features are quite related to each other but how do they compare in terms of performance ?

[read more...]

August 29, 2006

MySQL wins C’T Database Contest

Posted by Vadim

Today MySQL published the press release with results of Database Contest (results on German available here http://www.mysql.de/ct-dbcontest).

Peter and me spent quite some time working on this project while being employed by MySQL and it is great to see results finally publicly available.

The story began about year ago when C’T magazine had called for Database competiton using Dell DVD Store benchmark (details available here: http://firebird.sourceforge.net/connect/ct-dbContest.html).

Most interesting results are (more orders per minute are better :) ):
MySQL5/PHP (our solution) : 3664 orders per minute
DB2/Java : 1537 opm
Oracle / Java: 1412 opm
PostgreSQL / PHP: 120 opm

MySQL5/PHP on two boxes: 6000 opm.
I wonder how C’T have got such low results because on the same hardware I got ~7000 opm on one box and
~12000 opm on two boxes, but that does not matter.

August 23, 2006

Watch out for Marketing benchmarks

Posted by peter

Whenever I see benchmark results I try to understand if it is technical benchmark - made by people seeking the truth or it is done by Marketing department to wash your brains. Watch out. Whenever you treat marketing benchmarks as technical ones, you make make wrong decision. Take a look at MySQL 5.0 Benchmarks Whitepaper and guess which type is this ?

You can also compare it to my MySQL Performance 5.0 vs 4.1 presentation to have some fun.

What can we see ? Out of all MySQL 4.1 vs 5.0 benchmarks which were done only benchmarks which show MySQL 5.0 is faster were selected and bunch of other benchmarks which show 5.0 is actually slower than 4.1 were hidden under the table.

In general any benchmarks I see which show something being absolute winner I smell something fishy. In software development there is no free lunch and your decisions to speed something up often will mean something else becomes slower. There are of course exceptions with optimizing very bad code, but it barely applies to MySQL - Monty is not bad developer at all :)

[read more...]

August 21, 2006

Using LVM for MySQL Backup and Replication Setup

Posted by peter

If someone asks me about MySQL Backup advice my first question would be if they have LVM installed or have some systems with similar features set for other operation systems. Veritas File System can do it for Solaris. Most SAN systems would work as well.

What is really needed is ability to create atomic snapshot of the volume, which can be later mounted same as original file system

[read more...]

August 18, 2006

MySQL Performance Forum: Hot Topics

Posted by peter

As I already announced last week I started MySQL Performance Forums project focusing on MySQL Performance discussions as it names says.

I spend planty of time replying questions and thought it would be good idea to provide weekly overviews of most interesting topic discussed. Here is the list for last week:


Ways to perform full text search on Chinese texts with MySQL

Replication of Summary tables only

Overhead of enabled general query log

Complex join vs running many queries

Emulation of two Column AUTO_INCREMENT key with Innodb

Hope these were helpful, and you’re welcome to post more questions or provide your opinion :)

SysBench - benchmark tool

Posted by Vadim

Sysbench is benchmark developed by Alexey Kopytov (software engineer @ MySQL AB)
- http://sysbench.sourceforge.net/ and I want to write a short intro about this tool as sysbench is one of software for my everyday use. For example, SUN published their Solaris vs RedHat stuff based on sysbench’s results (Peter and me provided performance consutling for this publishing).
Sysbench has a lot of options and details so my goal is describe common usage of benchmark.
Sysbench allows to test:

  • file I/O performance
  • scheduler performance
  • memory allocation and transfer speed
  • POSIX threads implementation performance
  • database server performance

First four is useful for the platform evalution, for example if you want to compare speed of file I/O and implementation of threads on different servers - I will write about in further notes. Regarding database benchmarks - in 0.4.7 and earlier versions Sysbench supported only predefined set of queries, but that will change in version 0.5 where sysbench is scriptable (Lua http://www.lua.org/ as scripting language) , and you can use your own tables and queries. 0.5 should be realesed soon. Originally Sysbench supported only MySQL, but later Oracle and PostgreSQL (and deviations, e.g EnterpriseDB) were added - it is not clear from Docs page, but it is :). Sysbench uses only a native API, not middle layers like ODBC, that is why list of supported DB is not wide. Other restriction Sysbench can’t be complided on Windows - well you can try and write about your experience. In next notes I’ll write how to setup and perform database benchmarks.

August 17, 2006

Duplicate indexes and redundant indexes

Posted by peter

About every second application I look at has some tables which have redundant or duplicate indexes so its the time to speak about these a bit.

So what is duplicate index ? This is when table has multiple indexes defined on the same columns. Sometimes it is indexes with different names, sometimes it is different keywords used to define the index. For example it is quite frequite to see something like
PRIMARY KEY(id), UNIQUE KEY id(id), KEY id2(id)

[read more...]

August 14, 2006

MySQL: Followup on UNION for query optimization, Query profiling

Posted by peter

Few days ago I wrote an article about using UNION to implement loose index scan.

First I should mention double IN also works same way so you do not have to use the union. So changing query to:

SQL:
  1. mysql> SELECT sql_no_cache name FROM people WHERE age IN(18,19,20) AND zip IN (12345,12346, 12347);
  2. +----------------------------------+
  3. | name                             |
  4. +----------------------------------+
  5. | ed4481336eb9adca222fd404fa15658e |
  6. | 888ba838661aff00bbbce114a2a22423 |
  7. +----------------------------------+
  8. 2 rows IN SET (0.00 sec)
  9.  
  10. mysql> EXPLAIN SELECT sql_no_cache name FROM people WHERE age IN(18,19,20) AND zip IN (12345,12346, 12347);
  11. +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
  12. | id | select_type | TABLE  | type  | possible_keys | KEY  | key_len | ref  | rows | Extra       |
  13. +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
  14. 1 | SIMPLE      | people | range | age           | age  |       4 | NULL |    9 | USING WHERE |
  15. +----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
  16. 1 row IN SET (0.00 sec)

So as you see there are really different types of ranges in MySQL. IN range allows to optimize lookups on the second key part, while BETWEEN and other ranges do not. Using same access type in EXPLAIN makes it very confusing.

I also was wrong about bug in key length in 5.0 explain. Actually I used tinyint for age and mediumint for zip which makes 4 right answer for using full key.

Be careful however with these nested IN clauses. MySQL has to internally build all possible combinations for row retrieval which ma become very slow if IN lists are large. Take 3 IN lists 1000 values each, on appropriate 3 keyparts and you may finish your lunch before query completes even if table has just couple of rows.

Let me however show how you can profile queries to see what exactly happens during query execution - very helpful for MySQL Performance optimization:

SQL:
  1. mysql> FLUSH STATUS;
  2. Query OK, 0 rows affected (0.00 sec)
  3.  
  4. mysql> SELECT sql_no_cache name FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347);
  5. +----------------------------------+
  6. | name                             |
  7. +----------------------------------+
  8. | ed4481336eb9adca222fd404fa15658e |
  9. | 888ba838661aff00bbbce114a2a22423 |
  10. +----------------------------------+
  11. 2 rows IN SET (0.39 sec)
  12.  
  13. mysql> SHOW STATUS LIKE "Handler%";
  14. +----------------------------+-------+
  15. | Variable_name              | Value |
  16. +----------------------------+-------+
  17. | Handler_commit             | 0     |
  18. | Handler_delete             | 0     |
  19. | Handler_discover           | 0     |
  20. | Handler_prepare            | 0     |
  21. | Handler_read_first         | 0     |
  22. | Handler_read_key           | 1     |
  23. | Handler_read_next          | 42250 |
  24. | Handler_read_prev          | 0     |
  25. | Handler_read_rnd           | 0     |
  26. | Handler_read_rnd_next      | 0     |
  27. | Handler_rollback           | 0     |
  28. | Handler_savepoint          | 0     |
  29. | Handler_savepoint_rollback | 0     |
  30. | Handler_update             | 0     |
  31. | Handler_write              | 14    |
  32. +----------------------------+-------+
  33. 15 rows IN SET (0.00 sec)

So you can do FLUSH STATUS to reset counters run the query (assiming your system does not do anything) and run SHOW STATUS to see how counters have changed. It was quite inconvenient you could only do it on idle box so as in MySQL 5.0 you do not have to any more. SHOW STATUS now will show per session counter increments and to get global counters SHOW GLOBAL STATUS needs to be used.

Let us look at this handler statistic - we can see Handler_read_key=1 - this means one index range scan was initiated. Handler_read_next=42250 means 42250 rows were analyzed during this scan. Basically MySQL started scanning Index with age>=18 and continue scanning as soon as it met something larger than 20.

Now let's see what UNION can handle what IN can't:

Lets say we want to show people in appropriate age group sorting by time when they were last online. If age is fixed this works great and it is efficient, however if we have multiple ages to deal with ether as BETWEEN range or as IN filesort appears and query becomes very slow:

SQL:
  1. mysql> EXPLAIN SELECT * FROM people WHERE age=18 ORDER BY last_online DESC LIMIT 10;
  2. +----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+
  3. | id | select_type | TABLE  | type | possible_keys | KEY  | key_len | ref   | rows  | Extra       |
  4. +----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+
  5. 1 | SIMPLE      | people | ref  | age           | age  | 1       | const | 12543 | USING WHERE |
  6. +----+-------------+--------+------+---------------+------+---------+-------+-------+-------------+
  7. 1 row IN SET (0.00 sec)
  8.  
  9. mysql> EXPLAIN SELECT * FROM people WHERE age IN(18,19,20) ORDER BY last_online DESC LIMIT 10;
  10. +----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+
  11. | id | select_type | TABLE  | type  | possible_keys | KEY  | key_len | ref  | rows  | Extra                       |
  12. +----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+
  13. 1 | SIMPLE      | people | range | age           | age  | 1       | NULL | 37915 | USING WHERE; USING filesort |
  14. +----+-------------+--------+-------+---------------+------+---------+------+-------+-----------------------------+
  15. 1 row IN SET (0.00 sec)

We can however use UNION to avoid filesort of full table:

SQL:
  1. mysql> EXPLAIN (SELECT * FROM people WHERE age=18 ORDER BY last_online DESC LIMIT 10) UNION ALL (SELECT * FROM people WHERE age=19 ORDER BY last_online DESC LIMIT 10) UNION ALL (SELECT * FROM people WHERE age=20 ORDER BY last_online DESC LIMIT 10) ORDER BY last_online DESC LIMIT 10;
  2. +----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+
  3. | id | select_type  | TABLE        | type | possible_keys | KEY  | key_len | ref   | rows  | Extra          |
  4. +----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+
  5. 1 | PRIMARY      | people       | ref  | age           | age  | 1       | const | 12543 | USING WHERE    |
  6. 2 | UNION        | people       | ref  | age           | age  | 1       | const | 12741 | USING WHERE    |
  7. 3 | UNION        | people       | ref  | age           | age  | 1       | const | 12631 | USING WHERE    |
  8. |NULL | UNION RESULT | <union1,2,3> | ALL  | NULL          | NULL | NULL    | NULL  |  NULL | USING filesort |
  9. +----+--------------+--------------+------+---------------+------+---------+-------+-------+----------------+
  10. 4 rows IN SET (0.01 sec)

In this case there is also filesort but it applied only to very small table which is result of union, so it is rather fast.

August 11, 2006

Database problems in MySQL/PHP Applications

Posted by peter

Article about database design problems is being discussed by Kristian.

Both article itself and responce cause mixed feellings so I decided it is worth commenting:

1. Using mysql_* functions directly This is probably bad but I do not like solutions proposed by original article ether. PEAR is slow as well as other complex conectors. I have not yet tested PDO but would not expect it to beat MySQLi in speed. It is however bad idea to use mysql_ functions directly as well - I would go for using mysqli object approach. The great things about objects is you can easily overload methods and get debugging and profiling tools, as well as have tools which protect you from SQL Injections.

For example I have little wrapper which allows to do $dbcon->query("Select email from user where name=%s",$name) - wrapper will detect query is being called with multiple parameters and will perform needed checks and query rewriting. You also can use pretty much direct path to mysqli extension to performance critical queries if you need.

I would also note for many PHP applications abstraction layer is not the main performance problem, also benefit from persistent connections can be much more modest. DVD Store was special type of application which was designed to have very simple logic besides database - in most cases you would have beautiful page rendering as well as much more queries per page which will make performance improvement much smaller. Notable exception being AJAX applications which may do very little work and formating, so database connection may become the issue. Caching should be good help in this case though.

About Consulting - it is worth to mention it was my group which was Dell DVD Store optimization, and I'm now on my own, offering MySQL and LAMP Consuilting Services.

2. Not using auto_increment functionality This is right. With some exception however. For example Innodb tables do internal full table lock if auto_increment is used so using values generated elseware might be faster.

3. Using multiple databases Honestly I do not see application using one database per table that often. I however often see applications using multiple databases to group tables by certain logic, such as you do with directories to group files. I think this makes a lot of sense. Sometimes grouping is done so a lot of databases are needed - for example if grouping is done by user. This might be a bit extreem if you have thousands of users - I would rather do many to many relationship between users and tables but it also might work.

Regarding if you use many tables you're doing something wrong it is frequently told by people with traditional database background. Things are different with MySQL.

There are many successful applications, using tens of thousands of tables per host and archiving great performance by doing so.
Using multiple tables gives some very important benefits - your data becomes managable, your ALTER TABLE or OPTIMIZE TABLE now locks small table for few seconds rather than giant 100GB table for few hours so can be done pretty much online. You also get good data clustering so table becomes hot very quickly due to data locality once this user starts his queries. It is also much easier to do backup and restore if you need only portion of your data recovered.

There are some performance problems with many tables some are OS and File System dependent, others correspond to Innodb storage engine or using innodb_file_per_table option in particular.

4. Not using relations This one is right one but also with the catch. It is very traditional recommendation to normalize your data however it does not always bring good performance. Joins are expensive and you can often do much better with denormalized data. You may wish to use denormalized data as cached lookup table however so you do not have all these problems with loosing data etc. Read more in my Why MySQL Could be slow with Large Tables article.

5. The n+1 pattern This probably should rather be called Not using Join. This is typical error. On other hand in MySQL you might be better of using several queries than doing complicated ones. Of course you would rather use IN() than do 100 of queries in this case. This most applies to subqueries Where Subselects with IN() become corellated even if they are not, and so using IN() list of values derived by previous query. For example you can do:

SQL:
  1. SELECT id FROM users WHERE featured=1;
  2.  
  3. Now populate List FOR IN ON your PHP application:
  4.  
  5. SELECT * FROM articles WHERE user_id IN(23,545,654,34)
  6.  
  7. instead of:
  8.  
  9. SELECT * FROM articles WHERE user_id IN (SELECT id FROM users WHERE featured=1)

Some day this should be fixed however but do not expect it soon.

Use Indexes This item was not in original article, however I think this is the most common mistake and it is very important to fix it. Most applications I have to fix have number of indexing missing which requires queries to do full table scans. Funny enough this is often not the problem in the beginning - if application is bought or custom ordered it frequently can pass customer QA - it will work quite fast with almost empty database. With database growth it will however start to crawl.
So developing you PHP applications use test database with reasonable amount of data in it. And do run EXPLAIN for your queries, especially if you see them in slow query log. If you have trouble understanding EXPLAIN or optimizing your queries remember
we're here to help.

MySQL Server Upgrade

Posted by peter

Today I've upgraded MySQL Server on the host running MySQL Performance Blog. MySQL 4.1.12 was running here for well over a year before that.

Why Have not I upgraded before ? Well because it just worked fine. Yes I know there were some security fixes but I have dedicated server with remote MySQL access closed by firewall and only trusted people having local access so I was not worried too much about it.

Why did I upgrade now ? Because new forum application was exposing one of the bugs in MySQL 4.1.12 so there was a need for upgrade.

Which Version did I upgrade to ? I Upgraded to MySQL 4.1.21. Why Did not I go with MySQL 5.0 ? There are three reasons. First there are number of changes in 5.0 some of which are incompatible with MySQL 4.1, for example regarding join syntax. I can't be 100% sure none of my applications will be affected and I see no reasons to spend time testing it carefully or downgrading if problem finally discovered a week later in some rare circumstances. Not to mention there is the chance new application I might be installing does not yet works with MySQL 5.0.

Second - I know my applications designed to work with MySQL 4.1, they avoid queries which MySQL 4.1 does not optimize well and so MySQL 5.0 will likely reduce performance. It is probably just 5-10% for most queries but this site runs on old Celeron box which does not have too much CPU cycles to waste.

Third - My sense of MySQL 5.0 stability is still not at MySQL 4.1 level. This is not exactly problem in this case - this site uses rather simple workload so MySQL 5.0 probably would do as well as MySQL 4.1 which even in its ancient 4.1.12 version never crashed on this site. However new applications which bring new workloads have higher chance of problems on MySQL 5.0 than on 4.1

This is upgrade strategy I follow for this server which has mix of various third party PHP/MySQL applications. In different situations upgrade strategies may be different. The main point I'm making is - you do not have to rush and use latest MySQL version neither it always makes sense to upgrade to each minor release if there are no fixes which affect you.