August 31, 2007

Site was down today, support and web hosting.

Posted by peter

During last one and a half year we had pretty good track record with MySQL Performance Blog - there were times when site was slow (especially when backup was running) but I do not remember significant downtime, until today we went down for few hours.

All this time the site was running on dedicated server which I rented from APLUS about 3 years ago. It is rather slow Celeron box with single disk and 512MB running Fedora Core 2. Despite its age (it was used “Value” server even when I got it) the server had very good track record with basically zero failures during this time - there were some network disruptions at Aplus but this is about all problems we had.
[read more...]

Working on “High Performance MySQL, Second Edition”, how you can help?

Posted by peter

As you may already have seen announcement by Baron we’re working on major rewrite of High Performance MySQL book - the most famous book about MySQL Performance on the market… which is getting old though. We’ve been slowly working on the book for over half a year now and were later joined by Arjen Lentz and Baron
[read more...]

August 29, 2007

Guess what?! Microsecond slow query timing finally available in MySQL!

Posted by Maciej Dobrzanski

Vadim asked me yesterday to update Microslow patch for 5.1.21, because the previous one I wrote for 5.1.20 failed to apply correctly on the new MySQL release. Imagine the expression on my face after I unpacked the sources and found out that MySQL incorporated the patch to their release. So it’s a built-in feature now in the 5.1 line. So far I noticed one addition to my code which is logging the microtime statistics into TABLE type log storage.

Maciek

August 28, 2007

To SQL_CALC_FOUND_ROWS or not to SQL_CALC_FOUND_ROWS?

Posted by Alexey Kovyrin

When we optimize clients’ SQL queries I pretty often see a queries with SQL_CALC_FOUND_ROWS option used. Many people think, that it is faster to use this option than run two separate queries: one - to get a result set, another - to count total number of rows. In this post I’ll try to check, is this true or not and when it is better to run two separate queries.

[read more...]

Do you always need index on WHERE column ?

Posted by Vadim

I believe we wrote about this before, but this topic popups again and again.
Today I’ve read opinion that if we have clause WHERE has_something=1 we should have index on column `has_something` (the column has two values 0 and 1).

In reality the right answer is not so simple.
[read more...]

Redundant index is not always bad

Posted by Vadim

About year ago Peter wrote about redundant indexes and mentioned sometimes it is good to leave two indexes, even one is first part of another. I’m speaking about BTREE indexes, for example, KEY (A), and KEY (A,B). From SQL point of view KEY(A) is not needed, as for queries like WHERE A=5 the index (A,B) also can be used.

But there is case when for performance it would be good to have both
[read more...]

August 26, 2007

Memory allocation in Stored Function

Posted by Vadim

UPDATE : Post is not actual anymore

Not so long time ago I had task to update string column in table with 10mil+ rows, and, as the manipulation was non-trivial, I decided this task is good to try Stored Function. Function written - go ahead. Since 5 min I got totally frozen box with no free memory and giant swap.

The case was worth to look deeply - let's try simple table (experiments with MySQL 5.0.45)

SQL:
  1. CREATE TABLE `testf` (
  2.   `id` int(11) NOT NULL AUTO_INCREMENT,
  3.   `name` varchar(255) DEFAULT NULL,
  4.   PRIMARY KEY  (`id`)
  5. ) ENGINE=MyISAM AUTO_INCREMENT=20971521 DEFAULT CHARSET=latin1
  6.  
  7. mysql> SELECT count(*) FROM testf;
  8. +----------+
  9. | count(*) |
  10. +----------+
  11. | 20971520 |
  12. +----------+
  13. 1 row IN SET (0.00 sec)

and simple function I've found in MySQL manual:

SQL:
  1. CREATE FUNCTION hello (s CHAR(20)) RETURNS CHAR(50) RETURN CONCAT('Hello, ',s,'!');

Originally I used UPDATE statement, but for avoiding I/O and allocating os cashes let's check only select:

SQL:
  1. mysql> SELECT avg(length(hello(name))) FROM testf;

and here is vmstat 5 output:

SQL:
  1. procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
  2. r  b   swpd   free   buff  cache   si   so    bi    bo   IN    cs us sy id wa
  3. 0  0      0 5103732 212556 9558976    0    0     0     0 1052  3179  0  0 100  0
  4. 1  0      0 4878900 212556 9558976    0    0     0     0 1078  3171 15  3 82  0
  5. 1  0      0 4534900 212556 9558976    0    0     0     3 1048  3091 21  4 75  0
  6. 1  0      0 4190964 212556 9558976    0    0     0     2 1071  3131 21  5 75  0
  7. 1  0      0 3848372 212556 9558976    0    0     0     4 1048  3093 21  5 75  0
  8. 1  0      0 3506868 212556 9558976    0    0     0     2 1070  3135 21  5 74  0
  9. 1  0      0 3165876 212556 9558976    0    0     0     2 1049  3092 21  5 75  0
  10. 1  0      0 2824820 212556 9558976    0    0     0     2 1071  3126 21  5 75  0
  11. 1  0      0 2483188 212556 9558976    0    0     0     0 1047  3213 21  5 75  0
  12. 1  0      0 2091380 212556 9558976    0    0     0     2 1070  3024 21  9 71  0
  13. 1  0      0 1798836 212556 9558976    0    0     0     2 1049  2141 21 16 63  0
  14. 1  0      0 1457012 212556 9558976    0    0     0     4 1072  2770 21  5 75  0
  15. 1  0      0 1114484 212556 9558976    0    0     0     2 1049  2729 21  5 75  0
  16. 1  0      0 771444 212556 9558976    0    0     0     2 1071  2772 21  4 75  0
  17. 1  0      0 428724 212556 9558976    0    0     0     2 1050  2729 21  4 74  0
  18. 1  0      0  85748 212556 9558976    0    0     0     2 1073  2770 21  5 75  0
  19. 1  0      0  22964 204920 9291280    0    0     0     3 1049  2753 21  5 74  0

As you see MySQL ate 5GB of memory in 85 sec. That makes things clearer. As I understand MySQL allocates memory in each call of Stored Function, but de-allocates it only at the end of statement. This makes usage of Stored Function very limited from my point of view. And, yes, if you are looking for a way to DoS attack of your hosting provider - this is worth to try.

August 18, 2007

How fast can you sort data with MySQL ?

Posted by peter

I took the same table as I used for MySQL Group by Performance Tests to see how much MySQL can sort 1.000.000 rows, or rather return top 10 rows from sorted result set which is the most typical way sorting is used in practice.

I tested full table scan of the table completes in 0.22 seconds giving us about 4.5 Million of rows/sec. Obviously we can't get sorted result set faster than that.
[read more...]

August 17, 2007

Should MySQL Extend GROUP BY Syntax ?

Posted by peter

Jan has a good article about finding the row matching some value in the group:

This is one illustration of group by limitations in SQL language which is not offset by any MySQL specific extensions,yet
As you can see if you want to get one row from the group which is sorted some way you have to use ugly hacks. This is because SQL does not have a notion of sorting the data within the groups and in fact ANSI SQL even forbids you to select columns which are not aggregates or part of group by because result in this case is not defined.

What would be quite helpful is to have GROUPORDER (pick the name) clause which defines which element is selected for non aggregate columns, something like

SQL:
  1. SELECT MAX(Population),  City, Country  FROM City GROUP BY Country GROUPORDER BY Population DESC;

This could give you row which has highest population for which country, and really you would not even need to query MAX(population) if you do not need that because you're soring by population you would see MAX(Population) in the population column:

SQL:
  1. SELECT City, Population City FROM City GROUP BY Country GROUPORDER BY  Population DESC

would simply return city with highest population for each country.

In fact we had similar problem in Sphinx when we were working on BoardReader project - we wanted to group search results by thread and return only one best matching post from each thread while matches themselve could be sorted by relevance or freshness. The benefit of not following any standards is what you can always add what you want and this is just what we did with sphinx in similar case.

SQL Standard might be good for enterprise applications but huge amount of Web applications do not care - they are stuck enough in MySQL they would have rewrite a lot going to other database anyway while they would love to get features which allow you to do stuff more efficient or more beautiful way.

On other hand what I'm talking about if probably the most commonly requested extension which is already supported by number of vendors - CONNECT BY is still not implemented.

August 16, 2007

How much overhead is caused by on disk temporary tables

Posted by peter

As you might know while running GROUP BY and some other kinds of queries MySQL needs to create temporary tables, which can be created in memory, using MEMORY storage engine or can be created on disk as MYISAM tables. Which one will be used depends on the allowed tmp_table_size and also by the data which needs to be put to temporary tables - BLOB/TEXT columns are not supported with MEMORY storage engine so must use on disk MyISAM temporary table.

What i decided to do is to see how much overhead do on disk temporary tables cause compared to MEMORY tables. To have things comparable I used medium size table and types which can be presented both in MEMORY and MyISAM storage engine.

The benchmarks are done with MySQL 5.0.45 on 32bit Fedora Core 6 on Pentium 4 box - something I had available for testing.

SQL:
  1. CREATE TABLE `gt` (
  2.   `i` int(10) UNSIGNED NOT NULL,
  3.   `c` char(50) NOT NULL
  4. ) ENGINE=MyISAM DEFAULT CHARSET=latin1
  5.  
  6.  
  7. mysql> SELECT * FROM gt LIMIT 5;
  8. +-------+------------------------------------------+
  9. | i     | c                                        |
  10. +-------+------------------------------------------+
  11. | 25451 | 80c9a87595687ccb33fa525d396ee75658aec777 |
  12. | 48063 | 05bdcc2c1bb051cff861cee1ffcb7f680922b13e |
  13. | 84146 | afdf7dbdf7a71dbeac553a0cccce4ab3b1ca3bdb |
  14. | 66197 | a26f4ce41df9b79f8ce73f8a41c207c4b67f827d |
  15. |   432 | 5f912d13f2d415c2c7cb325e1ebbdf468b37bc91 |
  16. +-------+------------------------------------------+
  17. 5 rows IN SET (0.00 sec)

The table contains 1.000.000 rows with mostly unique "c" column.

SQL:
  1. mysql> EXPLAIN SELECT count(*) cnt,c FROM gt GROUP BY c ORDER BY NULL LIMIT 10;
  2. +----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
  3. | id | select_type | TABLE | type | possible_keys | KEY  | key_len | ref  | rows    | Extra        |
  4. +----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
  5. 1 | SIMPLE      | gt    | ALL  | NULL          | NULL | NULL    | NULL | 1000000 | USING TEMPORARY |
  6. +----+-------------+-------+------+---------------+------+---------+------+---------+-----------------+
  7. 1 row IN SET (0.01 sec)

As you can see I'm using ORDER BY NULL clause as otherwise MySQL will sort the data after performing group by which is overhead we're not looking to measure. This is actually good trick to use for cases when you do not need GROUP BY results sorted - it can significantly improve performance in some cases.

First I run the query with default key_buffer_size which is not large enough to fit all key blocks from temporary table index, so we get a lot of key writes which kill performance.

The problem with writes is OS can delay writes only for fixed amount of time and when it has to perform them anyway which can cause IO bound load even with small data which can fully fit in OS cache as in this case. In perfect world it should not happen and we should have seen 50MB sequentially written once per certain amount of time which takes fraction of second to do, but it is not how it happens in practice.

SQL:
  1. mysql> SHOW global STATUS LIKE "key%";
  2. +------------------------+---------+
  3. | Variable_name          | Value   |
  4. +------------------------+---------+
  5. | Key_blocks_not_flushed | 6516    |
  6. | Key_blocks_unused      | 0       |
  7. | Key_blocks_used        | 7248    |
  8. | Key_read_requests      | 2051451 |
  9. | Key_reads              | 198208  |
  10. | Key_write_requests     | 469220  |
  11. | Key_writes             | 185808  |
  12. +------------------------+---------+
  13. 7 rows IN SET (0.00 sec)

So how long does it take:

SQL:
  1. mysql> SELECT count(*) cnt,c FROM gt GROUP BY c ORDER BY NULL LIMIT 10;
  2. +-----+------------------------------------------+
  3. | cnt | c                                        |
  4. +-----+------------------------------------------+
  5. |   1 | 80c9a87595687ccb33fa525d396ee75658aec777 |
  6. |   1 | 05bdcc2c1bb051cff861cee1ffcb7f680922b13e |
  7. |   1 | afdf7dbdf7a71dbeac553a0cccce4ab3b1ca3bdb |
  8. |   1 | a26f4ce41df9b79f8ce73f8a41c207c4b67f827d |
  9. |   1 | 5f912d13f2d415c2c7cb325e1ebbdf468b37bc91 |
  10. |   1 | dc960ee027a9d741ccd22ae0ee32d0c8915f158b |
  11. |   1 | 68d91f476ff40116c9c9ba0b85037496ededa1ef |
  12. |   2 | 9bc66fb6426222e3a94c43a64ed701cf197f8cb8 |
  13. |   1 | 9c2a671310c810e1f4a9aa4763ce5e76d28e3af4 |
  14. |   1 | 2e282a3aeb832b8b5861f703041acba5a4285a35 |
  15. +-----+------------------------------------------+
  16. 10 rows IN SET (9 min 5.40 sec)

I repeated the run a few times with very similar results.

OK so what if we set key buffer to 128M ?

SQL:
  1. mysql> SHOW global STATUS LIKE "key%";
  2. +------------------------+---------+
  3. | Variable_name          | Value   |
  4. +------------------------+---------+
  5. | Key_blocks_not_flushed | 49361   |
  6. | Key_blocks_unused      | 61201   |
  7. | Key_blocks_used        | 49361   |
  8. | Key_read_requests      | 4007329 |
  9. | Key_reads              | 49361   |
  10. | Key_write_requests     | 873177  |
  11. | Key_writes             | 0       |
  12. +------------------------+---------+
  13. 7 rows IN SET (0.00 sec)

As you can see we have no writes because data fits in key buffer. Note however there is still significant amount of reads (the stats were flushed between query runs) which does not make sense as there no data to be read from the disk as nothing was written. So I filled bug on this.

So what was performance

SQL:
  1. mysql> SELECT count(*) cnt,c FROM gt GROUP BY c ORDER BY NULL LIMIT 10;
  2. +-----+------------------------------------------+
  3. | cnt | c                                        |
  4. +-----+------------------------------------------+
  5. |   1 | 80c9a87595687ccb33fa525d396ee75658aec777 |
  6. |   1 | 05bdcc2c1bb051cff861cee1ffcb7f680922b13e |
  7. |   1 | afdf7dbdf7a71dbeac553a0cccce4ab3b1ca3bdb |
  8. |   1 | a26f4ce41df9b79f8ce73f8a41c207c4b67f827d |
  9. |   1 | 5f912d13f2d415c2c7cb325e1ebbdf468b37bc91 |
  10. |   1 | dc960ee027a9d741ccd22ae0ee32d0c8915f158b |
  11. |   1 | 68d91f476ff40116c9c9ba0b85037496ededa1ef |
  12. |   2 | 9bc66fb6426222e3a94c43a64ed701cf197f8cb8 |
  13. |   1 | 9c2a671310c810e1f4a9aa4763ce5e76d28e3af4 |
  14. |   1 | 2e282a3aeb832b8b5861f703041acba5a4285a35 |
  15. +-----+------------------------------------------+
  16. 10 rows IN SET (34.47 sec)

Wow this is over 15 times better so you surely need to make sure your key_buffer_size is set to accommodate needs of your temporary table. Do not forget about it especially if running only Innodb tables so you may think there is no use for it.

Looking at this number I found it to be too good to be true as I've seen significantly worse performance for large data sets. So I decided to see what happens if I create a larger table - 4M rows. I sized key_buffer to 256M so it would fit everything needed and there was still enough memory for OS cache to keep temporary table. Results:

SQL:
  1. mysql> SELECT count(*) cnt,c FROM gtest GROUP BY c ORDER BY NULL LIMIT 10;
  2. +-----+------------------------------------------+
  3. | cnt | c                                        |
  4. +-----+------------------------------------------+
  5. |   2 | 80c9a87595687ccb33fa525d396ee75658aec777 |
  6. |   1 | 05bdcc2c1bb051cff861cee1ffcb7f680922b13e |
  7. |   1 | afdf7dbdf7a71dbeac553a0cccce4ab3b1ca3bdb |
  8. |   2 | a26f4ce41df9b79f8ce73f8a41c207c4b67f827d |
  9. |   1 | 5f912d13f2d415c2c7cb325e1ebbdf468b37bc91 |
  10. |   1 | dc960ee027a9d741ccd22ae0ee32d0c8915f158b |
  11. |   1 | 68d91f476ff40116c9c9ba0b85037496ededa1ef |
  12. |   2 | 9bc66fb6426222e3a94c43a64ed701cf197f8cb8 |
  13. |   1 | 9c2a671310c810e1f4a9aa4763ce5e76d28e3af4 |
  14. |   1 | 2e282a3aeb832b8b5861f703041acba5a4285a35 |
  15. +-----+------------------------------------------+
  16. 10 rows IN SET (27 min 5.51 sec)

VMSTAT:

SQL:
  1. procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
  2.  r  b   swpd   free   buff  cache   si   so    bi    bo   IN    cs us sy id wa st
  3.  0  4    116  60064   9084 547160    0    0     5    44   96   103  2  0 94  3  0
  4.  0  3    116  56248   9104 549212    0    0     2  2023 1518   498  5  2 10 83  0
  5.  1  3    116  56316   9104 549228    0    0     0  2121 1539   497  0  1 24 75  0

In this case as you can see there is a lot of dirty blocks flushing going on with pretty random IO. Clearly OS write caching is not effective for relatively large area (which still fits in memory) which is getting a lot of random writes.

OK. Now lets move to testing MEMORY tables for the same queries:

SQL:
  1. mysql> SET global key_buffer_size=8000000;
  2. Query OK, 0 rows affected (0.03 sec)
  3.  
  4. mysql> SET global max_heap_table_size=1000000000;
  5. Query OK, 0 rows affected (0.00 sec)
  6.  
  7. mysql> SET global tmp_table_size=1000000000;
  8. Query OK, 0 rows affected (0.00 sec)

Ahh... I alway forget GLOBAL only sets global value but it does not affect current session. A little gotcha which still catches me.

For 1M rows we have

SQL: