May 1, 2008

Quickly preloading Innodb tables in the buffer pool

Posted by peter

In the previous post I mentioned the way I preload the clustered index (data) for Innodb tables, but I thought this topic would benefit from a bit more information.

But let's first start with a feature request for the Innodb team: all the ways I mention here are hacks, and they can’t be as efficient as native support. It would be great if Innodb implemented a command to preload a table into the Innodb buffer pool, which would simply go through the .ibd file sequentially and inject pages into the buffer pool. This would let the preload run as a sequential file scan even if the index had suffered a lot of page splits.

Now let's continue to the hacks :)
[read more...]

Learning about MySQL Table Fragmentation

Posted by peter

Recently I was working with a customer who needed a quick warmup - to get an Innodb table fetched into memory as fast as possible to get good in-memory access performance.

To do it I run the query: “SELECT count(*) FROM tbl WHERE non_idx_col=0”. I use this particular form of query because it will do a full table scan - running count(*) without a WHERE clause may pick some small index to scan instead.

If your table is not fragmented, one of two things should happen - either you should be reading at your hard drive's sequential read rate, or you would see MySQL becoming CPU bound if the IO subsystem is fast enough.
[read more...]

T2000 CPU Performance - Watch out

Posted by peter

Sun is aggressively pushing the T2000 as a scalable MySQL platform, and indeed it is scalable in terms of high-concurrency workloads - it is able to execute a lot of concurrent threads, so the speed gain from 1 thread to, say, 32 threads will be significant.

But the thing a lot of people miss is this - being scalable is not enough - you need to scale from a reasonable base to claim good performance, and this is where the T2000 performs subpar in many cases.
[read more...]

April 30, 2008

Can having information public hurt consulting business ?

Posted by peter

People frequently ask me whether keeping information public can hurt our consulting business. Let's set aside for the moment the amount of new business that publishing this information brings us, and ask: does it also have a significant negative effect, because people find information on MySQL Performance Blog and use it instead of purchasing our services?

[read more...]

Percona Team presentations from MySQL Users Conference 2008 published

Posted by peter

If you’ve missed our presentations at MySQL Users Conference, you can catch up now by taking a look at the slides, which are now published in the presentations section of our company web site. You can also find a lot of old presentations in the same location.

Enjoy !

April 28, 2008

MySQL Replication vs DRBD Battles

Posted by peter

Well, these days we see a lot of posts for and against using MySQL and DRBD as a high availability practice.

I personally think DRBD has its place, but there are far more cases when other techniques would work much better for a variety of reasons.
First let me start with Florian’s comments on the issue, as I think they are the most interesting ones.
[read more...]

The MySQL optimizer, the OS cache, and sequential versus random I/O

Posted by Baron Schwartz

In my post on estimating query completion time, I wrote about how I measured the performance on a join between a few tables in a typical star schema data warehousing scenario.

In short, a query that could take several days to run with one join order takes an hour with another, and the optimizer chose the poorer of the two join orders. Why is one join order so much slower than the other, and why did the optimizer not choose the faster one? That's what this post is about.

Let's start with the MySQL query optimizer. The optimizer tries to choose the best join order based on its cost metric; it tries to estimate the cost for a query, then choose the query plan that has the lowest cost. The unit of cost for the MySQL query optimizer is a single random 4k data page read. In general, it's a pretty good metric, but it has one major weakness: the server doesn't know whether a read will be satisfied from the operating system cache, or whether it'll have to go to disk. (This distinction is abstracted away by the storage engine; the optimizer doesn't know how a given storage engine stores its data).

I'll try to omit the details and keep this clean. Let's take a look at the tables.

SQL:
  mysql> SHOW TABLE STATUS LIKE 'fact'\G
  *************************** 1. row ***************************
  Name: fact
  Engine: MyISAM
  Rows: 147045493
  Avg_row_length: 117
  Data_length: 17217646764
  Index_length: 11993816064

  mysql> SHOW TABLE STATUS LIKE 'dim1'\G
  *************************** 1. row ***************************
  Name: dim1
  Engine: MyISAM
  Rows: 453193
  Avg_row_length: 122
  Data_length: 55605116
  Index_length: 93812736

  mysql> SHOW TABLE STATUS LIKE 'dim2'\G
  *************************** 1. row ***************************
  Name: dim2
  Engine: MyISAM
  Rows: 811
  Avg_row_length: 105
  Data_length: 85368
  Index_length: 154624

It's a big fact table and two fairly small dimension tables, which is normal. Here is the query:

SQL:
  SELECT fact.col1, min(fact.col2) AS min_col2
  FROM fact, dim1, dim2
  WHERE fact.col4 = dim1.col4
  AND dim1.col3 <> 'hello world'
  AND dim2.col5 = 1
  AND fact.dim2_id = dim2.dim2_id
  AND fact.col2 > some_const
  GROUP BY fact.col1

There are indexes on all the columns in all the ways you'd expect: all the dimension columns are indexed on every table, and there's a separate index on every column in the WHERE clause. Here's the query plan initially.

SQL:
  *************************** 1. row ***************************
  table: dim1
  type: range
  key_len: 195
  rows: 18790
  Extra: Using where; Using temporary; Using filesort
  *************************** 2. row ***************************
  table: fact
  type: ref
  key_len: 4
  rows: 606
  Extra: Using where
  *************************** 3. row ***************************
  table: dim2
  type: eq_ref
  key_len: 2
  rows: 1
  Extra: Using where

This query will run for days. No one ever let it finish to see how long it would take.

How do I know it will run for days? Here's my train of thought:

  • It's performing index lookups into the fact table, which is big.
  • An index lookup is a random I/O.
  • A modern disk can do about 100 random I/Os per second, as a rule of thumb.
  • If you do the math with the rows column in EXPLAIN, you realize that this equates to about 18790 * 606 = 11386740 I/O operations, assuming that the indexes are fully in memory.
  • When you divide this by 100 I/O operations per second you get about 114,000 seconds, and with 86400 seconds in a day that is well over a day.
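
The estimate in the list above can be reproduced in a few lines of Python. This is only a sketch: the 100 random reads per second is the rule of thumb from the list, the function name is made up for illustration, and it assumes exactly one random read per fact row fetched. With these inputs the arithmetic comes out at roughly 1.3 days.

```python
# Back-of-the-envelope estimate for a nested-loop plan dominated by random reads.
# Assumption: every row fetched from the inner table costs one random disk read.
SECONDS_PER_DAY = 86_400


def estimate_random_io_days(outer_rows, inner_rows_per_lookup, iops=100):
    """Return (random_reads, days) for outer_rows * inner_rows_per_lookup reads."""
    random_reads = outer_rows * inner_rows_per_lookup
    seconds = random_reads / iops
    return random_reads, seconds / SECONDS_PER_DAY


# The "rows" values come from the EXPLAIN output: dim1 = 18790, fact = 606.
lookups, days = estimate_random_io_days(18790, 606)
print(f"{lookups:,} random reads, about {days:.1f} days at 100 reads/second")
```

Changing `iops` is a quick way to see how sensitive the estimate is to the disk subsystem: ten times the random read rate still leaves the plan at several hours.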

Ouch! That's slow.

Now let's look at the alternative: table-scan the fact table, and do index lookups in the two dimension tables. MySQL doesn't want to choose this join order, so we'll force it with STRAIGHT_JOIN:

SQL:
  EXPLAIN SELECT STRAIGHT_JOIN  ....
  +-------+-----------+-----------+---------------------------------+
  | table | type      | rows      | Extra                           |
  +-------+-----------+-----------+---------------------------------+
  | fact  | ALL       | 147367284 | Using temporary; Using filesort |
  | dim1  | eq_ref    | 1         | Using where                     |
  | dim2  | eq_ref    | 1         | Using where                     |
  +-------+-----------+-----------+---------------------------------+

As we saw in the previous post, which I linked at the top of this post, we can scan the fact table in less than an hour. And it turns out that joining to the dimension tables doesn't slow the query perceptibly, because these tables are small and they stay in memory, in the OS cache. (They don't get evicted from memory by the cache's LRU policy, because they are frequently used -- once per row in the fact table. The LRU policy evicts old blocks from the fact table instead, which is perfect -- these blocks are used only once and not needed again, so they can be replaced).
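
To make the LRU argument concrete, here is a toy simulation in Python. It is not how the OS page cache is actually implemented; the block counts, the cache size, and the deterministic lookup pattern are all made-up assumptions, chosen only to show the effect. A scan streams through fact blocks once, while every fact row touches a small set of dimension blocks.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU block cache that counts misses per table (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.misses = {}

    def read(self, table, block_no):
        key = (table, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)      # hit: mark as most recently used
            return True
        self.misses[table] = self.misses.get(table, 0) + 1
        self.blocks[key] = None               # miss: load the block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block
        return False


def simulate(fact_blocks=10_000, rows_per_block=35, dim_blocks=50, cache_blocks=200):
    """Scan the fact table once; each fact row looks up one dimension block."""
    cache = LRUCache(cache_blocks)
    for block_no in range(fact_blocks):
        cache.read("fact", block_no)
        for row in range(rows_per_block):
            # Toy access pattern: cycle through the dimension blocks.
            cache.read("dim", (block_no * rows_per_block + row) % dim_blocks)
    return cache.misses


misses = simulate()
print(misses)
```

In this toy run every fact block misses exactly once, as a table scan must, while the dimension blocks miss only on their first touch and are never evicted afterwards, even though the cache is far smaller than the fact table.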

The difference between the two queries -- 55 minutes versus more than a day -- is basically the difference between scanning data sequentially on disk and random disk I/O.

So now you know why one join order is faster than the other. But why didn't the optimizer know this, too? The optimizer does know that random access is slower than sequential access, but it doesn't know that the dimension tables will stay in memory, and this is an important distinction.

Let's put ourselves into the mindset of the optimizer. We'll assume that every join to the dimension tables will go to disk instead of being read from cache. Now the STRAIGHT_JOIN becomes a table scan of about 4.3 million sequential reads (150 million rows * 117 bytes per row / 4096 bytes per read), plus about 150 million random I/Os for the first dimension table, plus 150 million random I/Os for the second dimension table. That's 300 million random I/O operations.

In contrast, the optimizer chose a plan that it thought would cause only about 11.4 million random I/O operations.

The optimizer was being smart, given its lack of knowledge about the OS cache. This is why an expert is sometimes needed to provide the missing information. If the MySQL optimizer were right and each of these had to go to disk, our STRAIGHT_JOIN plan would take more than a month to complete! Good thing we know the difference between cache and disk!
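
The comparison the optimizer is effectively making can be sketched in Python. This is a simplification, not the optimizer's real cost model: it counts only random reads, ignores the sequential scan of the fact table as negligible by comparison, and uses the rounded 150 million rows and the 100 reads-per-second rule of thumb from the text. The variable names are mine.

```python
RANDOM_IOPS = 100            # rule-of-thumb random reads per second
SECONDS_PER_DAY = 86_400

fact_rows = 150_000_000      # rounded row count of the fact table
lookups_per_fact_row = 2     # one lookup into dim1, one into dim2

# Plan the optimizer chose: dim1 range scan driving index lookups into fact.
chosen_plan_ios = 18790 * 606

# STRAIGHT_JOIN plan under the pessimistic assumption that no dimension
# lookup is ever served from cache.
straight_join_ios = fact_rows * lookups_per_fact_row

straight_join_days = straight_join_ios / RANDOM_IOPS / SECONDS_PER_DAY
ratio = straight_join_ios / chosen_plan_ios

print(f"chosen plan: {chosen_plan_ios:,} random reads")
print(f"STRAIGHT_JOIN, nothing cached: {straight_join_ios:,} random reads, "
      f"about {straight_join_days:.0f} days, {ratio:.0f}x the chosen plan")
```

Seen this way the optimizer's choice is reasonable: without knowing the dimension tables stay cached, the forced join order looks more than twenty times as expensive.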

April 26, 2008

How multiple disks can benefit for single client workload ?

Posted by peter

Let us talk a bit more about disks. You might have read my previous post and Matt's reply, and it looks like there are a few more things to clarify and explain.

Before I get to the main topic of the article, let's comment on the IO vs Disk question. If you look at disk-based databases, all data accesses are treated as IOs - they can be "logical" if they are cached or "physical" if they require actual IO to be done, but in the general sense all data accesses are IOs. If you use this terminology, then most problems come down to IO - making queries touch fewer rows (or row portions) or having these "touches" resolved as logical IO rather than physical. There is still locking, networking etc. to deal with, but that is a minor story.

This is not, however, how most people understand IO, and not how I typically use these terms. For me IO means an IO bound workload - disks are moving and the CPU sits idle. With this terminology far fewer cases are about IO, because we would call cases where too much logical IO is happening CPU bound. The beauty of this terminology (and the reason I use it) is that it is very easy to see whether a system is IO bound or CPU bound, while understanding whether MySQL goes through more rows than it needs to requires looking at the queries and schema.

OK, let us now get back to the main point of the article.
[read more...]

April 25, 2008

Is disk Everything for MySQL Performance ?

Posted by peter

I read a very nice post by Matt today, and it has many good insights, though I can't say I agree on all points.
First, there are a lot of people out there who put it as "disk is everything". Remember Paul Tuckfield saying "You should ask how many disks they have instead of how many systems they have" on the MySQL UC2008 Scalability Panel? Indeed, disks MAY be the most important part of your system's performance, or they may not be. Different people get to deal with different systems and so acquire a different feeling for the percentage of cases in which disk is the problem.
[read more...]

April 24, 2008

MySQL Architecture meeting at Google

Posted by peter

The Friday after MySQL Users Conference we had a smaller meeting at the Google campus to talk about MySQL architecture, mainly focusing on storage engine vendors and other extension areas.
It was very interesting to see all the storage engine interface extensions planned for MySQL 6.0 and beyond - the ability to intercept query execution or offload query fragments and operations (sorting, limit etc.) to the storage engines. This is great news, as it would allow building really innovative storage engines for MySQL, which was previously hard because of the defined row-by-row retrieval interface and the nested loops used for joins.

However, what struck me was a thought - this thing is really getting complicated. A few years ago Marten would frequently mention Oracle (and other commercial databases) as complicated beasts that are overkill for most of their users.
[read more...]

