January 31, 2007

Getting MySQL to use full key length

Posted by peter

There is one bug, or "missing feature" in MySQL Optimizer which may give you hard time causing performance problems which may be hard to track down, it is using only part of the index when full index can be used or using shorter index while there is longer index available. The last item is yet another good reason for removing redundant indexes

Here is example from NNSEEK database:

SQL:
  1. mysql> EXPLAIN SELECT thread_id FROM nn2_msg132.msg132   WHERE group_id=398157 AND parent_id=0 AND (published BETWEEN '2006-12-02 00:00:00' AND '2006-12-02 23:59:59')   ORDER BY published DESC LIMIT 0,10 \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: SIMPLE
  5.         TABLE: msg132
  6.          type: ref
  7. possible_keys: group_id,thread_id,groupid_published,grp_subj
  8.           KEY: group_id
  9.       key_len: 8
  10.           ref: const,const
  11.          rows: 1
  12.         Extra: USING WHERE
  13. 1 row IN SET (0.00 sec)

Where group_id key is defined as KEY `group_id` (`group_id`,`parent_id`,`published`)

As you can see MySQL selects to use "ref" access only using two first key parts from the index and it assumes there is only one row to be matched by index. In fact there are 2000 matching rows for these constants (parent_id=0 is special value which matches significant amount of rows).

I think optimizer should be fixed to always expand and use "range" lookup at least for simple ranges. Even if further restrictions by next key part will only barely increase selectivity it anyway does not cost much more as basically the same data is traversed anyway.

So how do you force MySQL optimizer to use full length in this case ? ANALYZE TABLE does not help, at least in this case. However looks like as one of its side effects FORCE INDEX actually forces index to be used to largest extent possible:

SQL:
  1. mysql> EXPLAIN SELECT thread_id FROM nn2_msg132.msg132 force INDEX(group_id)  WHERE group_id=398157 AND parent_id=0 AND (published BETWEEN '2006-12-02 00:00:00' AND '2006-12-02 23:59:59')   ORDER BY published DESC LIMIT 0,10 \G
  2. *************************** 1. row ***************************
  3.            id: 1
  4.   select_type: SIMPLE
  5.         TABLE: msg132
  6.          type: range
  7. possible_keys: group_id
  8.           KEY: group_id
  9.       key_len: 12
  10.           ref: NULL
  11.          rows: 51
  12.         Extra: USING WHERE
  13. 1 row IN SET (0.00 sec)

As you can see in this case estimate becomes closer to the truth and this query runs about 50 times faster and easy on buffer pool (we do not have many un-needed rows accessed causing their pages to be loaded replacing valuable data from cache).

MySQL could look at the stats and adjust them appropriately - the plan for second query is accessing subset of rows from the first plan so estimation for number of rows for the first query should not be smaller. Stats and B-Tree dives can give information which is quite far away from the real number so it is worth at least make it consistent in this respect.

The other example I mentioned in the start of this post which I often have trouble with (just do not have example today) is using different index, for example, having indexes (A) and (A,B) for query A=Const and B>Const we can see MySQL selecting (A) index instead of (A,B) or even worse it can can select something like (A,D) using only first keypart of that index.

To fix this problem I guess another statistics alignment should take place - if you have several keys which start with prefix A and you use only this prefix for lookup - estimate number of rows should be the same for all indexes.

January 30, 2007

Linux IO Schedulers and MySQL

Posted by peter

Found a great article about Linux IO Schedulers today which is quite interesting. It goes in details about schedulers and explains in which of workloads which of schedulers is best.

The interesting thing this article points out is - there are multiple versions of each of the schedulers, while name remains the same. This means unless you really know mapping between kernel versions and scheduler versions it is very hard to evaluate benchmark results.

This could be noticed by benchmarks we've done over years. Long time ago "AS" scheduler could be several times slower than deadline for MySQL workloads such as SysBench or DBT2 when it went down to 30% difference and in the last runs we've done difference was not really significant.

This article also points out benchmarking IO schedulers you should look at more numbers than aggregate bandwidth - you also better to measure per client bandwidth as well as max latency as this is what can be the problem. Take a look at these old results for example. It also means you'd better to perform IO scheduler benchmarks on mixed load with different of task, for example mixing OLTP with some reporting queries if you really want to see the difference.

From the article it looks like CFQ should be good choice for databases and it is also found to work pretty well by some benchmarks we've done. The only question if it is doing as good as it could - In the docs it is mentioned it uses "per process" scheduling while MySQL is single process but single thread - does each thread gets its own queue in reality or is it shared ?

We should look into this when we'll run more benchmarks for IO Schedulers.

Making MySQL Replication Parallel

Posted by peter

Kevin Burton writes about making MySQL Replication Parallel. Many of us have been beaten by the fact MySQL Replication is single threaded so in reality it is only able to use only single CPU and single disk effectively which is getting worse and worse as computers are getting "wider" these days with multi-core CPUs.

Kevin proposes to execute queries in parallel and it is generally good idea, the problem is however implementing it right without changing MySQL Replication semantics - which is - Slave database state corresponds to master database state at certain point in time. It is delayed but threads reading from the slave never will see state of the database which never existed on master.

As I commented in Kevins blog the problem is very simple to illustrate - assume you have 2 queries modifying 2 different tables, query A and query B. On the Master query A completed first and B followed it. On the slave we execute them in parallel so query B may complete before query A causing database stage which never existed on the master. Of course the idea could be to wait on final commit stage and commit queries A and B in order defined by Master but it brings to the plate other problems such as possible deadlocks between queries if they are complex transactions.

It should be however not as bad if we only look at single queries or transactions which do not have any overlap in terms of tables.

For some users commit order for independent queries may be unimportant so this restriction could be weakened to only make sure there is a "barrier" between queries which are possibly dependent on each other, such as reading or writing to the same tables.

There is other possible solution it is to allow multiple threads inside the server to share same transactional/lock context. In this case replication could accumulate number of queries execute them in parallel and then commit all at once.

None of these however are easy trick which I would expect to come quite soon.

On other hand if support for Multi-Master is implemented for many applications Parallel Replication could be implemented simply by filtering transactions and writing to number of binary logs.

If you're "Scaling Out" you may just treat single server as it is few servers, so place several independent pieces on it, for example if different databases. Now if you could setup filtering so updates for each of them is written to its own binary log file and setup multi-master replication so slave can read all of them in parallel you can get replication parallel enough for many application without serious code complications.

If MySQL would not implement it it might be nice feature to hack into community tree.

January 19, 2007

TMP_TABLE_SIZE and MAX_HEAP_TABLE_SIZE

Posted by peter

We all know disk based temporary tables are bad and you should try to have implicit temporary tables created in memory where possible, do to it you should increase tmp_table_size to appropriate value and avoid using blob/text columns which force table creation on the disk because MEMORY storage engine does not support them Right ?

Wrong.

In fact setting tmp_table_size is not enough as MySQL also looks at max_heap_table_size variable and uses lower value as a limit to for in memory temporary table after which it will be converted to MyISAM.

To make things more confusing this is not what you would read in MySQL manual as far as I understand it:
From http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html

MEMORY table contents are stored in memory, which is a property that MEMORY tables share with internal tables that the server creates on the fly while processing queries. However, the two types of tables differ in that MEMORY tables are not subject to storage conversion, whereas internal tables are:

* If an internal table becomes too large, the server automatically converts it to an on-disk table. The size limit is determined by the value of the tmp_table_size system variable.
* MEMORY tables are never converted to disk tables. To ensure that you don't accidentally do anything foolish, you can set the max_heap_table_size system variable to impose a maximum size on MEMORY tables. For individual tables, you can also specify a MAX_ROWS table option in the CREATE TABLE statement.

For me this description looks as there are two types of in memory tables where internal ones are controlled by tmp_table_size and explicit ones use max_heap_table_size value.

Interesting enough there is 2.5 years old bug on this matter which just recently started to get attention. I understand it could be complex to fix but why real behavior was not documented in the manual at least ?

But what surprises me the most is how this issue was fixed (patch pending):

ChangeSet@1.2311, 2006-11-16 04:11:16+03:00, ted@ted.mysql.internal +6 -0
BUG #4291 fix: new configuration option "disk-tmp-table-size"
introduced to set maximum expected on-disk temporary table size
and avoid mix-up of tmp_table_size and max_heap_table_size

ChangeSet@1.2358, 2007-01-03 14:45:26+03:00, ted@ted.mysql.internal +7 -0
BUG #4291: max_heap_table_size affects creation of disk-based temporary table

fix: the new system variable memory_tmp_table_size is introduced;
it stands now for the exact purpose the Manual says
tmp_table_size used to do.

tmp_table_size retains to (give a hint about a)
limit of the on-disk temporary table size. The limit imposed upon
the disk-based temporary tables is still quite relative due to MyISAM
current implementation restrictions.

So now we're getting 4 variables instead of two ?

It is unclear about tmp_table_size - if it is going to be read only variable to tell you what maximum temporary table size is or is it going to limit on disk table size ? Any of behaviors have nothing to do with previous behavior and second one would break a lot of things.

In my opinion it would be much better to change it to match what users know about it, what is documented in the manual, config files, tons of books and articles on the web - you will have users expecting old behavior for years.

The size restriction of on disk temporary table could be good to add but that should have been another variable.

MySQL Binaries availability

Posted by peter

It looks like it looks like there are going a lot of discussions about future of MySQL Community Binaries, see for example this post and Kaj's clarifications.

Obviously now it grew to a lot of speculations and many comments are far from real story. It is also really interesting to watch MySQL try to balance situation of maximizing revenues and minimizing missing off community at the same time. It is moving target and balance may shift in the future one direction or another depending on how things go.

Some of confusion I guess comes from my post where I speak about recent version and so recent binaries which were not available at that point. In many cases "recent" word seems to be omitted and people are just speaking about binaries in general.

As Kaj explains now there will be official MySQL Community binaries, including Windows but they will be rarely released.

For many users running old MySQL binary is not the problem, it is however the problem if you run into the bug, which may already be fixed in newer version but binary with bugfix is not yet available. I guess having bug fixes promptly available is one of the selling points of MySQL Enterprise.

Windows build specially come into the picture as Windows users are not used to building software and most do not even have tools needed to perform the build available. Furthermore there are no vendors which will create binaries for you as happens with Linux.

The good news are - this only applies to Official binaries. I expect there would be unofficial builds popping up, we for example published our build of MySQL 5.0.33 for Linux x86-64. Scott has provided builds both for Linux i386 and Windows. I guess over next few months community will establish trusted sources of MySQL Builds and you will be able to get good builds promptly with new community releases. As long as MySQL Enterprise sources continue to be available I expect people will create builds of these as well, under different name of course.

January 18, 2007

INSERT ON DUPLICATE KEY UPDATE and REPLACE INTO

Posted by peter

Jonathan Haddad writes about REPLACE INTO and INSERT ON DUPLICATE KEY UPDATE. Really, Why MySQL has both of these, especially both are non ANSI SQL extensions ?

The story here seems to be the following - REPLACE INTO existed forever, at least since MySQL 3.22 and was a way to do replace faster and what is also important atomically, remember at that time MySQL only had ISAM tables with table locks and no transactions support. Of course you could use LOCK TABLES but it is not efficient.

The reason REPLACE could be efficient for ISAM and MyISAM, especially for fixed length rows is - it could perform row replacement without reading old data first, and of course because you could set it to replace multiple values at the same time just as you have multiple value INSERT.

As a side note: the fact REPLACE does not have to do read before write is a bit overrated from efficiency standpoint. As most rows are less than 4K-8K in size and are not aligned to OS cache page OS still would need to perform read from hard drive before it can perform an update, if data is not in OS cache, and if it is read would not be large overhead ether.

The problem with REPLACE was - many people tried to use it like update accessing previous column value, for example doing something like REPLACE INTO tbl (col1) VALUES (col1+10);. Instead of acting as update this really will insert NULL value, as col1 is undefined at this stage.

INSERT ON DUPLICATE KEY UPDATE is newer feature which came in MySQL 4.1 by advice one of MySQL big users. The question was efficiently maintaining counters in MySQL. There are also number of similar cases when you want ether to insert the new row or update stats for existing row.

I think both features are really great and I use them both. They are implemented really in MySQL style of being simple powerful and easy to use.

January 17, 2007

Performance impact of complex queries

Posted by peter

What is often underestimated is impact of MySQL Performance by complex queries on large data sets(ie some large aggregate queries) and batch jobs. It is not rare to see queries which were taking milliseconds to stall for few seconds, especially in certain OS configurations, and on low profile servers (ie having only one disk drive) even if you just have one such query running concurrently.

Lets talk a bit how it is happening and how to prevent it.

Cache Wiping This is first reason for this to happen - query which crunches large amount of data set wipes data from your normal working set from OS cache. Operation Systems and MySQL Itself employs various strategies to attempt to minimize such effect but the truth is it still happens.

Disk Starvation As Cache efficiency drops more requests have to hit the disk, which may be 100% busy running your batch job query. This is especially bad when you only have one drive for database location and in this case single query really can keep it always busy. As drive gets busy requests have to wait to be scheduled. You may imagine it would not need to wait for long because there is only one query ripping disk apart, in fact however a lot of queries can get piled up because of reduced cache efficiency many queries may start waiting on disk at once. In some cases I've seen average time requests spends in the queue to be over 1 second, which is much more than few milliseconds you would expect disk IO to take.

Scheduling Issues OS Disk IO schedules may also "help" in this case. Number of them would try to optimize throughput before latency, meaning if you have query which is doing full table scan (sequential reads) its requests will be prioritized as they do not require disk head movement, compared to IO requests in random locations other threads are willing to take. Especially some older disk schedulers could be poor. I remember in Linux 2.4 times I could make single full table scan query to slow things that bad, so "ls" would take over 10 seconds for directory with couple of files. In recent Linux Kernels deadline or cfq IO schedulers should be better with this.

So what can you do about it ?

Do reporting on the slave Doing reporting on the slave is great idea if you have one. Even if you use Innodb tables and so do not suffer from table locks reporting queries can affect web site performance dramatically. Sometimes instead of using slave you may use database version from the backup or LVM snapshot of current database with same results. In some cases you can't move it fully to the slave - for example if you load the data or build summary tables. In such cases you may still move some of the load to the server - you may read data from slave and write it to the master (selects often contribute most of the load) or you may prepare summary tables on the slave and when move them to the master using mysqldump, or even careful file transfer if you're using MyISAM tables.

Chop it If you can't execute this work on slave server, for example it is old data purge activity you may at least chop it, meaning do not do it as all one big simple query or as set of queries going one after another. Have sleep between them so none of them can take too much resources for too long time. If you delete things do DELETE ... LIMIT 1000 and insert sleep 10; in between. By spreading load this way you make sure large portion of the cache will not be wiped out at once and also if few queries pile up because query took many resources they will have time to resolve before next portion of the query takes place.

Time it This is kind of obvious but I still mention it - if you have to do something intense do it during the lowest load. Do not just place it in the cron job for 4AM and forget about it, it may be still too bad and your web site users may be suffering when you can't notice it. Make sure the load is low enough so your complex query or batch job do not slow things to bad.
In some environments which do not run 24/7 this is great solution as there is no one to suffer at certain times.

January 16, 2007

Speaking at MySQL Users Conference 2007

Posted by peter

Just got email from Jay today about two of my talks being accepted to MySQL Users Conference 2007 "MySQL Server Settings Tuning" and "MySQL Performance Cookbook". This is great as MySQL Conferences are always fun and great way to get to know people and learn things about MySQL and connected areas.

This will be my 5th time, but only first time, not being MySQL Employee.

No word so far about other talk I proposed - "Innodb Performance Optimization" which I think is one of my best talks. Might be the name sounds old, even though I add new information to it each time to keep it fresh.

Some of our articles are translated to Russian

Posted by peter

A friend pointed out to me number of our our articles were translated and included in PHPInside.RU - electronic magazine about PHP and surrounding technologies which of course include MySQL. You can download PDF for free right here.

We love our articles being translated and or republished, as long as they are available for free same as original articles and as long as you give us a credit for being authors.

January 12, 2007

Binaries of MySQL 5.0.33 Community release for AMD64 / EM64T

Posted by Vadim

Update 3 June 2008: We have removed the builds below, since they are quite obsolete and no one has posted comments about them since more than a year ago.

Great news are MySQL finally released new Community release - MySQL 5.0.33, which however as promised comes without Binaries.
This version also does not have any community patches yet, coming of the same tree as MySQL Enterprise.

To help those who would like to use MySQL Community version but does not like to build binaries we decided to publish our build for MySQL 5.0.33 Community release.

This build was done using "Generic Linux RPM" spec file on CentOS 4.4 (RHEL compatible) x86_64

(Links removed)

We added one more RPM to stantard MySQL RPMs - MySQL-microslow-5.0.33-0.glibc23.x86_64.rpm
This package contains mysqld-microslow binary, the server built with our microslow patch, which enables microsecound in slow-log.

More info about the patch
http://www.mysqlperformanceblog.com/2006/09/06/slow-query-log-analyzes-tools/
http://bugs.mysql.com/bug.php?id=25412

Also we propose Linux (AMD64 / Intel EM64T) binaries in tar.gz archive

(Link removed)

The archive contains both mysqld and mysqld-microslow binaries.

Source tar.gz with microslow patch

(Link removed)

The patch by itself

18K patch.slow-micro.5.0.33.diff

We are waiting for your response if our binaries are helpful and if we should make binaries for other platforms / Linux distributions or if you would like to help us doing builds.


This page was found by: insert query for ima...