November 18, 2009

How innodb_open_files affects performance

Posted by peter |

Recently I looked at table_cache sizing which showed larger table cache does not always provides the best performance. So I decided to look at yet another similar variable – innodb_open_files which defines how many files Innodb will keep open while working in innodb_file_per_table mode.

Unlike MyISAM Innodb does not have to keep open file descriptor when table is open – open table is purely logical state and appropriate .ibd file may be open or closed. Furthermore besides MySQL table_cache Innodb maintains its own (called data dictionary) which keeps all tables ever accessed since table start – there is no variable to control its size and it can take significant amount of memory in some edge cases. Percona patches though provide innodb_dict_size_limit to restrict growth of data dictionary.
[read more...]

November 5, 2009

InnoDB: look after fragmentation

Posted by Vadim |

One problem made me puzzled for couple hours, but it was really interesting to figure out what's going on.

So let me introduce problem at first. The table is

CODE:
  1. CREATE TABLE `c` (
  2.   `tracker_id` int(10) unsigned NOT NULL,
  3.   `username` char(20) character set latin1 collate latin1_bin NOT NULL,
  4.   `time_id` date NOT NULL,
  5.   `block_id` int(10) unsigned default NULL,
  6.   PRIMARY KEY  (`tracker_id`,`username`,`time_id`),
  7.   KEY `block_id` (`block_id`)
  8. ) ENGINE=InnoDB

Table has 11864696 rows and takes Data_length: 698,351,616 bytes on disk

The problem is that after restoring table from mysqldump, the query that scans data by primary key was slow. How slow ? Let me show.
[read more...]

November 1, 2009

Speaking at the LA MySQL Meetup – 18th November

Posted by Morgan Tocker |
Morgan speaking at Highload.ru
A recent photo from Highload.ru

I said in my last post, that we're interested in speaking at MySQL meetups, and I'm happy to say that the Los Angeles MySQL Meetup has taken us up on the offer.

On November 18th, I'll be giving an introductory talk on InnoDB/XtraDB Performance Optimization.  I will be the second speaker, with Carl Gelbart first speaking on Infobright.

What brings me to LA?  On the same day (18th Nov) I'll be teaching a one day class on Performance Optimization for MySQL with InnoDB and XtraDB.  If you haven't signed up yet - spaces are still available.

October 27, 2009

State of the art: Galera – synchronous replication for InnoDB

Posted by Vadim |

First time I heard about Galera on Percona Performance Conference 2009, Seppo Jaakola was presenting "Galera: Multi-Master Synchronous MySQL Replication Clusters". It was impressed as I personally always wanted it for InnoDB, but we had it in plans at the bottom of the list, as this is very hard to implement properly.
The idea by itself is not new, I remember synchronous replication was announced for SolidDB on MySQL UC 2007, but later the product was killed by IBM.

So long time after PPC 2009 there was available version mysql-galera-0.6, which had serious flow, to setup a new node you had to take down whole cluster. And all this time Codership ( company that develops Galera) was working on 0.7 release that introduces node propagation keeping cluster online. You can play with 0.7pre release by yourself MySQL/Galera Release 0.7pre.
[read more...]

October 14, 2009

Tuning for heavy writing workloads

Posted by Yasufumi |

For the my previous post, there was comment to suggest to test db_STRESS benchmark on XtraDB by Dimitri. And I tested and tuned for the benchmark. I will show you the tunings. It should be also tuning procedure for general heavy writing workloads.

At first, <tuning peak performance>. The next, <tuning purge operation> to stabilize performance  and to avoid decreasing performance.

<test condition>

Server:
PowerEdge R900, Four Quad Core E7320 Xeon, 2.13GHz, 32GB Memory, 16X2GB, 667MHz

db_STRESS:
32 sessions, RW=1, dbsize = 1000000, no thinktime

XtraDB: (mysql-5.1.39 + XtraDB-1.0.4-current)
innodb_io_capacity = 4000
innodb_support_xa = false
innodb_file_per_table = true
innodb_buffer_pool_size = 16G
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 128M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_max_dirty_pages_pct = 90
innodb_flush_method = O_DIRECT
(the followings are XtraDB specific general settings)
innodb_ibuf_active_contract = 1
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = estimate

<tuning peak performance>

At first, tuning the peak performance to use CPU and IO resource more effectively. To avoid mutex/lock contentions are good to use more CPU resource of many CPUs.

purge_thread_test_1ST_TUNE

This graph shows the peak performance in tps of db_STRESS.

At current settings, "base" in the graph is the perfomance. We can confirm the mutex/lock contention roughly by the SEMAPHORES sction of SHOW INNODB STATUS output.

"xx-lock on RW-latch at 0x7f2ff40a3dc0 created in file dict/dict0dict.c line 1627"

It is index->lock, viewing the source file (and it may be HISTORY table). This is the lock for each index tree. We may be able to disperse the lock using by the partitioning of MySQL. Added the following clause to the HISTORY table definition.

"PARTITION BY HASH(REF_OBJECT) PARTITIONS 16"

Now the performance became to "+partitioned" in the graph. Looking the SEMAPHORES section again,

"has waited at handler/ha_innodb.cc line 7275 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0xd30320 created in file dict/dict0dict.c line 623"

may be the line which appears for the most times (it is dict_operation_lock). It may be partition specific lock contention. The current XtraDB has the variable to tune the contention.

innodb_stats_update_need_lock = 0 (default 1)

It skip the updating statistics which needs the lock. (it only affects for "Data_free:" value of TABLE STATS). And the performance became "+skip_stats" in the graph.  Then, the next contention at SEMAPHORES section is...

"Mutex at 0x1b3e3e78 created file trx/trx0rseg.c line 167"

may be remarkable (it is rseg->mutex). The mutex is for each rollback segments, so we can increase the rsegs to solve the contention problem. XtraDB can increase the rseg.

innodb_extra_rsegments = 64 (affects to initialization of InnoDB)

Recreated database files with the parameter. Then the performance became "+rsegs64". At last, the next contention may be "Mutex at 0x28ce8e0 created file srv/srv0srv.c line 982". It is kernel_mutex, currently we don't have proper solution for that. The setting seems to be enough for now.

<tuning purge operation>

Next, looking the sequential result in more long term.

The next problem is "History list length" growing to huge size. The value is the number of entries in rollback segment. The entries are used for consistent reading of the older transactions. They can be removed when any transactions doesn't refer the entry. This removing operation for the entries is called "purge" in InnoDB. The purge operation should be done enough on time, because the huge history list affects to performance.

Basically, the purging is done by master_thread (general background thread of InnoDB). The huge history list makes the purge operation slow, and it interferes  with the other tasks of the master_thread (e.g. flushing dirty blocks, treating insert buffer, etc...). Dimitri implemented a purge_thread to devote to the purging, and also XtraDB has similar purge_thread. Though it seems to make the throughput stabilize, it is not enough still for heavy update workloads. A single purge_thread on one CPU is not enough for updates from user threads on the all of other CPUs.

XtraDB can increase the purge_threads from the next release.

innodb_use_purge_thread = 4

seems to be enough for this workload on the server.

The first graph of followings is sequential throughput [tps] up to 3500 sec.

purge_thread_test_TPS

The next is tracking the "History list length" at the same time.

purge_thread_test_HIST_LENGTH

"Norm 1.0.4": Normal InnoDB Plugin 1.0.4 without XtraDB specific options
"xtra p_t 0": XtraDB 1.0.4-new (no purge_thread)
"xtra p_t 1": XtraDB 1.0.4-new (single purge_thread similar to Dimitri's)
"xtra p_t 4": XtraDB 1.0.4-new (4 purge threads)

The graphs show...

  • The purge thread (> 0) helps to stabilize the throughput greatly.
  • Increasing the purge threads can suppress the strong growing of the hitory list
  • The adaptive checkpoint "estimate" needs the purge_thread... (than the adaptive_flushing does)

And the last 300secs' average tps are...

"Norm 1.0.4": 5725.47
"xtra p_t 0": 4699.33
"xtra p_t 1": 7130.3
"xtra p_t 4": 9118    (about 60%up from Normal Plugin 1.0.4)

In the end, the faster and more stable performance of db_STRESS benchmark is obtained by these tunings of XtraDB.

-----------------------------------------

(Added 2009.10.29)

<FAQ: Is XtraDB slower than Plugin?>

I'd like to say "no" to this question. We have been adding many tuning options to XtraDB. But they are effective not for all cases, sometimes the performance may get worse because of "not proper" or "too much value". We should choose the options correctly. XtraDB is based on InnoDB Plugin and we can set XtraDB same to InnoDB Plugin at least. The following graphs are results of XtraDB and Plugin with same options and same database.

top-left (same condition to above graphs):
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = true

top-right:
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = true

bottom-left:
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = false

bottom-right:
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = false

purge_thread_test_2_TPS

It seems that XtraDB is not slower than Plugin here at least.

We can start tuning based on these performances using XtraDB specific options!

Why do you make XtraDB slower than Plugin? :-)

September 28, 2009

How number of columns affects performance ?

Posted by peter |

It is pretty understood the tables which have long rows tend to be slower than tables with short rows. I was interested to check if the row length is the only thing what matters or if number of columns we have to work with also have an important role. I was interested in peak row processing speed so I looked at full table scan in case data fits in OS cache completely. I created 3 tables - First containing single tinyint column which is almost shortest type possible (CHAR(0) could be taking less space), table with 1 tinyint column and char(99) column and table with 100 tinyint columns. The former two tables have the same row length but have number of column different 50 times. Finally I have created 4th table which is also 100 columns but one of them is VARCHAR causes raw format to be dynamic.

[read more...]

Why InnoDB index cardinality varies strangely

Posted by Baron Schwartz |

This is a very old draft, from early 2007 in fact. At that time I started to look into something interesting with the index cardinality statistics reported by InnoDB tables. The cardinality varies because it's derived from estimates, and I know a decent amount about that. The interesting thing I wanted to look into was why the cardinality varies in a particular pattern.

Here I'll grab a bunch of cardinality estimates from sakila.film on MySQL 5.0.45 and put them into a file:

CODE:
  1. baron@kanga:~$ while true; do mysql sakila -N -e 'show index from film' | head -n 2 | tail -n 1 | awk '{print $7}'; done> sizes

After a while I cancel it and then sort and aggregate them with counts:

CODE:
  1. baron@kanga:~$ sort sizes | uniq -c
  2. 157 1022
  3. 156 1024
  4. 156 1058
  5. 156 1059
  6. 156 1131
  7. 313 951
  8. 312 952
  9. 312 953

Look at the distribution of the counts. The weighted average of these is 1000.53, so it's close to the truth (1000 rows). But five of the eight distinct estimates are shown about one-half as often as the others; it looks like the random choice of which statistic to use is not evenly distributed.

I mentioned this to Heikki and he pondered it for a bit -- but neither of us really figured out what was going on. I know the code superficially, but not as well as he or Yasufumi or others do; and I was not able to find a cause.

More recently I saw that I'm not the only one who notices oddities in the random number generation. I waited. And indeed the fixes for that bug seemed to have fixed the skew in the statistics. Case solved, and all I had to do was wait. Truly, laziness is a virtue.

September 15, 2009

Which adaptive should we use?

Posted by Yasufumi |

As you may know, InnoDB has 2 limits for unflushed modified blocks in the buffer pool. The one is from physical size of the buffer pool. And the another one is oldness of the block which is from the capacity of transaction log files.

In the case of heavy updating workload, the modified ages of the many blocks are clustered. And to reduce the maximum of the modified ages InnoDB needs to flush many of the blocks in a short time, if these are not flushed at all. Then the flushing storm affect the performance seriously.

We suggested the "adaptive_checkpoint" option of constant flushing to avoid such a flushing storm. And finally, the newest InnoDB Plugin 1.0.4 has the new similar option "adaptive_flushing" as native.

Let's check the adaptive flushing options at this post.
[read more...]

September 14, 2009

Queries Active vs Transactions Active

Posted by peter |

What is wrong here (the part of SHOW INNODB STATUS):

--------------
ROW OPERATIONS
--------------
8 queries inside InnoDB, 9 queries in queue
100 read views open inside InnoDB

It is relationship between queries active - queries inside innodb+queries in the queue totalling 17 with "read views open inside InnoDB" which is a fancy way of saying "active transactions" which is 100.
[read more...]

Statistics of InnoDB tables and indexes available in xtrabackup

Posted by Vadim |

If you ever wondered how big is that or another index in InnoDB ... you had to calculate it yourself by multiplying size of row (which I should add is harder in the case of a VARCHAR - since you need to estimate average length) on count of records. And it still would be quite inaccurate as secondary indexes tend to take more space. So we added more detailed index statistics into our xtrabackup utility. The thanks for this feature goes to a well known Social Network who sponsored the development.

[read more...]