October 14, 2009

Tuning for heavy writing workloads

Posted by Yasufumi

In a comment on my previous post, Dimitri suggested testing the db_STRESS benchmark on XtraDB. I ran the benchmark and tuned XtraDB for it, and this post shows the tunings. The same procedure should also apply to heavy write workloads in general.

First, <tuning peak performance>. Next, <tuning purge operation>, to stabilize performance and keep it from degrading.

<test condition>

Server:
PowerEdge R900, Four Quad Core E7320 Xeon, 2.13GHz, 32GB Memory, 16X2GB, 667MHz

db_STRESS:
32 sessions, RW=1, dbsize = 1000000, no thinktime

XtraDB: (mysql-5.1.39 + XtraDB-1.0.4-current)
innodb_io_capacity = 4000
innodb_support_xa = false
innodb_file_per_table = true
innodb_buffer_pool_size = 16G
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 128M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_max_dirty_pages_pct = 90
innodb_flush_method = O_DIRECT
(the following are XtraDB-specific general settings)
innodb_ibuf_active_contract = 1
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = estimate

<tuning peak performance>

First, we tune the peak performance so that CPU and IO resources are used more effectively. Avoiding mutex/lock contention is a good way to make use of more of the CPU resources on a machine with many CPUs.

[Graph: purge_thread_test_1ST_TUNE]

This graph shows the peak performance in tps of db_STRESS.

With the current settings, the performance is “base” in the graph. We can roughly identify the mutex/lock contention from the SEMAPHORES section of the SHOW INNODB STATUS output.

“xx-lock on RW-latch at 0x7f2ff40a3dc0 created in file dict/dict0dict.c line 1627”

Looking at the source file, this is index->lock (and it probably belongs to the HISTORY table). It is the lock for each index tree. We may be able to spread the contention across several locks by using MySQL partitioning, so I added the following clause to the HISTORY table definition.

“PARTITION BY HASH(REF_OBJECT) PARTITIONS 16”
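As a rough illustration of why this clause spreads the load: with PARTITION BY HASH, MySQL chooses the partition as MOD(expr, number of partitions), and each partition is stored with its own index trees, so each one has its own index->lock. The sketch below assumes REF_OBJECT is a non-negative integer spread evenly over 1..1000000 (borrowed from dbsize = 1000000); that distribution is an assumption for the example, not something measured.

```python
from collections import Counter

PARTITIONS = 16  # from the "PARTITIONS 16" clause


def partition_for(ref_object, partitions=PARTITIONS):
    """Partition number for a non-negative integer key: MOD(expr, partitions)."""
    return ref_object % partitions


# Hypothetical key range: one REF_OBJECT value per object, 1..1000000.
counts = Counter(partition_for(value) for value in range(1, 1_000_001))

for partition in sorted(counts):
    print(f"partition {partition:2d}: {counts[partition]} keys")
```

With evenly spread keys, every partition receives the same share of the rows, so the waits that previously queued on a single index->lock are divided among 16 of them.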

The performance is now “+partitioned” in the graph. Looking at the SEMAPHORES section again,

“has waited at handler/ha_innodb.cc line 7275 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0xd30320 created in file dict/dict0dict.c line 623”

is probably the line that appears most often (it is dict_operation_lock). It may be a partition-specific lock contention. The current XtraDB has a variable to tune this contention.

innodb_stats_update_need_lock = 0 (default 1)

It skips the statistics update that needs the lock (this only affects the “Data_free:” value of SHOW TABLE STATUS). The performance became “+skip_stats” in the graph. The next contention in the SEMAPHORES section is…

“Mutex at 0x1b3e3e78 created file trx/trx0rseg.c line 167”

which stands out (it is rseg->mutex). There is one such mutex per rollback segment, so we can increase the number of rollback segments (rsegs) to relieve the contention. XtraDB can increase the number of rsegs.

innodb_extra_rsegments = 64 (takes effect at InnoDB initialization)

I recreated the database files with this parameter, and the performance became “+rsegs64”. Finally, the next contention is probably “Mutex at 0x28ce8e0 created file srv/srv0srv.c line 982”. This is kernel_mutex, and we currently don’t have a proper solution for it. These settings seem to be enough for now.
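The repeated look at the SEMAPHORES section can be semi-automated. The following is a minimal sketch that counts how often each latch or mutex creation site appears in a captured status output. The line formats are the ones quoted in this post; the sample text itself, including how many times each line repeats, is made up for illustration.

```python
import re
from collections import Counter

# Matches both "RW-latch at 0x... created in file F line N"
# and "Mutex at 0x... created file F line N", as quoted in the post.
CREATED = re.compile(
    r"(RW-latch|Mutex) at (?:0x)?[0-9a-fA-F]+ created (?:in )?file (\S+) line (\d+)"
)

# Hypothetical capture: the repetition counts are invented for this example.
SAMPLE = """\
has waited at handler/ha_innodb.cc line 7275 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0xd30320 created in file dict/dict0dict.c line 623
has waited at handler/ha_innodb.cc line 7275 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0xd30320 created in file dict/dict0dict.c line 623
Mutex at 0x1b3e3e78 created file trx/trx0rseg.c line 167
has waited at handler/ha_innodb.cc line 7275 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0xd30320 created in file dict/dict0dict.c line 623
"""


def count_waits(status_text):
    """Count waits per (kind, source file, line) where the latch was created."""
    counts = Counter()
    for kind, path, line in CREATED.findall(status_text):
        counts[(kind, path, int(line))] += 1
    return counts


for (kind, path, line), n in count_waits(SAMPLE).most_common():
    print(f"{n:3d}  {kind} created in {path}:{line}")
```

The creation site at the top of the list is the one to look up in the source file first, as was done by hand above.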

<tuning purge operation>

Next, let us look at the results over a longer period of time.

The next problem is the “History list length” growing to a huge size. This value is the number of entries in the rollback segments. The entries are used for consistent reads by older transactions. An entry can be removed once no transaction refers to it any longer. This removal of entries is called “purge” in InnoDB. The purge operation must keep up with the workload, because a huge history list hurts performance.
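To watch this value over time, it can be pulled out of saved status outputs. The sketch below assumes the value appears in the status text as a line of the form “History list length N”; the sample numbers are hypothetical and only show how a growth rate could be derived from two samples.

```python
import re

HISTORY = re.compile(r"History list length (\d+)")


def history_list_length(status_text):
    """Return the history list length found in a status output, or None."""
    match = HISTORY.search(status_text)
    return int(match.group(1)) if match else None


def growth_per_second(samples):
    """Average growth between the first and last (seconds, length) samples."""
    (t0, h0), (t1, h1) = samples[0], samples[-1]
    return (h1 - h0) / (t1 - t0)


# Hypothetical samples taken 10 seconds apart.
first = history_list_length("Purge done ...\nHistory list length 1000\n")
second = history_list_length("Purge done ...\nHistory list length 6000\n")
print(growth_per_second([(0, first), (10, second)]))  # entries per second
```

A rate that stays positive for a long time means purge is not keeping up with the updates.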

Basically, purging is done by the master_thread (the general background thread of InnoDB). A huge history list makes the purge operation slow, and it interferes with the other tasks of the master_thread (e.g. flushing dirty blocks, processing the insert buffer, etc.). Dimitri implemented a purge_thread dedicated to purging, and XtraDB has a similar purge_thread. Though it seems to stabilize the throughput, it is still not enough for heavy update workloads. A single purge_thread on one CPU cannot keep up with the updates from user threads on all of the other CPUs.

From the next release, XtraDB can increase the number of purge threads.

innodb_use_purge_thread = 4

seems to be enough for this workload on the server.

The first of the following graphs shows the throughput [tps] over time, up to 3500 sec.

[Graph: purge_thread_test_TPS]

The next graph tracks the “History list length” over the same period.

[Graph: purge_thread_test_HIST_LENGTH]

"Norm 1.0.4": Normal InnoDB Plugin 1.0.4 without XtraDB specific options
"xtra p_t 0": XtraDB 1.0.4-new (no purge_thread)
"xtra p_t 1": XtraDB 1.0.4-new (single purge_thread similar to Dimitri's)
"xtra p_t 4": XtraDB 1.0.4-new (4 purge threads)

The graphs show…

  • The purge thread (> 0) helps greatly to stabilize the throughput.
  • Increasing the number of purge threads can suppress the rapid growth of the history list.
  • The adaptive checkpoint “estimate” needs the purge_thread… (more than adaptive_flushing does)

The average tps over the last 300 seconds are…

"Norm 1.0.4": 5725.47
"xtra p_t 0": 4699.33
"xtra p_t 1": 7130.3
"xtra p_t 4": 9118    (about 60% up from the normal Plugin 1.0.4)

In the end, these XtraDB tunings give faster and more stable performance on the db_STRESS benchmark.
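For reference, the relative changes against the normal Plugin can be worked out from the averages listed above. This is just the arithmetic on the four numbers from this post, with "Norm 1.0.4" taken as the baseline.

```python
# Average tps over the last 300 seconds, as listed in the post.
averages = {
    "Norm 1.0.4": 5725.47,
    "xtra p_t 0": 4699.33,
    "xtra p_t 1": 7130.3,
    "xtra p_t 4": 9118,
}

baseline = averages["Norm 1.0.4"]

# Percentage change relative to the normal InnoDB Plugin 1.0.4.
change = {name: (tps / baseline - 1) * 100 for name, tps in averages.items()}

for name, pct in change.items():
    print(f"{name}: {pct:+.1f}%")
```

The 4-purge-thread run comes out at roughly +59%, which is the "about 60%" quoted above, while XtraDB without a purge thread is below the baseline in this configuration.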

—————————————–

(Added 2009.10.29)

<FAQ: Is XtraDB slower than Plugin?>

I’d like to say “no” to this question. We have been adding many tuning options to XtraDB, but they are not effective in all cases; sometimes performance may get worse because an option is not appropriate or its value is too large. We have to choose the options correctly. XtraDB is based on InnoDB Plugin, so at the very least we can configure XtraDB to behave the same as InnoDB Plugin. The following graphs show the results of XtraDB and Plugin with the same options and the same database.

top-left (same conditions as the graphs above):
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = true

top-right:
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = true

bottom-left:
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = false

bottom-right:
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = false

[Graph: purge_thread_test_2_TPS]

At least here, XtraDB does not seem to be slower than Plugin.

Starting from this performance, we can tune further using the XtraDB-specific options!

Why do you make XtraDB slower than Plugin? :-)

June 29, 2009

Few more ideas for InnoDB features

Posted by Vadim

As you can see, MySQL is doing a great job on InnoDB performance improvements, so we decided to concentrate more on additional InnoDB features that will make a difference.

Besides the ideas I posted before at http://www.mysqlperformanceblog.com/2009/03/30/my-hot-list-for-next-innodb-features/ (one of them, moving InnoDB tables between servers, is currently under development), we have a few more:

- Stick some InnoDB tables / indexes in the buffer pool, or set priorities for InnoDB tables. That means tables with higher priority will have a better chance of staying in the buffer pool than tables with lower priority. Link to blueprint: https://blueprints.launchpad.net/percona-patches/+spec/lru-priority-patch

- Separate the LRU list into several lists. This will allow us to emulate several buffer pools, with the ability to keep different tables in different buffer pools, and also to decrease contention on the buffer pool. Link: https://blueprints.launchpad.net/percona-patches/+spec/multiple-lru-patch

- We are looking to include Waffle Grid in XtraDB releases, with some additional features like caching the buffer pool on SSD.

If these ideas are interesting to you and you want to support them, contact us.

May 13, 2009

Global Transaction ID and other patches available!

Posted by Vadim

I do not know if you noticed it, but Google (Mark Callaghan, Justin Tolmer and their internal mysql-team) made a great contribution to MySQL. The patches for global transaction IDs, binlog event checksums and crash-safe replication state have been separated and published on Launchpad (https://code.launchpad.net/~jtolmer/mysql-server/global-trx-ids).

For me, a big obstacle to using these patches was that they were part of one big patch that could only be applied to 5.0.37. Now there is no barrier to including the patches in our builds or in MySQL releases.

If you do not know what a Global Transaction ID is, it is worth looking at http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds. From my point of view, it is an entirely new view of MySQL replication, and it could change the MySQL replication architecture.

We will definitely look at whether we can integrate the patches into Percona builds, and provide binaries if there are no problems. I am also fairly sure the patches will be included in MariaDB along with other Percona improvements and the XtraDB storage engine.

March 10, 2009

Percona at PHP Quebec 09

Posted by Morgan Tocker

Percona presented two talks at PHP Quebec last week – one on A Tour of MySQL High Availability, and another on Performance Tuning MySQL. There was a great reaction to showcasing some of the quick-wins that can be found by using the Percona patches. Unfortunately, the one thing that I forgot to mention in the slides is that the patches are Open Source and free to use.

March 4, 2009

Making replication a bit more reliable

Posted by Vadim

Running a MySQL slave is a common, everyday task, and taking backups from a slave is an often-recommended solution. However, the current state of MySQL replication makes restoring a slave a bit tricky (if possible at all). The main problem is that the InnoDB transaction state and the replication state are not synchronized. If we are talking about a backup and you can execute the SHOW SLAVE STATUS command, you can get reliable information about the current state, but some solutions do not allow that. Take for example the Sun Storage 7410, which provides storage via NFS and lets you make ZFS snapshots without any knowledge of what kind of data you are storing there. What makes the situation worse is that the files with the replication state (relay-log.info, master.info) are not synchronized to disk after each update, and even worse, with NFS they can stay in the client-side OS/NFS cache for a long time. As a solution we can make a patch that executes fsync() on these files after each write, but I can’t predict how large the performance penalty will be; I expect it to be very significant.
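The idea of calling fsync() after each write can be sketched outside the server as follows. This is not the patch itself, only an illustration of the system call pattern; the helper name write_state and the file contents are hypothetical placeholders.

```python
import os
import tempfile


def write_state(path, lines):
    """Rewrite a small state file and force it to stable storage."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to write its cached data to storage


# Demo in a temporary directory; the values are placeholders, not a real slave state.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "relay-log.info")
    write_state(path, ["./relay-bin.000001", "4", "master-bin.000001", "4"])
    with open(path) as f:
        saved = f.read()

print(saved)
```

Every call waits for the storage to acknowledge the write, which is where the expected performance penalty comes from when the state changes after each replicated event.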

February 11, 2009

Limiting InnoDB Data Dictionary

Posted by Vadim

One of InnoDB’s features is that the memory allocated for internal table definitions is not limited and may grow indefinitely. You may not notice it if you have a typical application with, say, 100-1000 tables. But for hosting providers and for user-oriented applications (where each user has a dedicated database / table) it is a disaster. With 100,000+ tables InnoDB consumes gigabytes of memory, keeping a definition in memory forever once the table has been opened. The only way to free the memory is to drop the table or restart mysqld. I can’t say this is a good solution, so we made a patch that allows limiting the memory dedicated to the data dictionary.

The patch was made at the request of our customer Vertical Response and is released under the GPL, so you can download it here: http://mysqlperformanceblog.com/files/patches/innodb_dict_size_limit_standalone.patch. The patch is currently in the testing stage, but it will be included in our releases later. To limit the memory we introduce a new variable, innodb_dict_size_limit (in bytes).

Some internals: InnoDB already implements an LRU-based algorithm to keep only recent table entries, but it was not used because InnoDB has to know whether a table is in use at the MySQL level. We do this by checking the MySQL table_cache. If a table is in the table_cache we consider it in use; if not, we can delete it from the InnoDB data dictionary. So here is the catch: if your table_cache is big enough, the memory consumed by the data dictionary may exceed innodb_dict_size_limit, because we can’t delete entries for tables that are still in the table_cache.

To finish this post, a small marketing message: if you face a bug or problem that has existed for a long time and is not going to be solved by MySQL / InnoDB, contact us regarding Custom MySQL Development.
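The interaction between the size limit and the table_cache can be shown with a toy model. The class below is not the InnoDB code; it is a small sketch of an LRU list in which entries for tables still present in the table_cache are skipped during eviction. All names (DictCache, open_table, run) and the sizes are made up for the example.

```python
from collections import OrderedDict


class DictCache:
    """Toy LRU of table definitions with a soft size limit in bytes."""

    def __init__(self, size_limit):
        self.size_limit = size_limit
        self.entries = OrderedDict()  # table name -> entry size, oldest first
        self.size = 0

    def open_table(self, name, entry_size, table_cache):
        if name in self.entries:
            self.entries.move_to_end(name)  # mark as most recently used
        else:
            self.entries[name] = entry_size
            self.size += entry_size
        self._evict(table_cache)

    def _evict(self, table_cache):
        for name in list(self.entries):  # oldest entries first
            if self.size <= self.size_limit:
                break
            if name in table_cache:
                continue  # still considered in use, cannot be removed
            self.size -= self.entries.pop(name)


def run(table_cache_capacity):
    """Open four 100-byte tables with a 300-byte limit and a bounded table_cache."""
    cache = DictCache(size_limit=300)
    table_cache = []
    for name in ["t1", "t2", "t3", "t4"]:
        table_cache.append(name)
        del table_cache[:-table_cache_capacity]  # keep only the newest tables
        cache.open_table(name, 100, set(table_cache))
    return cache


small = run(table_cache_capacity=1)
big = run(table_cache_capacity=4)
print(list(small.entries), small.size)  # limit respected
print(list(big.entries), big.size)      # limit exceeded, every table is in use
```

With a small table_cache the oldest definition is dropped and the limit holds; with a table_cache large enough to hold every table, nothing can be evicted and the dictionary grows past the limit, which is the catch described above.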

February 2, 2009

Pretending to fix broken group commit

Posted by Vadim

The problem with broken group commit has been discussed many times; the bug report was filed 3.5 years ago and is still not fixed in MySQL 5.0/5.1 (and most likely will not be in MySQL 5.1). The hard truth, though, is that this bug is very hard (if possible at all) to fix properly. In short, if you enable replication (log-bin) on a server without a BBU (battery backup unit), your InnoDB write performance under concurrent load drops significantly.
We have also written about it before, see “Group commit and real fsync” and “Group commit and XA”.

January 23, 2009

Another scalability fix in XtraDB

Posted by Vadim

Recent scalability fixes in InnoDB, together with Google's and your SMP fixes, made InnoDB results almost acceptable for primary key lookup queries, but secondary indexes were forgotten for some time. Now that I have a Dell PowerEdge R900 on board (16 CPU cores, 16GB RAM), I have some time for experiments, and I played with queries:

SELECT name FROM sbtest WHERE country_id = ? LIMIT 5


5.0.75-build12 Percona binaries

Posted by Vadim

After several important fixes to our patches we made binaries for build12.

Fixes include:

Control of the InnoDB insert buffer, to address the problems Peter mentioned in http://www.mysqlperformanceblog.com/2009/01/13/some-little-known-facts-about-innodb-insert-buffer/ (also check Bug 41811 to see the symptoms of the insert buffer problem).

http://www.percona.com/docs/wiki/patches:innodb_io_patches

* innodb_flush_neighbor_pages (default 1) - When dirty pages are flushed (written to the datafile), this parameter determines whether the neighbor pages in the datafile are also flushed at the same time. If you use storage that doesn't have “head seek delay” (e.g. SSD or storage with enough write buffering), 0 may show better performance. 0:disable, 1:enable

* innodb_ibuf_max_size (default [half of innodb_buffer_pool_size] (bytes)) - This is a startup parameter. If it is set to a value lower than half of innodb_buffer_pool_size, that value is used as the maximum size of the insert buffer. Restricting it to a very small value (e.g. 0) is not recommended for performance. If you don't want the insert buffer to grow large, you should use the following parameters instead. (* If you use very fast storage, a small value (like several MB) may show better performance.)

* innodb_ibuf_accel_rate (default 100(%)) - This parameter additionally tunes the amount of insert buffer processing done by the background thread. Sometimes innodb_io_capacity alone is insufficient to tune the insert buffer.

* innodb_ibuf_active_contract (default 0) - By default (same as normal InnoDB), user threads do nothing to contract the insert buffer until it reaches its maximum size. 1 makes each user thread actively contract the insert buffer asynchronously whenever possible.

The second important fix introduces the variable use_global_long_query_time, which allows all current threads to see a change of long_query_time. By default, the value set by the SET GLOBAL long_query_time=N command is visible only to newly established connections, which is a problem if you have a pre-established connection pool, say in a Java or Ruby on Rails application. With use_global_long_query_time=true, even all current threads will respect SET GLOBAL long_query_time=N. The feature was made for EngineYard, a hosting provider for Ruby on Rails applications.
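The difference in visibility can be modelled in a few lines. The classes below are a toy model of the behaviour described here, not server code: each connection copies the global value when it is established, and the option makes existing connections read the global value instead. The class and method names are hypothetical, and the starting value of 10 seconds is just an example.

```python
class Server:
    """Toy model of a server holding the global long_query_time."""

    def __init__(self, long_query_time=10.0, use_global_long_query_time=False):
        self.global_long_query_time = long_query_time
        self.use_global_long_query_time = use_global_long_query_time

    def connect(self):
        return Connection(self)

    def set_global_long_query_time(self, seconds):
        self.global_long_query_time = seconds


class Connection:
    """Toy model of a client connection with its own session copy."""

    def __init__(self, server):
        self.server = server
        # The session value is copied from the global one at connect time.
        self.session_long_query_time = server.global_long_query_time

    def effective_long_query_time(self):
        if self.server.use_global_long_query_time:
            return self.server.global_long_query_time
        return self.session_long_query_time


# Default behaviour: a pooled connection keeps the old threshold.
server = Server()
pooled = server.connect()
server.set_global_long_query_time(1.0)
print(pooled.effective_long_query_time(), server.connect().effective_long_query_time())

# With the option enabled, the pooled connection sees the change as well.
patched = Server(use_global_long_query_time=True)
pooled_patched = patched.connect()
patched.set_global_long_query_time(1.0)
print(pooled_patched.effective_long_query_time())
```

In the default case only the connection opened after the change uses the new threshold, which is why long-lived connection pools never pick it up without the option.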

You can download binaries (RPMS x86_64) and sources with patches here
http://www.percona.com/mysql/5.0.75-b12/

January 9, 2009

How Percona Develops Open-Source Software

Posted by Baron Schwartz

Percona has been building and contributing to open-source software since the company was founded, and individually we've been doing the same thing for many years.  We think it's a huge value for our customers and the community.

We're involved in a dozen or so open-source projects, but our three core efforts at the moment are the following:

  • Percona patches, which are included in our own MySQL builds and then in OurDelta builds and perhaps others as well
  • XtraDB, which is our new high-performance transactional storage engine
  • Maatkit, which is a toolkit that provides advanced functionality for MySQL.

We have a team of dedicated MySQL developers working on the server and on XtraDB, and a dedicated Maatkit developer. Other Percona employees also put significant time into these projects.

Outside observers have commented that our development process doesn't seem very open-source.  That is, we typically just go build the software and then announce it. We don't involve the community very much in our decisions about what features to include, or how they should get built; and we don't encourage community contributions directly into our codebase.  There's also some ambiguity about where the money comes from and where it goes.  These are all fair points to bring up.  Peter, Vadim and I thought we should address them and let everyone know how we really work on these things and what our vision for the future is.
