October 19, 2009

MySQL-Memcached or NOSQL Tokyo Tyrant – part 3

Posted by matt |

This is part 3 of our series.  In part 1 we talked about boosting performance with memcached on top of MySQL, in Part 2 we talked about running 100% outside the data with memcached, and now in Part 3 we are going to look at a possible solution to free you from the database.  The solution I am going to discuss here is Tokyo Cabinet and Tyrant.

I am not going to give you a primer  or Tutorial on Tyrant and Cabinet, there are plenty of these out there already.  Instead I want to see what sort of performance we can see compared to MySQL and Memcached, and later on other NoSQL solutions.  Tokyo actually allows you to use several types of databases that are supported, there are hash databases which are very similar to memcached, a table database which is similar to your classic database tables where you can add a where clause and search individual columns, and a ton more “database options”  beyond just those two.  Again my goal is to not make this a Tokyo Tyrant tutorial but rather show one potential role it can play.

More details can be read about here:
http://1978th.net/tokyotyrant/
http://1978th.net/tokyocabinet/

So if we can get performance similar to memcached with Tokoyo Tyrant when using disk based hash tables it would be a compelling replacement for our application here.  It should provide the interface and the same access we saw in memcached but with disk persistence. So let’s look at the numbers:

Tyrant -vs- memcached

Tyrant’s disk based hash was almost 2x faster then combining memcached and MySQL, and about 20% slower then the all memory memcached approach.  So for this particular application I would have been much better off not storing my data in MySQL and instead looking outside the database for an answer.  Now sure there are other reasons you may want to keep data in the database… but I am trying to get you to think about your application and if those reasons are really valid.  Helping clients pick the right solution is one of the things we do here at Percona.  If an application requires a database great, but if there is a better solution we want to suggest it.  It’s our goal to make your application perform optimally.

Finally, one concern you have to have is the scalability of your storage solution.  As load, number of threads, and data size increases how does performance differ or change?  One knock on Tokyo -vs- Memcached is Tokyo is not distributed by default.  Now that’s not to say we could not shard it based on a hash, or even build an api with the capability built in ( or use the memcached clients which works! )…  but native support is lacking.  It does support replication which could make some rather interesting architectures in the future.

So lets look at some scalability benchmarks, my server resources are rather limited but I thought I should try throwing more threads and work at the server until it hit its limit and fell over dead.  It’s interesting to see the number of transactions that occur with a given number of threads.  let’s look at some of these:

Tyrant/MySQL/Memcached Thread Benchmark performance

As expected the smaller buffer pool struggled ( why a smaller buffer pool?  This simulates a much larger data set.  A BP of 256M with 1GB of data, can give similar performance to 20GB of data and a 5GB BP ).  So with 256M BP and 4GB of memcached we were well off the numbers we hit with a 4GB BP+4Gb of memcached ( which is expected ).   Adding more threads even up to 128 threads increased overall throughput but my load average on the server hit 40 and my CPU was pegged.  At 128 threads I was pegging out my CPU across the board.  Also interesting is I started to hit bottlenecks in MySQL/Innodb when I had enough memory but I increased the threads from 64 to 128.  As time permits I should revisit this and look at increased datasets, and look for area’s where Tyrant may stumble a bit.

Bottom line given a specific application and data pattern sometimes a relational database is not the appropriate place for storing data.  A tool like Tokyo Tyrant may not be for everyone or every application, but neither is a relational database.  Before building your next application try and understand whether an RDBMS is really needed or not.

How did I do these tests:

The above number were run with 32 Threads, Tyrant was started with 8 threads and 128M of memory,  memached was started with 16 threads ( 1.4 memcached ), mysql was 5.1 XtraDB.  Each environment had 2 tables each with 2 million rows.  The data was identical. memcached and Tyrant stored a comma delimited string to represent the row.   Mysql was running with 256M allocated to the innodb buffer unless otherwise noted.

What’s next?  Well next I am going to try and continue this series by exploring and benchmarking other NOSQL options and comparing them to database based solutions.  I think showing the performance of a couple of different Tokyo database formats would also be interesting.  What other solutions are people interested in?  I know I have gotten a lot of requests for cassandra #’s, but what else?  Drop a comment and let me know!

October 14, 2009

Tuning for heavy writing workloads

Posted by Yasufumi |

For the my previous post, there was comment to suggest to test db_STRESS benchmark on XtraDB by Dimitri. And I tested and tuned for the benchmark. I will show you the tunings. It should be also tuning procedure for general heavy writing workloads.

At first, <tuning peak performance>. The next, <tuning purge operation> to stabilize performance  and to avoid decreasing performance.

<test condition>

Server:
PowerEdge R900, Four Quad Core E7320 Xeon, 2.13GHz, 32GB Memory, 16X2GB, 667MHz

db_STRESS:
32 sessions, RW=1, dbsize = 1000000, no thinktime

XtraDB: (mysql-5.1.39 + XtraDB-1.0.4-current)
innodb_io_capacity = 4000
innodb_support_xa = false
innodb_file_per_table = true
innodb_buffer_pool_size = 16G
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 128M
innodb_log_file_size = 512M
innodb_log_files_in_group = 2
innodb_max_dirty_pages_pct = 90
innodb_flush_method = O_DIRECT
(the followings are XtraDB specific general settings)
innodb_ibuf_active_contract = 1
innodb_adaptive_flushing = false
innodb_adaptive_checkpoint = estimate

<tuning peak performance>

At first, tuning the peak performance to use CPU and IO resource more effectively. To avoid mutex/lock contentions are good to use more CPU resource of many CPUs.

purge_thread_test_1ST_TUNE

This graph shows the peak performance in tps of db_STRESS.

At current settings, “base” in the graph is the perfomance. We can confirm the mutex/lock contention roughly by the SEMAPHORES sction of SHOW INNODB STATUS output.

“xx-lock on RW-latch at 0×7f2ff40a3dc0 created in file dict/dict0dict.c line 1627″

It is index->lock, viewing the source file (and it may be HISTORY table). This is the lock for each index tree. We may be able to disperse the lock using by the partitioning of MySQL. Added the following clause to the HISTORY table definition.

“PARTITION BY HASH(REF_OBJECT) PARTITIONS 16″

Now the performance became to “+partitioned” in the graph. Looking the SEMAPHORES section again,

“has waited at handler/ha_innodb.cc line 7275 for 0.0000 seconds the semaphore:
X-lock on RW-latch at 0xd30320 created in file dict/dict0dict.c line 623″

may be the line which appears for the most times (it is dict_operation_lock). It may be partition specific lock contention. The current XtraDB has the variable to tune the contention.

innodb_stats_update_need_lock = 0 (default 1)

It skip the updating statistics which needs the lock. (it only affects for “Data_free:” value of TABLE STATS). And the performance became “+skip_stats” in the graph.  Then, the next contention at SEMAPHORES section is…

“Mutex at 0×1b3e3e78 created file trx/trx0rseg.c line 167″

may be remarkable (it is rseg->mutex). The mutex is for each rollback segments, so we can increase the rsegs to solve the contention problem. XtraDB can increase the rseg.

innodb_extra_rsegments = 64 (affects to initialization of InnoDB)

Recreated database files with the parameter. Then the performance became “+rsegs64″. At last, the next contention may be “Mutex at 0×28ce8e0 created file srv/srv0srv.c line 982″. It is kernel_mutex, currently we don’t have proper solution for that. The setting seems to be enough for now.

<tuning purge operation>

Next, looking the sequential result in more long term.

The next problem is “History list length” growing to huge size. The value is the number of entries in rollback segment. The entries are used for consistent reading of the older transactions. They can be removed when any transactions doesn’t refer the entry. This removing operation for the entries is called “purge” in InnoDB. The purge operation should be done enough on time, because the huge history list affects to performance.

Basically, the purging is done by master_thread (general background thread of InnoDB). The huge history list makes the purge operation slow, and it interferes  with the other tasks of the master_thread (e.g. flushing dirty blocks, treating insert buffer, etc…). Dimitri implemented a purge_thread to devote to the purging, and also XtraDB has similar purge_thread. Though it seems to make the throughput stabilize, it is not enough still for heavy update workloads. A single purge_thread on one CPU is not enough for updates from user threads on the all of other CPUs.

XtraDB can increase the purge_threads from the next release.

innodb_use_purge_thread = 4

seems to be enough for this workload on the server.

The first graph of followings is sequential throughput [tps] up to 3500 sec.

purge_thread_test_TPS

The next is tracking the “History list length” at the same time.

purge_thread_test_HIST_LENGTH

"Norm 1.0.4": Normal InnoDB Plugin 1.0.4 without XtraDB specific options
"xtra p_t 0": XtraDB 1.0.4-new (no purge_thread)
"xtra p_t 1": XtraDB 1.0.4-new (single purge_thread similar to Dimitri's)
"xtra p_t 4": XtraDB 1.0.4-new (4 purge threads)

The graphs show…

  • The purge thread (> 0) helps to stabilize the throughput greatly.
  • Increasing the purge threads can suppress the strong growing of the hitory list
  • The adaptive checkpoint “estimate” needs the purge_thread… (than the adaptive_flushing does)

And the last 300secs’ average tps are…

"Norm 1.0.4": 5725.47
"xtra p_t 0": 4699.33
"xtra p_t 1": 7130.3
"xtra p_t 4": 9118    (about 60%up from Normal Plugin 1.0.4)

In the end, the faster and more stable performance of db_STRESS benchmark is obtained by these tunings of XtraDB.

—————————————–

(Added 2009.10.29)

<FAQ: Is XtraDB slower than Plugin?>

I’d like to say “no” to this question. We have been adding many tuning options to XtraDB. But they are effective not for all cases, sometimes the performance may get worse because of “not proper” or “too much value”. We should choose the options correctly. XtraDB is based on InnoDB Plugin and we can set XtraDB same to InnoDB Plugin at least. The following graphs are results of XtraDB and Plugin with same options and same database.

top-left (same condition to above graphs):
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = true

top-right:
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = true

bottom-left:
innodb_flush_log_at_trx_commit = 2
innodb_doublewrite = false

bottom-right:
innodb_flush_log_at_trx_commit = 1
innodb_doublewrite = false

purge_thread_test_2_TPS

It seems that XtraDB is not slower than Plugin here at least.

We can start tuning based on these performances using XtraDB specific options!

Why do you make XtraDB slower than Plugin? :-)

September 15, 2009

Which adaptive should we use?

Posted by Yasufumi |

As you may know, InnoDB has 2 limits for unflushed modified blocks in the buffer pool. The one is from physical size of the buffer pool. And the another one is oldness of the block which is from the capacity of transaction log files.

In the case of heavy updating workload, the modified ages of the many blocks are clustered. And to reduce the maximum of the modified ages InnoDB needs to flush many of the blocks in a short time, if these are not flushed at all. Then the flushing storm affect the performance seriously.

We suggested the “adaptive_checkpoint” option of constant flushing to avoid such a flushing storm. And finally, the newest InnoDB Plugin 1.0.4 has the new similar option “adaptive_flushing” as native.

Let’s check the adaptive flushing options at this post.
[read more...]

June 19, 2009

Scaling IO Bound Workloads Webinar

Posted by peter |

I will co-present in webinar on Performance Challenges and Solutions for IO Bound Workloads in MySQL. My part of the presentation will be speaking about why switching from CPU bound workload to IO bound is such an important event, how to prepare to it as well as how to keep your application performance good as the data growths.

The Brian’s portion of webinar will focus on the Schooner offering as example of flash based appliance – one of solutions I mention in my presentation.

It should be interesting whenever you’re interested in Schooner appliance offerings, flash or scaling MySQL in General.

You can follow this url to register.

June 5, 2009

A rule of thumb for choosing column order in indexes

Posted by Baron Schwartz |

I wanted to share a little rule of thumb I sometimes use to decide which columns should come first in an index. This is not specific to MySQL, it’s generally applicable to any database server with b-tree indexes. And there are a bunch of subtleties, but I will also ignore those for the sake of simplicity.

[read more...]

February 12, 2009

How much memory can MySQL use in the worst case?

Posted by Baron Schwartz |

I vaguely recall a couple of blog posts recently asking something like “what’s the formula to compute mysqld’s worst-case maximum memory usage?” Various formulas are in wide use, but none of them is fully correct. Here’s why: you can’t write an equation for it.

[read more...]

January 19, 2009

Profiling MySQL stored routines

Posted by Aurimas Mikalauskas |

These days I’m working with a customer who has an application based entirely on stored routines on MySQL side. Even though I haven’t worked much with stored procedures, I though it’s going to be a piece of cake. In the end – it was, but there’s a catch.

[read more...]

December 22, 2008

Goal driven performance optimization

Posted by peter |

When your goal is to optimize application performance it is very important to understand what goal do you really have. If you do not have a good understanding of the goal your performance optimization effort may well still bring its results but you may waste a lot of time before you reach same results as you would reach much sooner with focused approach.

The time is critical for many performance optimization tasks not only because of labor associated expenses but also because of the suffering – slow web site means your marketing budget is wasted, customer not completing purchases, users are leaving to competitors, all of this making the time truly critical matter.
[read more...]

November 24, 2008

How Percona does a MySQL Performance Audit

Posted by Baron Schwartz |

Our customers or prospective customers often ask us how we do a performance audit (it’s our most popular service). I thought I should write a blog post that will both answer their question, so I can just reply “read all about it at this URL” and share our methodology with readers a little bit. This fits well with our philosophy of openness. It also shocks people sometimes — “you’re giving away the golden goose!” Not really. What you hire us for is our experience, not a recipe that anyone can follow.

A full performance audit is far more extensive than this article can cover, and might wander into Apache, networking config, caching layers, etc. Wherever the problem is, we’ll track it down. I won’t talk about that. That’s not because I want to keep secrets from you. To the contrary, I’d love to share it all with you. But that’s a huge job; it will take many pages, and I’m not going to write that much.

[read more...]

November 21, 2008

How to calculate a good InnoDB log file size

Posted by Baron Schwartz |

Peter wrote a post a while ago about choosing a good InnoDB log file size.  Not to pick on Peter, but the post actually kind of talks about a lot of things and then doesn’t tell you how to choose a good log file size!  So I thought I’d clarify it a little.

The basic point is that your log file needs to be big enough to let InnoDB optimize its I/O, but not so big that recovery takes a long time.  That much Peter covered really well.  But how do you choose that size? I’ll show you a rule of thumb that works pretty well.

[read more...]