October 30, 2009
Giving a talk in Palo Alto, November 3rd
I’m going to give a talk on Goal Driven Performance Optimization next Tuesday. This is one of my favorite talks as it goes beyond MySQL to the principles you can apply to performance optimizations of the complex systems, especially when you have to do a lot in limited time or budget and so you can’t just fix everything what can be fixed.
Please RSVP if you’re planning to attend as space is limited.
Thanks to Sam Ghods and Box.Net for organizing the event.
October 27, 2009
State of the art: Galera – synchronous replication for InnoDB
First time I heard about Galera on Percona Performance Conference 2009, Seppo Jaakola was presenting “Galera: Multi-Master Synchronous MySQL Replication Clusters”. It was impressed as I personally always wanted it for InnoDB, but we had it in plans at the bottom of the list, as this is very hard to implement properly.
The idea by itself is not new, I remember synchronous replication was announced for SolidDB on MySQL UC 2007, but later the product was killed by IBM.
So long time after PPC 2009 there was available version mysql-galera-0.6, which had serious flow, to setup a new node you had to take down whole cluster. And all this time Codership ( company that develops Galera) was working on 0.7 release that introduces node propagation keeping cluster online. You can play with 0.7pre release by yourself MySQL/Galera Release 0.7pre.
[read more...]
October 26, 2009
Air traffic queries in LucidDB
After my first post Analyzing air traffic performance with InfoBright and MonetDB where I was not able to finish task with LucidDB, John Sichi contacted me with help to setup. You can see instruction how to load data on LucidDB Wiki page
You can find the description of benchmark in original post, there I will show number I have for LucidDB vs previous systems.
October 25, 2009
XtraDB Amazon Image
For those who use Amazon EC2 service and were anxious about having XtraDB ready to launch there is a good news.
We created a public AMI (Amazon Machine Image) with XtraDB release 8 installed on CentOS 5.3.
How to use it.
First make sure it is avaiable.
-
$ ec2-describe-images ami-4701e22e
-
IMAGE ami-4701e22e xtradb/centos-5.3-x86_64.fs.manifest.xml 834362721059 available public x86_64 machine
-
$
Run it. It is built for x86_64 plaform, so allowed types are m1.large, m1.xlarge and c1.xlarge
-
$ ec2-run-instances ami-4701e22e -t m1.large
-
RESERVATION r-46b3432e 834362721059 default
-
INSTANCE i-ecc74084 ami-4701e22e pending 0 m1.large 2009-10-25T18:31:06+0000 us-east-1c
Wait till the instance starts
-
$ ec2-describe-instances i-ecc74084
-
RESERVATION r-46b3432e 834362721059 default
-
INSTANCE i-ecc74084 ami-4701e22e ec2-75-101-203-143.compute-1.amazonaws.com domU-12-31-39-0A-26-22.compute-1.internal running 0
-
m1.large 2009-10-25T18:31:06+0000 us-east-1c
Now it is up and ready.
October 24, 2009
xtrabackup-0.9.5rc
Dear Community,
As of today release 0.9.5rc is available.
In this release there are following changes:
Changelog:
- Option --no-lock is added to innobackupex-1.5.1. Use it only while ALL your
tables are InnoDB and you DO NOT CARE about binary log
position of backup - XtraBackup is ported for InnoDB Plugin 1.0.4. Barracuda file format as well as compressed tables are supported. We thank a well known Social Network site for the sponsorship.
- Windows conscious change more
- Impoved error messages in innobackupex
- Windows conscious experimental change
- Suppress purge when --stats
- Build number in RPM name. For instance, in the name xtrabackup-0.9.5rc-50.rhel5.x86_64.rpm 50 is the build number.
- Suppress master_thread ibuf operations for --stats
- Suppress ibuf operations for --stats
- fFx fatal bug at --backup when added --stats
- New option --stats to gather index stats
- Fixed some bugs for a 32bit platform
Fixed bugs:
- Bug #435634 in Percona-XtraBackup: "xtrabackup can't write to a file on windows"
- Bug #437788 in Percona-XtraBackup: "xtrabackup.exe can't write to a file"
- Bug #386535 in Percona-XtraBackup: "ibbackup_binlog_marker bug still persists"
- Bug #402884 in Percona-XtraBackup: "tries to create temporary tables in relative ./tmp directory"
- Bug #417178 in Percona-XtraBackup: "can't backup tables with $ in name"
The binary packages for RHEL4,5, Debian, FreeBSD, MacOS as well as source code of the XtraBackup is available on http://www.percona.com/mysql/xtrabackup/0.9.5rc/.
The Debian package is also available via APT. Just add these lines in /etc/apt/sources.list
deb http://repo.percona.com/apt lenny main deb-src http://repo.percona.com/apt lenny main
Update the local database
# apt-get update
And install xtrabackup
# apt-get install xtrabackup
The project lives on Launchpad : https://launchpad.net/percona-xtrabackup and you can report bug to Launchpad bug system:
https://launchpad.net/percona-xtrabackup/+filebug. The documentation is available on our Wiki.
For general questions use our Pecona-discussions group, and for development question Percona-dev group.
For support, commercial and sponsorship inquiries contact Percona.
October 23, 2009
Percona Performance Conference 2010
It's time to announce our plans for Percona Performance Conference 2010. We will hold the event on April 12th and 13th in Santa Clara, California. In the months to come we will announce the theme, the exact location (we have it reserved, but it's subject to change), the exact dates (we may add days), and we'll open up registration and a Call For Proposals. Stay tuned!
October 20, 2009
Captcha is replaced
After many complains we removed captcha in comments and replaced to Anti-Captcha ( http://wordpress.org/extend/plugins/anti-captcha/). Now you need enabled JavaScript in your browser to post comments.
October 19, 2009
MySQL-Memcached or NOSQL Tokyo Tyrant – part 3
This is part 3 of our series. In part 1 we talked about boosting performance with memcached on top of MySQL, in Part 2 we talked about running 100% outside the data with memcached, and now in Part 3 we are going to look at a possible solution to free you from the database. The solution I am going to discuss here is Tokyo Cabinet and Tyrant.
I am not going to give you a primer or Tutorial on Tyrant and Cabinet, there are plenty of these out there already. Instead I want to see what sort of performance we can see compared to MySQL and Memcached, and later on other NoSQL solutions. Tokyo actually allows you to use several types of databases that are supported, there are hash databases which are very similar to memcached, a table database which is similar to your classic database tables where you can add a where clause and search individual columns, and a ton more "database options" beyond just those two. Again my goal is to not make this a Tokyo Tyrant tutorial but rather show one potential role it can play.
More details can be read about here:
http://1978th.net/tokyotyrant/
http://1978th.net/tokyocabinet/
So if we can get performance similar to memcached with Tokoyo Tyrant when using disk based hash tables it would be a compelling replacement for our application here. It should provide the interface and the same access we saw in memcached but with disk persistence. So let's look at the numbers:

Tyrant's disk based hash was almost 2x faster then combining memcached and MySQL, and about 20% slower then the all memory memcached approach. So for this particular application I would have been much better off not storing my data in MySQL and instead looking outside the database for an answer. Now sure there are other reasons you may want to keep data in the database... but I am trying to get you to think about your application and if those reasons are really valid. Helping clients pick the right solution is one of the things we do here at Percona. If an application requires a database great, but if there is a better solution we want to suggest it. It's our goal to make your application perform optimally.
Finally, one concern you have to have is the scalability of your storage solution. As load, number of threads, and data size increases how does performance differ or change? One knock on Tokyo -vs- Memcached is Tokyo is not distributed by default. Now that's not to say we could not shard it based on a hash, or even build an api with the capability built in ( or use the memcached clients which works! )... but native support is lacking. It does support replication which could make some rather interesting architectures in the future.
So lets look at some scalability benchmarks, my server resources are rather limited but I thought I should try throwing more threads and work at the server until it hit its limit and fell over dead. It's interesting to see the number of transactions that occur with a given number of threads. let's look at some of these:

As expected the smaller buffer pool struggled ( why a smaller buffer pool? This simulates a much larger data set. A BP of 256M with 1GB of data, can give similar performance to 20GB of data and a 5GB BP ). So with 256M BP and 4GB of memcached we were well off the numbers we hit with a 4GB BP+4Gb of memcached ( which is expected ). Adding more threads even up to 128 threads increased overall throughput but my load average on the server hit 40 and my CPU was pegged. At 128 threads I was pegging out my CPU across the board. Also interesting is I started to hit bottlenecks in MySQL/Innodb when I had enough memory but I increased the threads from 64 to 128. As time permits I should revisit this and look at increased datasets, and look for area's where Tyrant may stumble a bit.
Bottom line given a specific application and data pattern sometimes a relational database is not the appropriate place for storing data. A tool like Tokyo Tyrant may not be for everyone or every application, but neither is a relational database. Before building your next application try and understand whether an RDBMS is really needed or not.
How did I do these tests:
The above number were run with 32 Threads, Tyrant was started with 8 threads and 128M of memory, memached was started with 16 threads ( 1.4 memcached ), mysql was 5.1 XtraDB. Each environment had 2 tables each with 2 million rows. The data was identical. memcached and Tyrant stored a comma delimited string to represent the row. Mysql was running with 256M allocated to the innodb buffer unless otherwise noted.
What's next? Well next I am going to try and continue this series by exploring and benchmarking other NOSQL options and comparing them to database based solutions. I think showing the performance of a couple of different Tokyo database formats would also be interesting. What other solutions are people interested in? I know I have gotten a lot of requests for cassandra #'s, but what else? Drop a comment and let me know!
October 16, 2009
Finding your MySQL High-Availability solution – The questions
After having reviewed the definition my the previous post (The definitions), the next step is to respond to some questions.
Do you need MySQL High-Availability?
That question is quite obvious but some times, it is skipped. It can also be formulated "What is the downtime cost of the service?". In the cost, you need to include lost revenue from the service and you also need to consider less direct impact like loss of corporate image and other marketing costs. If your downtime cost is under $10/h, you can stop reading this document, you don't need HA. For the others, let's move on!
How to determine which MySQL High-Availability solution is best?
What is really tricky with MySQL is the number of possible HA solutions. From the simplest the most complex let's list the most common ones:
- MySQL replication with manual failover
- Master-Master with MMM manager
- Heartbeat/SAN
- Heartbeat/DRBD
- NDB Cluster
These technologies are by far, not a one size fits all and many deployments use combination of solutions. I will not cover ScaleDB and Continuent because I know almost nothing of these solutions. There are many more questions you need to ask yourself before being able to pick the right one. Below, I listed the most common questions, I might have missed some.
1. What level of HA do you need?
Since all the technologies do not offer the same level of availability, this is a first important sorting factor. Here are estimates of the level of availability offered by the various solutions.
| Level of availability | |
| Simple replication | 98 to 99.9+% |
| Master-Master with MMM manager | 99% |
| Heartbeat/SAN (depends on SAN) | 99.5% to 99.9% |
| Heartbeat/DRBD | 99.9% |
| NDB Cluster | 99.999% |
From the table, if your requirements are for 99.99%, you are restricted to NDB Cluster while if it is only 99% you have more options. I recall that the level of availability is hard to estimate and subject to debate. These are the usually accepted level of availability for these technologies.
2. Can you afford to lose data?
Obviously, if you are concerned about loss of data, you are most likely using the InnoDB storage engine, since MyISAM is not transactional and do not sync data to disk. Similarly, MySQL replication is an asynchronous process and although it is fairly fast at transferring data between the master and the slaves, there is a window of time where data loss is possible.
If you can afford to lose some data, you can consider "MySQL replication" and "Master-Master with MMM manager" otherwise, you can only consider the other three solutions.
| Data 100% safe | |
| MySQL replication | no |
| Master-Master with MMM manager | no |
| Heartbeat/SAN (depends on SAN) | yes |
| Heartbeat/DRBD | yes |
| NDB Cluster | yes |
3. Does your application use MyISAM only features?
There are some features like Full text indexes and GIS indexes that are supported only by MyISAM. The HA solutions that work well with MyISAM are "MySQL replication" and "Master-Master with MMM manager". Depending on the application, the MyISAM Full text indexes might be replaced by another search engine like Sphinx in order to remove the restriction. There is no HA solution other than the ones based on replication that handles GIS indexes.
| HA solutions | |
| Need MyISAM Full text or GIS indexes | "MySQL replication" and "Master-Master with MMM manager" |
| Don't use any special MyISAM feature | All |
| Can change MyISAM Full text to Sphinx | All |
4. What is the write load?
The HA solutions we present are not equal in term of their write capacity. Due to the way replication is implemented, only one thread on the slave can handle the write operations. If the replication master is multi-cores servers and is heavily writing using multiple threads, the slaves will likely not be able to keep up. Replication is not the only technology that put a strain on the write capacity, DRBD, a shared storage emulator for Linux, also reduce by about 30% (very dependent on hardware) the write capacity of a database server. In term of write capacity here are you choices.
| Write capacity | |
| MySQL replication | Fair |
| Master-Master with MMM manager | Fair |
| Heartbeat/SAN (depends on SAN) | Excellent |
| Heartbeat/DRBD | Good |
| NDB Cluster | Excellent |
5. For what level of growth are you planning?
Since NDB Cluster is an integrated sharding environment, if you are planning for a growth that will need sharding (splitting the database over multiple servers), then you might need to take a serious at that solution. If not, then, apart from the write capacity, all the solutions are approximately equal.
6. How qualified is your staff or support company?
There is a quite direct relationship between the level of availability and the complexity of the solution. In order to reach the promised level of availability, the staff maintaining the HA setup, either internal or external, must have the required level of expertise. The required expertise level is summarized in the table below.
| Expertise level | |
| MySQL replication | Typical, average MySQL DBA + some Sysadmin skills |
| Master-Master with MMM manager | Good, average MySQL DBA + good Sysadmin skills |
| Heartbeat/SAN (depends on SAN) | High, Good MySQL DBA + strong Sysadmin skills |
| Heartbeat/DRBD | High, Good MySQL DBA + strong Sysadmin skills |
| NDB Cluster | Very high, Specific NDB knowledge, strom MySQL skills and strong Sysadmin skills |
7. How deep are your pocket?
The last aspect that needs to be considered is the budget, complexity is expensive. We will consider two types of setup. The first one is a basic proof of concept of the technology with the hardware tested, the data imported and basic testing and documentation. A proof of concept setup is a good way to get used to a technology and experiment with it in a test environment. The other type of setup we will consider is a full production setup that includes extensive testing, fire drills, full documentation, monitoring, alerting, backups, migration to production and post migration monitoring. Of course, it is the safest way to migrate an HA solution to production. All the times here are estimates based on field experience, the values presented here are fairly typical and contains some buffers for unexpected problems. Although an HA solution can be built remotely through a KVM over IP and adequate remote power management, an on site intervention with physical access to the servers is the preferred way, especially for the most complex solutions.
| Proof of concept | Migration to Production | |
| MySQL replication | 4 hours | 12 hours |
| Master-Master with MMM manager | 8 hours | 24 hours |
| Heartbeat/SAN (depends on SAN) | 32 hours | 120 hours |
| Heartbeat/DRBD | 40 hours | 120 hours |
| NDB Cluster | 40 hours | 120 hours+ |











del.icio.us
digg