October 30, 2009

New MariaDB release is out

Posted by peter |

MariaDB project kept development going in the repository only not providing any binary releases since April, so release was well over due and it is here now.

If you’re wondering how this release of MariaDB is different from MySQL you should read this FAQ

Giving a talk in Palo Alto, November 3rd

Posted by peter |

I’m going to give a talk on Goal Driven Performance Optimization next Tuesday. This is one of my favorite talks as it goes beyond MySQL to the principles you can apply to performance optimizations of the complex systems, especially when you have to do a lot in limited time or budget and so you can’t just fix everything what can be fixed.

Please RSVP if you’re planning to attend as space is limited.

Thanks to Sam Ghods and Box.Net for organizing the event.

October 27, 2009

State of the art: Galera – synchronous replication for InnoDB

Posted by Vadim |

First time I heard about Galera on Percona Performance Conference 2009, Seppo Jaakola was presenting “Galera: Multi-Master Synchronous MySQL Replication Clusters”. It was impressed as I personally always wanted it for InnoDB, but we had it in plans at the bottom of the list, as this is very hard to implement properly.
The idea by itself is not new, I remember synchronous replication was announced for SolidDB on MySQL UC 2007, but later the product was killed by IBM.

So long time after PPC 2009 there was available version mysql-galera-0.6, which had serious flow, to setup a new node you had to take down whole cluster. And all this time Codership ( company that develops Galera) was working on 0.7 release that introduces node propagation keeping cluster online. You can play with 0.7pre release by yourself MySQL/Galera Release 0.7pre.
[read more...]

October 26, 2009

Air traffic queries in LucidDB

Posted by Vadim |

After my first post Analyzing air traffic performance with InfoBright and MonetDB where I was not able to finish task with LucidDB, John Sichi contacted me with help to setup. You can see instruction how to load data on LucidDB Wiki page

You can find the description of benchmark in original post, there I will show number I have for LucidDB vs previous systems.

[read more...]

October 25, 2009

XtraDB Amazon Image

Posted by Aleksandr Kuzminsky |

For those who use Amazon EC2 service and were anxious about having XtraDB ready to launch there is a good news.

We created a public AMI (Amazon Machine Image) with XtraDB release 8 installed on CentOS 5.3.

How to use it.

First make sure it is avaiable.

CODE:
  1. $ ec2-describe-images ami-4701e22e
  2. IMAGE   ami-4701e22e    xtradb/centos-5.3-x86_64.fs.manifest.xml        834362721059    available       public          x86_64  machine
  3. $

Run it. It is built for x86_64 plaform, so allowed types are m1.large, m1.xlarge and c1.xlarge

CODE:
  1. $ ec2-run-instances ami-4701e22e -t m1.large
  2. RESERVATION     r-46b3432e      834362721059    default
  3. INSTANCE        i-ecc74084      ami-4701e22e                    pending         0               m1.large        2009-10-25T18:31:06+0000        us-east-1c

Wait till the instance starts

CODE:
  1. $ ec2-describe-instances i-ecc74084
  2. RESERVATION     r-46b3432e      834362721059    default
  3. INSTANCE        i-ecc74084      ami-4701e22e    ec2-75-101-203-143.compute-1.amazonaws.com      domU-12-31-39-0A-26-22.compute-1.internal       running      0
  4. m1.large        2009-10-25T18:31:06+0000        us-east-1c

Now it is up and ready.

October 24, 2009

xtrabackup-0.9.5rc

Posted by Aleksandr Kuzminsky |

Dear Community,

As of today release 0.9.5rc is available.

In this release there are following changes:
Changelog:

  • Option --no-lock is added to innobackupex-1.5.1. Use it only while ALL your
    tables are InnoDB and you DO NOT CARE about binary log
    position of backup
  • XtraBackup is ported for InnoDB Plugin 1.0.4. Barracuda file format as well as compressed tables are supported. We thank a well known Social Network site for the sponsorship.
  • Windows conscious change more
  • Impoved error messages in innobackupex
  • Windows conscious experimental change
  • Suppress purge when --stats
  • Build number in RPM name. For instance, in the name xtrabackup-0.9.5rc-50.rhel5.x86_64.rpm 50 is the build number.
  • Suppress master_thread ibuf operations for --stats
  • Suppress ibuf operations for --stats
  • fFx fatal bug at --backup when added --stats
  • New option --stats to gather index stats
  • Fixed some bugs for a 32bit platform

Fixed bugs:

The binary packages for RHEL4,5, Debian, FreeBSD, MacOS as well as source code of the XtraBackup is available on http://www.percona.com/mysql/xtrabackup/0.9.5rc/.

The Debian package is also available via APT. Just add these lines  in /etc/apt/sources.list

deb http://repo.percona.com/apt lenny main
deb-src http://repo.percona.com/apt lenny main

Update the local database

# apt-get update

And install xtrabackup

# apt-get install xtrabackup

The project lives on Launchpad : https://launchpad.net/percona-xtrabackup and you can report bug to Launchpad bug system:
https://launchpad.net/percona-xtrabackup/+filebug. The documentation is available on our Wiki.

For general questions use our Pecona-discussions group, and for development question Percona-dev group.

For support, commercial and sponsorship inquiries contact Percona.

October 23, 2009

Percona Performance Conference 2010

Posted by Baron Schwartz |

It's time to announce our plans for Percona Performance Conference 2010. We will hold the event on April 12th and 13th in Santa Clara, California. In the months to come we will announce the theme, the exact location (we have it reserved, but it's subject to change), the exact dates (we may add days), and we'll open up registration and a Call For Proposals. Stay tuned!

October 20, 2009

Captcha is replaced

Posted by Vadim |

After many complains we removed captcha in comments and replaced to Anti-Captcha ( http://wordpress.org/extend/plugins/anti-captcha/). Now you need enabled JavaScript in your browser to post comments.

October 19, 2009

MySQL-Memcached or NOSQL Tokyo Tyrant – part 3

Posted by matt |

This is part 3 of our series.  In part 1 we talked about boosting performance with memcached on top of MySQL, in Part 2 we talked about running 100% outside the data with memcached, and now in Part 3 we are going to look at a possible solution to free you from the database.  The solution I am going to discuss here is Tokyo Cabinet and Tyrant.

I am not going to give you a primer  or Tutorial on Tyrant and Cabinet, there are plenty of these out there already.  Instead I want to see what sort of performance we can see compared to MySQL and Memcached, and later on other NoSQL solutions.  Tokyo actually allows you to use several types of databases that are supported, there are hash databases which are very similar to memcached, a table database which is similar to your classic database tables where you can add a where clause and search individual columns, and a ton more "database options"  beyond just those two.  Again my goal is to not make this a Tokyo Tyrant tutorial but rather show one potential role it can play.

More details can be read about here:
http://1978th.net/tokyotyrant/
http://1978th.net/tokyocabinet/

So if we can get performance similar to memcached with Tokoyo Tyrant when using disk based hash tables it would be a compelling replacement for our application here.  It should provide the interface and the same access we saw in memcached but with disk persistence. So let's look at the numbers:

Tyrant -vs- memcached

Tyrant's disk based hash was almost 2x faster then combining memcached and MySQL, and about 20% slower then the all memory memcached approach.  So for this particular application I would have been much better off not storing my data in MySQL and instead looking outside the database for an answer.  Now sure there are other reasons you may want to keep data in the database... but I am trying to get you to think about your application and if those reasons are really valid.  Helping clients pick the right solution is one of the things we do here at Percona.  If an application requires a database great, but if there is a better solution we want to suggest it.  It's our goal to make your application perform optimally.

Finally, one concern you have to have is the scalability of your storage solution.  As load, number of threads, and data size increases how does performance differ or change?  One knock on Tokyo -vs- Memcached is Tokyo is not distributed by default.  Now that's not to say we could not shard it based on a hash, or even build an api with the capability built in ( or use the memcached clients which works! )...  but native support is lacking.  It does support replication which could make some rather interesting architectures in the future.

So lets look at some scalability benchmarks, my server resources are rather limited but I thought I should try throwing more threads and work at the server until it hit its limit and fell over dead.  It's interesting to see the number of transactions that occur with a given number of threads.  let's look at some of these:

Tyrant/MySQL/Memcached Thread Benchmark performance

As expected the smaller buffer pool struggled ( why a smaller buffer pool?  This simulates a much larger data set.  A BP of 256M with 1GB of data, can give similar performance to 20GB of data and a 5GB BP ).  So with 256M BP and 4GB of memcached we were well off the numbers we hit with a 4GB BP+4Gb of memcached ( which is expected ).   Adding more threads even up to 128 threads increased overall throughput but my load average on the server hit 40 and my CPU was pegged.  At 128 threads I was pegging out my CPU across the board.  Also interesting is I started to hit bottlenecks in MySQL/Innodb when I had enough memory but I increased the threads from 64 to 128.  As time permits I should revisit this and look at increased datasets, and look for area's where Tyrant may stumble a bit.

Bottom line given a specific application and data pattern sometimes a relational database is not the appropriate place for storing data.  A tool like Tokyo Tyrant may not be for everyone or every application, but neither is a relational database.  Before building your next application try and understand whether an RDBMS is really needed or not.

How did I do these tests:

The above number were run with 32 Threads, Tyrant was started with 8 threads and 128M of memory,  memached was started with 16 threads ( 1.4 memcached ), mysql was 5.1 XtraDB.  Each environment had 2 tables each with 2 million rows.  The data was identical. memcached and Tyrant stored a comma delimited string to represent the row.   Mysql was running with 256M allocated to the innodb buffer unless otherwise noted.

What's next?  Well next I am going to try and continue this series by exploring and benchmarking other NOSQL options and comparing them to database based solutions.  I think showing the performance of a couple of different Tokyo database formats would also be interesting.  What other solutions are people interested in?  I know I have gotten a lot of requests for cassandra #'s, but what else?  Drop a comment and let me know!

October 16, 2009

Finding your MySQL High-Availability solution – The questions

Posted by yves |

After having reviewed the definition my the previous post (The definitions), the next step is to respond to some questions.

Do you need MySQL High-Availability?

That question is quite obvious but some times, it is skipped. It can also be formulated "What is the downtime cost of the service?". In the cost, you need to include lost revenue from the service and you also need to consider less direct impact like loss of corporate image and other marketing costs. If your downtime cost is under $10/h, you can stop reading this document, you don't need HA. For the others, let's move on!

How to determine which MySQL High-Availability solution is best?

What is really tricky with MySQL is the number of possible HA solutions. From the simplest the most complex let's list the most common ones:

- MySQL replication with manual failover
- Master-Master with MMM manager
- Heartbeat/SAN
- Heartbeat/DRBD
- NDB Cluster

These technologies are by far, not a one size fits all and many deployments use combination of solutions. I will not cover ScaleDB and Continuent because I know almost nothing of these solutions. There are many more questions you need to ask yourself before being able to pick the right one. Below, I listed the most common questions, I might have missed some.

1. What level of HA do you need?

Since all the technologies do not offer the same level of availability, this is a first important sorting factor. Here are estimates of the level of availability offered by the various solutions.

  Level of availability
Simple replication 98 to 99.9+%
Master-Master with MMM manager 99%
Heartbeat/SAN (depends on SAN) 99.5% to 99.9%
Heartbeat/DRBD 99.9%
NDB Cluster 99.999%

From the table, if your requirements are for 99.99%, you are restricted to NDB Cluster while if it is only 99% you have more options. I recall that the level of availability is hard to estimate and subject to debate. These are the usually accepted level of availability for these technologies.

2. Can you afford to lose data?

Obviously, if you are concerned about loss of data, you are most likely using the InnoDB storage engine, since MyISAM is not transactional and do not sync data to disk. Similarly, MySQL replication is an asynchronous process and although it is fairly fast at transferring data between the master and the slaves, there is a window of time where data loss is possible.

If you can afford to lose some data, you can consider "MySQL replication" and "Master-Master with MMM manager" otherwise, you can only consider the other three solutions.

  Data 100% safe
MySQL replication no
Master-Master with MMM manager no
Heartbeat/SAN (depends on SAN) yes
Heartbeat/DRBD yes
NDB Cluster yes

3. Does your application use MyISAM only features?

There are some features like Full text indexes and GIS indexes that are supported only by MyISAM. The HA solutions that work well with MyISAM are "MySQL replication" and "Master-Master with MMM manager". Depending on the application, the MyISAM Full text indexes might be replaced by another search engine like Sphinx in order to remove the restriction. There is no HA solution other than the ones based on replication that handles GIS indexes.

  HA solutions
Need MyISAM Full text or GIS indexes "MySQL replication" and "Master-Master with MMM manager"
Don't use any special MyISAM feature All
Can change MyISAM Full text to Sphinx All

4. What is the write load?

The HA solutions we present are not equal in term of their write capacity. Due to the way replication is implemented, only one thread on the slave can handle the write operations. If the replication master is multi-cores servers and is heavily writing using multiple threads, the slaves will likely not be able to keep up. Replication is not the only technology that put a strain on the write capacity, DRBD, a shared storage emulator for Linux, also reduce by about 30% (very dependent on hardware) the write capacity of a database server. In term of write capacity here are you choices.

  Write capacity
MySQL replication Fair
Master-Master with MMM manager Fair
Heartbeat/SAN (depends on SAN) Excellent
Heartbeat/DRBD Good
NDB Cluster Excellent

5. For what level of growth are you planning?

Since NDB Cluster is an integrated sharding environment, if you are planning for a growth that will need sharding (splitting the database over multiple servers), then you might need to take a serious at that solution. If not, then, apart from the write capacity, all the solutions are approximately equal.

6. How qualified is your staff or support company?

There is a quite direct relationship between the level of availability and the complexity of the solution. In order to reach the promised level of availability, the staff maintaining the HA setup, either internal or external, must have the required level of expertise. The required expertise level is summarized in the table below.

  Expertise level
MySQL replication Typical, average MySQL DBA + some Sysadmin skills
Master-Master with MMM manager Good, average MySQL DBA + good Sysadmin skills
Heartbeat/SAN (depends on SAN) High, Good MySQL DBA + strong Sysadmin skills
Heartbeat/DRBD High, Good MySQL DBA + strong Sysadmin skills
NDB Cluster Very high, Specific NDB knowledge, strom MySQL skills and strong Sysadmin skills

7. How deep are your pocket?

The last aspect that needs to be considered is the budget, complexity is expensive. We will consider two types of setup. The first one is a basic proof of concept of the technology with the hardware tested, the data imported and basic testing and documentation. A proof of concept setup is a good way to get used to a technology and experiment with it in a test environment. The other type of setup we will consider is a full production setup that includes extensive testing, fire drills, full documentation, monitoring, alerting, backups, migration to production and post migration monitoring. Of course, it is the safest way to migrate an HA solution to production. All the times here are estimates based on field experience, the values presented here are fairly typical and contains some buffers for unexpected problems. Although an HA solution can be built remotely through a KVM over IP and adequate remote power management, an on site intervention with physical access to the servers is the preferred way, especially for the most complex solutions.

  Proof of concept Migration to Production
MySQL replication 4 hours 12 hours
Master-Master with MMM manager 8 hours 24 hours
Heartbeat/SAN (depends on SAN) 32 hours 120 hours
Heartbeat/DRBD 40 hours 120 hours
NDB Cluster 40 hours 120 hours+