June 23, 2008

Neat tricks for the MySQL command-line pager

Posted by Baron Schwartz

How many of you use the mysql command-line client?  And did you know about the pager command you can give it?  It’s pretty useful.  It tells mysql to pipe the output of your commands through the specified program before displaying it to you.

Here’s the most basic thing I can think of to do with it: use it as a pager.  (It’s scary how predictable I am sometimes, isn’t it?)

[read more...]

May 31, 2008

Tools to use for MySQL Performance Review

Posted by peter

There are some tools we commonly use doing performance review and optimization and we often ask each other where that particular stuff is located on the web or what is exactly name of the command what does that.

Initially I thought creating internal Percona Wiki page, but thought there is no reason this information should not be public instead.

So now you can find our favorite MySQL Performance Review Tools on the site.

This is just list of tools which came from the top of my head and I’m sure it is far from complete. We will extend it and we would like to hear your suggestions on what else we should add to it.

In this list we focused on Open Source tools which are helpful for “in time” performance audit - for example you see no graphing software in this list or any commercial offerings.

March 27, 2008

Using MMM to ALTER huge tables

Posted by Aurimas Mikalauskas

Few months ago, I wrote about a faster way to do certain table modifications online. It works well when all you want is to remove auto_increment or change ENUM values. When it comes to changes that really require table to be rebuilt - adding/dropping columns or indexes, changing data type, converting data to different character set - MySQL master-master replication especially accompanied by MMM can be very handy to do the changes with virtually no downtime.
[read more...]

November 26, 2007

Data Recovery Toolkit for InnoDB Version 0.1 Released

Posted by Alexey Kovyrin

As Peter mentioned in one of previous posts, we’ve done huge work developing robust strategies of InnoDB data recovery to provide our customers effective data recovery services and one of major parts of these strategies is our toolkit for InnoDB data recovery. Today I’m proud to announce its first public release which was used to help some of our customers to recover 95-100% of their deleted data.

This release already has a pretty decent set of features:

  • Supports both REDUNDANT (pre mysql 5.0) and COMPACT (mysql 5.0+) versions of tablespaces
  • Works with single tablespaces and file-per-table tablespaces
  • Able to recover data even when processed InnoDB page has been reassigned to another table and/or was partially destroyed
  • Supports all MySQL data types except BLOBs, SETs and BITs (will be implemented in next releases)
  • Has really great set of data filters to define data ranges (for numbers), field lengths (for variable length fields), character sets (for strings), date periods (for dates), etc.
  • Shipped with easy to use tool which could be used to create innodb table definitions based on CREATE TABLE clauses, so you don’t need to write table definitions yourself - you just need to add data filters and get your data back (well, in most of the cases)
  • Results are presented in CSV file format which could be used with MySQL’s LOAD DATA function

So, if you intrigued enough and would like to check it out, welcome to Google Code page of the project where you can find latest version of the toolset code and more links to information resources related to InnoDB data structures and recovery procedures.

April 5, 2007

MySQL Master-Master replication manager released

Posted by peter

The MySQL Master-Master replication (often in active-passive mode) is popular pattern used by many companies using MySQL for scale out. Most of the companies would have some internal scripts to handle things as automatic fallback and slave cloning but no Open Source solution was made available.

Few months ago we were asked to implement such solution for one of the customers and they kindly agreed to let us release things under GPL2 License, and we gave them reduced rate for being Open Source friendly.

So what does this tool do and how it works ?

Currently it is implemented based on Linux IP management tools and LVM for snapshot creation but we hope support for other operation systems added in the future.

It can manage master-master pair as well as other configurations such as master-master and bunch of slaves.

Typically you would define “roles” for example READER and WRITER roles for most simply case and assign them to the severs. For example you can say both servers in pair can be reader at the same time but only one of them should be writable at the same time for master-master pair.

Each of the roles will have pair of IPs associated with it in this case and it will make sure all of these IPs are handled by some server, so you can use DNS for load balancing without worrying about TTL and similar things.

If Active server fails in the pair both its READER and WRITER role will be taken over by passive node and depending on monitoring configuration it can happen within few seconds.

Such IP based high availability and load balancing does not require any extra hardware or software and works well for bunch of applications which could be implemented using different languages etc, which makes application based fallbacks problematic.

The tool also takes extra caution to prevent application mistakes. What will happen if you will write to the slave because of application error ? Well you’ll break replication often without knowing about it. You can kind of solve it with using read-only user for slave connection but what if your application simply was misconfigured and things the server is master when it is not ? To take care of this Master Master Manager makes sure only one of the nodes is writable at all times and other is set to –read-only, so unless you use user with SUPER privilege you should be safe.

It has some other neat features, for example you may configure it to remove READER role from the server if it gets too delayed with replication (or if replication breaks) so you do not have to do it in each application.

We also took extra caution about making sure things can’t run out of sync silently. For example you might know if slave server reboots (say power goes down) you can’t be sure about data consistency because replication may be restarted with wrong position. Sometimes this shows up as an errors but for some query pattern it will not. MMM will detect this situation and will hold the server for administrator to decide.

One command LVM based sync is also implemented so restoring broken replication safe way becomes very easy.
Besides simple “cloning” of the nodes you can use same tool to create a backup with number of methods supported including compressed backup or incremental backups with rdiff.

Finally we have implemented safe role switch, meaning you can move writer to other node in clean way (making sure replication is still in sync) - I see too often this switch happens “dirty” way potentially risking replication inconsistencies. This is very handy if you want to restart one of servers in clusters to upgrade OS or add more RAM to the system.

This tool works well for number of customers and users but it is surely early software version so try it on your own risk and make sure to provide your feedback and suggestions.

You can learn some more about MMM here, download the tool here and have your questions answered and suggestions welcomed here

September 9, 2006

Alternatives of PHP ?

Posted by Vadim

When I loaded GigaBytes of XMLs into mysql database with PHP script, I was thinking about PHP alternatives. Why do I need that ?
1. PHP is slow - I’m speaking about area of data processing and implementation of algorithms
2. No good cli debugger - I’m just tired of debugging with ‘echo’ and ‘var_dump’
3. Unpredictable memory consumption - it’s easy in processing of big files to eat all available memory
4. Need something new - I’ve been using PHP for almost 10 years, so I want to try something else to refresh my mind.
[read more...]

September 6, 2006

Slow Query Log analyzes tools

Posted by peter

MySQL has simple but quite handy feature - slow query log, which allows you to log all queries which took over define number of seconds to execute. There is also an option to enable logging queries which do not use indexes even if they take less time (–log-queries-not-using-indexes)

Slow query log is great to spot really slow queries which are often good candidates for optimization but it has few serious problems which limits extent to which it is helpful. First - it only allows you to set slow query time in seconds, having 1 second minimum value. For most of interactive applications this is way too large - if you’re developing Web application you probably want whole page to be generated less in 1 second, which issues many queries during generation. Second - if you enable option to log queries which do not use indexes it well can be flooded with fast and efficient queries, which just happen to do full table scans - for example if you would be having drop down list of states in your application and use SELECT * FROM STATES for that it would trigger and log the query.

Taking other Approach

For our clients we often need to find a queries which impact application the most. It does not always have to be slowest queries - query taking 10ms and run 1.000 times per second puts more load on server than 10 seconds query running once per second. We of course want to get rid of really slow queries but to really optimize application throughput queries which generate most of the load need to be investigated

[read more...]

August 18, 2006

SysBench - benchmark tool

Posted by Vadim

Sysbench is benchmark developed by Alexey Kopytov (software engineer @ MySQL AB)
- http://sysbench.sourceforge.net/ and I want to write a short intro about this tool as sysbench is one of software for my everyday use. For example, SUN published their Solaris vs RedHat stuff based on sysbench’s results (Peter and me provided performance consutling for this publishing).
Sysbench has a lot of options and details so my goal is describe common usage of benchmark.
Sysbench allows to test:

  • file I/O performance
  • scheduler performance
  • memory allocation and transfer speed
  • POSIX threads implementation performance
  • database server performance

First four is useful for the platform evalution, for example if you want to compare speed of file I/O and implementation of threads on different servers - I will write about in further notes. Regarding database benchmarks - in 0.4.7 and earlier versions Sysbench supported only predefined set of queries, but that will change in version 0.5 where sysbench is scriptable (Lua http://www.lua.org/ as scripting language) , and you can use your own tables and queries. 0.5 should be realesed soon. Originally Sysbench supported only MySQL, but later Oracle and PostgreSQL (and deviations, e.g EnterpriseDB) were added - it is not clear from Docs page, but it is :). Sysbench uses only a native API, not middle layers like ODBC, that is why list of supported DB is not wide. Other restriction Sysbench can’t be complided on Windows - well you can try and write about your experience. In next notes I’ll write how to setup and perform database benchmarks.

June 26, 2006

Full text search for all MySQL Storage Engines

Posted by peter

As we know build in full text search is currently limited only to MyISAM search engine as well as has few other limits.

Today Sphinx Search plugin for MySQL was released which now provides fast and easy to use full text search solution for all storage engines. This version also adds a lot of other new features, including boolean search and distributed searching.

A while ago I already wrote about Sphinx Search Egine, comparing it to built in FullText search and Mnogosearch. I guess I should soon repeat tests, adding Lucene to the list for complete picture.

And if you do not feel like patching MySQL or use MySQL 5.1 beta to use sphinx as MySQL Storage Engine you can still use it old fashion way as separate server.

May 3, 2006

Group commit and real fsync

Posted by peter

During the recent months I’ve seen few cases of customers upgrading to MySQL 5.0 and having serious performance slow downs, up to 10 times in certain cases. What was the most surprising for them is the problem was hardware and even OS specific - it could show up with one OS version but not in the other. Even more interesting performance may be dramatically affected by –log-bin settings, which usually has just couple of percent overhead. So what is going on?

Actually we’re looking at two issues here which interleave such funny way

  • Group commit is broken in MySQL 5.0 if binary loging is enabled (as it enables XA)
  • Certain OS/Hardware configurations still fake fsync delivering great performance at the cost of being non ACID

First one can be tracked by this bug. In the nutshell the problem is - new feature - XA was implemented in MySQL 5.0 which did not work with former group commit code. The new code for group commit however was never implemented. XA allows to keep different transactonal storage engines in sync, together with binary log. XA is enabled if binary log is enabled this is why this issue is trigered by enabled binary log. if binary log is disabled, so is XA and old group commit code works just fine.

Second one is interesting. Actually we would hear much more people screaming about this problem if OS would be honest with us. Happily for us many OS/Hardware pairs are still lying about fsync(). fsync() call suppose to place data on the disk securely, which unless you have battery backed up cache would give you only 80-200 sequential fsync() calls per second depending on your hard drive speed. With fake fsync() call the data is only written to the drives memory and so can be lost if power goes down. However it gives great performance improvement and you might see 1000+ of fsync() calls per second. So if your OS is not giving you real fsync you might not notice this bug. The performance degradation will still happen but it will be much smaller, especially with large transactions.

So how you can solve the problem ?

  • Disable binary log. This could be option for slaves for example which do not need point in time recovery etc.
  • Check if you OS is doing real fsync. You should to know anyway if you care about your data safety. This can be done for example by using SysBench: sysbench –test=fileio –file-fsync-freq=1 –file-num=1 –file-total-size=16384 –file-test-mode=rndwr. This will write and fsync the same page and you should see how many requests/sec it is doing. You also might want to check diskTest from this page http://www.faemalia.net/mysqlUtils/ which does some extra tests for fsync() correctness.
  • Install RAID with battery backed up cache. This gives about the same effect as fake fsync() but you can make it secure (However make sure your drives are not caching data by themselves). The good thing RAID with battery backed up cache are becoming really inexpensive.

You also probably want to know if this bug is going to be fixed ? I’m not authority in this question but as Heikki describes it as fundamental task I’m not sure it will be done in 5.0 Good if it is done in 5.1.