June 8, 2009

Impossible - possible, moving InnoDB tables between servers

Posted by Vadim

This is probably the feature I missed most from early days when I started to use InnoDB instead of MyISAM. Since that I figured out how to survive without it, but this is first question I hear from customers who migrated from MyISAM to InnoDB - can I just copy .ibd files from one server to another and answer “use mysqldump” is quite disappointed.
Jokes aside, I see real needs in this:
- when we need to restore only single table from backup (sometimes developers kill only single table, not whole database :) )
- to copy single table from production to QA environment. It may sound not so important, but I see needs in this quite often. QA boxes may have their own setup, not so powerful and with not enough space, but QA still needs to have some tables in fresh status.
- resharding databases, moving some shards from one server to another.

So long story short end - we made new mode for XtraBackup, now it can copy and prepare InnoDB/XtraDB tables, which later can be imported into XtraDB.

To be objective the process is still not so easy as with MyISAM tables when you just copy table.frm, table.MYD, table.MYI files, but so nice here - just can copy InnoDB tables in fully non-blocking mode, you do not need to shutdown neither source nor destination server (if destination server runs XtraDB).

In short, process looks like (ah, yes you also need innodb_file_table mode, which ones love, and ones hate)
1) do backup of specific tables on sources server with xtrabackup (server can run InnoDB or XtraDB, it does not matter)
2) do prepare of tables with –export options
3) create table on destination server with exactly the same CREATE TABLE statement as on source. (on destination you should have XtraDB with innodb_expand_import extension and with innodb_expand_import=1 setting)
4) run alter table IMPORTEDTABLE discard tablespace;
5) copy .exp and .ibd files to destination server
6) run alter table IMPORTEDTABLE import tablespace;
7) enjoy progress in error.log like:
InnoDB: import: extended import of test/img_out59
InnoDB: import: 3 indexes are detected.
InnoDB: Progress in %: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 done.

Some more information available on
http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_expand_import

If you want some internals: I expected the biggest problem with implementation would be merging insert buffer or handling undo records, stored in ibdata file, but it was easy - you just need to shutdown instance with innodb_fast_shutdown=1 (done in xtrabackup –prepare –export call). The more complex problem was that InnoDB stores pointers on root leafs of secondary indexes also in system tablespace. And that’s why we maintain .exp files - to keep information from system tablespace that related to exported table.

Both xtrabackup –export mode and XtraDB innodb_expand_import available only in source code for now, you can get them from Launchpad projects https://launchpad.net/percona-xtradb and https://launchpad.net/percona-xtrabackup. Binary releases will be ready as soon as we have done with testing of this feature. You are welcome to try!

May 15, 2009

Friday challenge: ibd recovery

Posted by Vadim

I want to make this Friday a bit more interesting - how do you feel to train a bit in InnoDB data recovery techniques.
I have .ibd datafile which was created by query
CREATE TABLE `tryme` ( `email` mediumblob, `content` mediumblob ) ENGINE=InnoDB … (SOME PARAMETERS SKIPPED) …;

and I inserted one record into this table with email address and English sentence into field (content). Some transformation where applied to both fields so just HEX viewer may not help you there.
Size of meaningful de-transformed content is
length( … (content) … ) = 15.

Can you recovery both email and content fields ? To add some competition I want to give a prize for winner, first who sends content (meaningful English sentence) to given email - will receive our book “High Performance MySQL” 2nd edition, or if he has already this book or just hate it by some reason - I can propose Amazon 25$ Gift Card as replacement.

File tryme.ibd you can download here http://mysqlperformanceblog.com/files/contest/tryme.ibd. Size 65536 bytes. md5sum a53d3ee3f3a33854f21b5f550ae53e18.

May 1, 2009

RAID vs SSD vs FusionIO

Posted by Vadim

In benchmarks passion (see my two previous posts) I managed to setup all three devices (RAID was on board; Intel X25-E SSD connected to HighPoint controller, FusionIO card) on our working horse Dell PowerEdge R900 (btw, to do that I had to switch from CentOS 5.2 to Ubuntu 8.10, as CentOS was not able to start with attached SSD card to HighPoint controller) and along with other tests I ran tpcc-like IO-bound workload on all devices.

For tests I used MySQL 5.4/InnoDB, and all other parameters are the same from previous posts (100W, buffer_pool 3GB). Filesystem - XFS mounted with nobarrier option.

Graphical results are here

and average results:

RAID10 - 7439.850 TPM
SSD - 10681.050 TPM
FusionIO - 17372.250 TPM

However what should be noted - both SSD and FusionIO are run in “non-durable” mode, that is you may lose some transactions in case of power outage (see my post http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/).

While results for SSD (note it is single device, in comparison to RAID 10 on 8 disks) and FusionIO are impressive, it is worth to consider price/performance parameter.

Here is my very rough calculation:
For RAID 10 we use 8 73GB SAS 2.5″ 15K RPM disks, with price 190$ per disks it gives us 1520$ for 292GB useful space, or ~ 5.2$ per GB.
For SSD I can get 32GB card for 390$, which is ~12.1$ per GB
For FusionIO I really not sure what is price (it was given as only for tests), but quick googling gave me 30$ per GB, so for 160GB card gives 4800$.

Now simple dividing TPM on price of IO system, we have
RAID 10 - 4.8 TPM / $
SSD - 27 TPM / $
FusionIO - 3.6 TPM / $

Please note that price of transaction is not the main criteria to consider, as total TCO for systems with SSD may be much cheaper (considering you need less servers, less space, less power). Also worth to consider that SSD is only 32GB space and to have the same space as FusionIO we need 4 cards (but it still will be cheaper than FusionIO), but it also may improve performance as such setup will be able to handle IO requests in parallel.

April 29, 2009

Call for opinions: Do we need MySQL 5.0 with MySQL 5.4 performance

Posted by peter

MySQL 5.4 comes with Innodb engine which seems to have much better performance than MySQL 5.0 - this is due to locking and IO patches from Google integrated in this release (which are similar to appropriate Percona patches) as well as some unique fixes such as different innodb_thread_concurrency handling and other optimization.

Should we take Innodb from MySQL 5.4 and merry it with unique Percona patches (adaptive checkpoints, additional undo slots, profiling, etc) and integrate it with MySQL 5.0 ? How useful would you find it ?
[read more...]

April 15, 2009

How to decrease InnoDB shutdown times

Posted by Baron Schwartz

Sometimes a MySQL server running InnoDB takes a long time to shut down. The usual culprit is flushing dirty pages from the buffer pool. These are pages that have been modified in memory, but not on disk.

If you kill the server before it finishes this process, it will just go through the recovery phase on startup, which can be even slower in stock InnoDB than the shutdown process, for a variety of reasons.

[read more...]

April 8, 2009

XtraDB storage engine release 1.0.3-4 codename Sakura

Posted by Evgeniy Stepchenko

Today we glad to announce release 1.0.3-4 of our XtraDB storage engine.

Here is a list of enhancements in this release:

Percona XtraDB 1.0.3-4 (Sakura) available in source and several binary packages.

XtraDB is compatible with existing InnoDB tables (unless you used innodb_extra_undoslots) and we are going to keep compatibility in further releases. We are open for features requests for new engine and ready to accept community patches. You can monitor Percona’s current tasks and further plans on the Percona XtraDB Launchpad project. You can also request features and report bugs there. Also we have setup two maillists for General discussions and for Development related questions.

March 30, 2009

My “hot” list for next InnoDB features

Posted by Vadim

Many InnoDB scalability problems seem fixed in InnoDB-plugin-1.0.3 and I expect InnoDB-plugin will run fine on 16-24 cores boxes for many workloads. And now it is time to look on systems with 32GB+ of RAM which are not rare nowadays. Working with real customer systems I have wish-list of features I would like to see soon:

  • Fast recovery. Both recovery after crash and recovery from backup can take unacceptable long time, especially if you crashed with full 32GB buffer_pool. There is reported bug http://bugs.mysql.com/bug.php?id=29847, with ETA MySQL-6.0
  • Preload table / index into buffer_pool. You can use custom queries by primary / secondary key to “warm up” part of table, but this solution is ugly and may be slow due to random logical I/O. Implementing preload of full .ibd file with sequential read would be much better solution. This is actually more important feature than it may appear at first look - for example if you put load on slave which is not warmed up properly - slave may never catch up slave, but with small load warm up may take hours to complete, so basically it adds several hours for operations team to complete task which requires restart of slave
  • Copy single .ibd table from one server to different or (basically the same) restore single table from backup, possibly on different server (different slave). It’s all about time - copying whole 500GB backup while you need to restore only single 20GB table is very non-productive
  • Open InnoDB tables in parallel. Currently opening table is serialized, and it is especially bad at start time, when InnoDB takes probes during opening table, as it is slow operation. See also http://www.mysqlperformanceblog.com/2006/11/21/opening-tables-scalability/. Partially it can be fixed by recent patches by enabling / disabling probes and changing count of probes, but still the solution is far from perfect

As you see the list list is not about performance but mostly about operations tasks, but with current amount of data and memory on servers they become critical. I do not know what is InnoDB plans about it and would like to hear if this is will be implemented anytime soon or never. Anyway I was asked what are our current plans about XtraDB with recent InnoDB-pluging release, as performance improvements in plugin may make XtraDB out of game - so I consider list above as roadmap for XtraDB and hope some or all features are implemented this year.

Do you have any other features you miss in current InnoDB ?

March 2, 2009

SSD, XFS, LVM, fsync, write cache, barrier and lost transactions

Posted by Vadim

We finally managed to get Intel X25-E SSD drive into our lab. I attached it to our Dell PowerEdge R900. The story making it running is worth separate mentioning - along with Intel X25-E I got HighPoint 2300 controller and CentOS 5.2 just could not start with two RAID controllers (Perc/6i and HighPoint 2300). The problem was solved by installing Ubuntu 8.10 which is currently running all this system. Originally I wanted to publish some nice benchmarks where InnoDB on SSD outperforms RAID 10, but recently I faced issue which can make previous results inconsistent.
[read more...]

February 11, 2009

Limiting InnoDB Data Dictionary

Posted by Vadim

One of InnoDB’s features is that memory allocated for internal tables definitions is not limited and may grow indefinitely. You may not notice it if you have an usual application with say 100-1000 tables. But for hosting providers and for user oriented applications ( each user has dedicated database / table) it is disaster. For 100.000+ tables InnoDB is consuming gigabytes of memory, keeping definition in memory all time after table was once opened. Only way to cleanup memory is to drop table or restart mysqld - I can’t say this is good solution, so we made patch which allows to restrict memory dedicated for data dictionary.

Patch was made by request of our customer Vertical Response and released under GPL, so you can download it there http://mysqlperformanceblog.com/files/patches/innodb_dict_size_limit_standalone.patch. Currently patch is on testing stage, but later will be included into our releases. To limit memory we introduce new variable innodb_dict_size_limit (in bytes).

Some internals: There is already implemented in InnoDB LRU-based algorithm to keep only recent table entries, but it was not used by reason that InnoDB has to know if table is used or not on MySQL level. We made it by checking MySQL table_cache. If table is placed in table_cache we consider it as used, if not - we can delete it from InnoDB data dictionary. So there is the trick - if you have big enough table_cache, memory consumed by data dictionary may exceed innodb_dict_size_limit, as we can’t delete any table entry from it.

To finalize this post small marketing message - if you faced bug or problem which exists for long time and is not going to be solved by MySQL / InnoDB - contact us regarding Custom MySQL Development.

February 2, 2009

Pretending to fix broken group commit

Posted by Vadim

The problem with broken group commit was discusses many times, bug report was reported 3.5 years ago and still not fixed in MySQL 5.0/5.1 (and most likely will not be in MySQL 5.1). Although the rough truth is this bug is very hard (if possible) to fix properly. In short words if you enable replication (log-bin) on server without BBU (battery backup unit) your InnoDB write performance in concurrent load drops down significantly.
We wrote also about it before, see “Group commit and real fsync” and “Group commit and XA“.
[read more...]