May 14, 2008

Concurrent inserts on MyISAM and the binary log

Posted by Baron Schwartz

Recently I had an interesting surprise with concurrent inserts into a MyISAM table. The inserts were not happening concurrently with SELECT statements; they were blocking and the process list was filling up with queries in Locked status.

My first thought was that the customer had deleted from the table, which leaves "holes" in the middle of it and prevents concurrent inserts. (You can configure the server to permit concurrent inserts even when there are holes, but it's disabled by default.) However, that turned out not to be the cause; the table was only inserted into (and selected from). Instead, the blocked statements were because of INSERT... SELECT statements that were running against the table, selecting data from it and inserting into another table.

Let's look at what happens here: suppose you have two tables tbl1 and tbl2 and concurrent inserts into tbl2 are running fine. If you now run the following query,

SQL:
  1. INSERT INTO tbl1 SELECT * FROM tbl2

The concurrent inserts into tbl2 can block. This happens if you have the binary log enabled. If you think about it, this makes sense and is correct behavior. The statements have to be serialized for the binary log; otherwise replaying the binary log can result in a different order of execution.

The MySQL manual actually says this, but not in the clearest way. It just says

If you are using the binary log, concurrent inserts are converted to normal inserts for CREATE ... SELECT or INSERT ... SELECT statements.

If you use mysqladmin debug, you'll see an ordinary SELECT gets a lock on the table like this:

CODE:
  1. Locked - read         Low priority read lock

But on INSERT...SELECT, you'll see this:

CODE:
  1. Read lock  without concurrent inserts

That read lock is what's blocking the concurrent inserts from happening.

There's no solution to this, if you need the binary log enabled. (It needs to be enabled for replication.) There are workarounds, though. You can use the old trick of SELECT INTO OUTFILE followed by LOAD DATA INFILE. You can use InnoDB instead. Or you can do something more elaborate and application-specific, but that's a topic for another post.

March 31, 2008

MySQL Performance on Memory Appliance

Posted by peter

Recently I have had a chance to check out MySQL Performance on "Memory Appliance" by Violin Memory which can be used as extremely high speed storage system.

I helped Violin Memory to optimize MySQL for customer workload and Violin memory and also had a chance to do some benchmarks on my own. 2*Quad Core Xeon running CentOS5 was tested using ext2 filesystem and SysBench tool.

Using 16K read sizes (matches Innodb page size) I could get 1.2GB/sec (80K req/sec) for reads and about 500MB/sec writes with 16 concurrent threads. Things scaled well and with 256 threads I got even a bit better performance.

Interesting enough utilization in iostat never went over few percents and load was mostly CPU bound.
[read more...]

March 21, 2008

MySQL File System Fragmentation Benchmarks

Posted by peter

Few days ago I wrote about testing writing to many files and seeing how this affects sequential read performance. I was very interested to see how it shows itself with real tables so I've got the script and ran tests for MyISAM and Innodb tables on ext3 filesystem. Here is what I found:

[read more...]

January 24, 2008

Enum Fields VS Varchar VS Int + Joined table: What is Faster?

Posted by Alexey Kovyrin

Really often in customers' application we can see a huge tables with varchar/char fields, with small sets of possible values. These are "state", "gender", "status", "weapon_type", etc, etc. Frequently we suggest to change such fields to use ENUM column type, but is it really necessary (from performance standpoint)? In this post I'd like to present a small benchmark which shows MySQL performance when you use 3 different approaches: ENUM, VARCHAR and tinyint (+joined table) columns.

In practice you can also often use 4th variant which is not comparable directly, which is using integer value and having value mapping done on application level.
[read more...]

October 29, 2007

Hacking to make ALTER TABLE online for certain changes

Posted by Aurimas Mikalauskas

Suppose you want to remove auto_increment from 100G table. No matter if it's InnoDB or MyISAM, you'd usually ALTER TABLE `huge_table` CHANGE `id` `id` int(6) NOT NULL and then wait hours for table rebuild to complete. If you're unlucky i.e. you have a lot of indexes and not too much RAM - you could end up waiting days. If you want to make this happen quick - there's another way. Not documented, but works well with both - InnoDB and MyISAM.
[read more...]

October 12, 2007

MyISAM Scalability and Innodb, Falcon Benchmarks

Posted by Vadim

We many times wrote about InnoDB scalability problems, this time We are faced with one for MyISAM tables. We saw that several times in synthetic benchmarks but never in production, that's why we did not escalate MyISAM scalability question. This time working on the customer system we figured out that box with 1 CPU Core is able to handle more queries per second than identical box, but with 4 CPU Cores.
[read more...]

June 18, 2007

Using CHAR keys for joins, how much is the overhead ?

Posted by peter

I prefer to use Integers for joins whenever possible and today I worked with client which used character keys, in my opinion without a big need. I told them this is suboptimal but was challenged with rightful question about the difference. I did not know so I decided to benchmark.

The results below are for MySQL 5.1.18 using MyISAM and Innodb tables. This time unlike other benchmarks I decided to do Join not on primary key and have query to read data for both tables. If the query would be index covering I would expect us to see different ratio. The query I use here is constructed to stress out join code while avoid sending data to the client Do not try to find any good meaning for query or schema. For joins which fetch just few rows difference is likely to be less as the join code itself is likely to be responsible for less portion of response time.

[read more...]

May 9, 2007

Merge Tables Gotcha

Posted by peter

I had the interesting customer case today which made me to do a bit research on the problem.

You can create merge table over MyISAM tables which contain primary key and global uniqueness would not be enforced in this case, this is as far as most people will think about it. In fact however it is worse than that - if you have same key values in underlying base tables some of the queries may give you wrong results:
[read more...]

March 15, 2007

MySQL MyISAM Active Active Clustering - looking for trouble ?

Posted by peter

Reading last few days worth of planet MySQL and commenting on some entries as you can see. The post by Oli catches my attention. There is also PDF with more details available

Oli is saying you can use MySQL with Active Active Clustering and MyISAM tables if you follow certain rules like enabling external locking, disabling query cache and delayed key writes etc. This is as far as many articles on this point go, and in theory this is what should work. In practice however you should think about what exactly are you looking for with this setup.

Performance ? This configuration adds effectively global table locks besides extra overhead you get from external locking. Plus key cache have to be invalidated with remote writes which makes even less efficient. So this configuration would unlikely be well performing compared to other setups, ie replication based. If you can't use replication because it is async and your load loads CPU more than IO subsystem it may sense configure multiple MySQL servers this way.

High availability ? This is where the biggest gotcha is waiting for you. If other MySQL node crashes it leaves table in potentially corrupted state which you need to check and repair before you can safely use it, which takes long time for large tables. Furthermore as I remember MySQL simply was not repairing corrupted tables in this configuration even if myisam_repair is set.

Space Saving ? As I mentioned replication can be better idea but may be the problem is your data is so large you can't afford multiple copies ? Well in this case you're risking more in terms of downtime while repairing large tables. Disk space is cheap these days so it is rarely the problem.

Stability ? But the main problem probably comes from the fact barely anyone is using this setup and there is no QA being run with it (at least was not running) which means you can't be sure all new features are really being compatible with such setup. As Oli describes even query cache which was implemented many years ago in 4.0 was implemented without looking at this feature, so what are you expecting from newer stuff. The other stuff which Oli mentions is table cache which does not seems to be properly controlled. Manually doing FLUSH TABLES does not really works as it does not allow to prevent race condition.

Of course sometimes your Boss simply heard Active Active clustering is cool and you should just set up that one with MySQL is yet another argument to use it :)

March 5, 2007

Power of MySQL Storage Engines

Posted by peter

Where does real power of MySQL Storage Engines, and pluggable storage engines as MySQL 5.1 lays ? It is very much advertised this allows third parties to create their own storage engines and we can see solutions as Solid and PBXT . Plus there is Falcon storage engine being developed inside MySQL.

All of these storage engines are however similar in their features and focus - all of them are transactional storage engines with multi versioning and row level locks so it is question of architecture implementation details of how it would compare to Innodb storage engine - old timer in this space. Does it add choice to the people ? Yes! but it also adds high level of complexity because it will not be easy to figure out which storage engine or storage engine mix is best match for you. Believe me people even often have hard time figuring out between MyISAM and Innodb and if they do they use some generic (and very wrong) ideas such as "MyISAM is always faster for reads".

The main benefit I see from these general purpose storage engine is they apply pressure on each other and as books says competition leads to customer winning. When Falcon looses to Innodb in certain cases it makes Jim to try harder, if Heikki sees Innodb does not scale as well as PBXT it will be an extra pressure to do the fixes. While Innodb was only transactional storage engine there surely was less pressure - if you're migrating from Oracle to MySQL you of course may consider to delay migration but if you're already with MySQL and requiring transactions you're stuck with Innodb if you you're happy or not. (This is not to say Innodb is bad storage engine, but we all know nothing is perfect and has its problems).

The great Innovation I think happens with custom storage engines which are not designed to be run as core storage engines and are not match of functions which you'd find in Oracle, PostgreSQL and other databases. MEMORY storage engine is old timer of this type, ARCHIVE, CSV are other good examples. There are also storage engines out where to read sensor data, talk to memcache or external search engine . As MySQL 5.1 and plugable storage engine interface matures I hope we'll see more storage engines which are great serving one particular talk but doing it in simple efficient and innovative way. NDB cluster is yet another example I'd put in here. I do not feel it is ready as main storage engine for most of applications just yet but for some cases it may be handy.

The fact which is limiting innovation at this point is the fact MySQL has only Storage Engine pluggable but SQL layer is kind of fixed same for all. This dramatically limits what you can do. Such limit is not a big surprise though as execution layer is much more complicated is constantly changed and different optimizations may change internal code paths dramatically.

I know number of projects being interested in special techniques applying to optimizes certain types of queries with MySQL, it is not easy to do however as there is no interface. We had the same problem with Sphinx - Sphinx can run sorting and group by operations much faster than MySQL can and it can do it in parallel, which means it can be 100 times faster in the end, but there is no way to hook it up to standard order by and group by functions so it is implemented in rather hackish way. Same applies to checking number of rows in result set and LIMIT implementation - these can optimized but not by use of MySQL Storage Engine interface, which means it is also has to be implemented through hacks.

In any case this is very interesting how it will be evolving and how more parts in MySQL will become pluggable and so hopefully more innovative tools and techniques will be developed.


This page was found by: mysql synthetic keys... innodb myisam group ... difference between r... mysql myisam table "... mysql ndbcluster mig...