One of the most important features of InnoDB is the row level locking. This feature provides better concurrency under heavy write load but needs additional precautions to avoid phantom reads and to get a consistent Statement based replication. To accomplish that, row level locking databases also acquire gap locks. What is a Phantom Read A [...]
How Does Semisynchronous MySQL Replication Work?
With the recent release of Percona XtraDB Cluster, I am increasingly being asked about MySQL’s semi-synchronous replication. I find that there are often a number of misconceptions about how semi-synchronous replication really works. I think it is very important to understand what guarantees you actually get with semi-synchronous replication, and what you don’t get. The [...]
Percona testing: Quick test clusters with kewpie!
The announcement of Percona XtraDB Cluster seems to have generated a fair bit of interest : ) Although the documentation contains more formal instructions for setting up a test cluster, I wanted to share a quick way to set up an ad-hoc cluster on a single machine to help people play with this (imho) rather [...]
Avoiding auto-increment holes on InnoDB with INSERT IGNORE
Are you using InnoDB tables on MySQL version 5.1.22 or newer? If so, you probably have gaps in your auto-increment columns. A simple INSERT IGNORE query creates gaps for every ignored insert, but this is undocumented behaviour. This documentation bug is already submitted. Firstly, we will start with a simple question. Why do we have [...]
Improved InnoDB fast index creation
One of the serious limitations in the fast index creation feature introduced in the InnoDB plugin is that it only works when indexes are explicitly created using ALTER TABLE or CREATE INDEX. Peter has already blogged about it before, here I’ll just briefly reiterate other cases that might benefit from that feature: when ALTER TABLE [...]
Using any general purpose computer as a special purpose SIMD computer
Often times, from a computing perspective, one must run a function on a large amount of input. Often times, the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set based processing, which on the surface appears [...]
Distributed Set Processing with Shard-Query
Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: [...]
Shard-Query EC2 images available
Infobright and InnoDB AMI images are now available There are now demonstration AMI images for Shard-Query. Each image comes pre-loaded with the data used in the previous Shard-Query blog post. The data in the each image is split into 20 “shards”. This blog post will refer to an EC2 instances as a node from here [...]
MySQL caching methods and tips
“The least expensive query is the query you never run.” Data access is expensive for your application. It often requires CPU, network and disk access, all of which can take a lot of time. Using less computing resources, particularly in the cloud, results in decreased overall operational costs, so caches provide real value by avoiding [...]
Using Flexviews – part two, change data capture
In my previous post I introduced materialized view concepts. This post begins with an introduction to change data capture technology and describes some of the ways in which it can be leveraged for your benefit. This is followed by a description of FlexCDC, the change data capture tool included with Flexviews. It continues with an [...]

