May 22, 2013

How to recover deleted rows from an InnoDB Tablespace

In my previous post I explained how it could be possible to recover, on some specific cases, a single table from a full backup in order to save time and make the recovery process more straightforward. Now the scenario is worse because we don’t have a backup or the backup restore process doesn’t work. How [...]

Identifying the load with the help of pt-query-digest and Percona Server

Overview Profiling, analyzing and then fixing queries is likely the most oft-repeated part of a job of a DBA and one that keeps evolving, as new features are added to the application new queries pop up that need to be analyzed and fixed. And there are not too many tools out there that can make [...]

Helgrinding MySQL with InnoDB for Synchronisation Errors, Fun and Profit

It is no secret that bugs related to multithreading–deadlocks, data races, starvations etc–have a big impact on application’s stability and are at the same time hard to find due to their nondeterministic nature.  Any tool that makes finding such bugs easier, preferably before anybody is aware of their existence, is very welcome.

Eventual Consistency in MySQL

We’re told that foreign key constraints are the best way to enforce logical referential integrity (RI) in SQL, preventing rows from becoming orphaned.  But then we learn that the enforcement of foreign keys incurs a significant performance overhead.1,2 MySQL allows us to set FOREIGN_KEY_CHECKS=0 to disable enforcement of RI when the overhead is too high.  But [...]

The case for getting rid of duplicate “sets”

The most useful feature of the relational database is that it allows us to easily process data in sets, which can be much faster than processing it serially. When the relational database was first implemented, write-ahead-logging and other technologies did not exist. This made it difficult to implement the database in a way that matched [...]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: [...]

Shard-Query EC2 images available

Infobright and InnoDB AMI images are now available There are now demonstration AMI images for Shard-Query. Each image comes pre-loaded with the data used in the previous Shard-Query blog post. The data in the each image is split into 20 “shards”. This blog post will refer to an EC2 instances as a node from here [...]

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to [...]

MySQL caching methods and tips

“The least expensive query is the query you never run.” Data access is expensive for your application. It often requires CPU, network and disk access, all of which can take a lot of time. Using less computing resources, particularly in the cloud, results in decreased overall operational costs, so caches provide real value by avoiding [...]

Flexviews – part 3 – improving query performance using materialized views

Combating “data drift” In my first post in this series, I described materialized views (MVs). An MV is essentially a cached result set at one point in time. The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called drift. This is conceptually [...]