May 24, 2013

Distributed set processing performance analysis with ICE 3.5.2pl1 at 20 nodes.

Demonstrating distributed set processing performance Shard-Query + ICE scales very well up to at least 20 nodes This post is a detailed performance analysis of what I’ve coined “distributed set processing”. Please also read this post’s “sister post” which describes the distributed set processing technique. Also, remember that Percona can help you get up and [...]

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to [...]

Flexviews – part 3 – improving query performance using materialized views

Combating “data drift” In my first post in this series, I described materialized views (MVs). An MV is essentially a cached result set at one point in time. The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called drift. This is conceptually [...]

Is VoltDB really as scalable as they claim?

Before I begin, a disclaimer. VoltDB is not a customer, and did not pay Percona or me to investigate VoltDB’s scalability or publish this blog post. More disclaimers at the end. Short version: VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 [...]

Advanced index analysis with mk-index-usage

The new release of Maatkit has a useful feature in mk-index-usage to help you determine how indexes are used in more flexible ways. The default report just prints out ALTER statements for removing unused indexes, which is nice, but it’s often helpful to ask more sophisticated questions about index usage. I’ll use this blog’s queries [...]

Tuning InnoDB Concurrency Tickets

InnoDB has an oft-unused parameter innodb_concurrency_tickets that seems widely misunderstood. From the docs: “The number of threads that can enter InnoDB concurrently is determined by the innodb_thread_concurrency variable. A thread is placed in a queue when it tries to enter InnoDB if the number of threads has already reached the concurrency limit. When a thread [...]

A workaround for the performance problems of TEMPTABLE views

MySQL supports two different algorithms for views: the MERGE algorithm and the TEMPTABLE algorithm. These two algorithms differ greatly. A view which uses the MERGE algorithm can merge filter conditions into the view query itself. This has significant performance advantages over TEMPTABLE views. A view which uses the TEMPTABLE algorithm will have to compute the [...]

InnoDB: look after fragmentation

One problem made me puzzled for couple hours, but it was really interesting to figure out what’s going on. So let me introduce problem at first. The table is

Table has 11864696 rows and takes Data_length: 698,351,616 bytes on disk The problem is that after restoring table from mysqldump, the query that scans data [...]

Air traffic queries in InfiniDB: early alpha

As Calpont announced availability of InfiniDB I surely couldn’t miss a chance to compare it with previously tested databases in the same environment. See my previous posts on this topic: Analyzing air traffic performance with InfoBright and MonetDB Air traffic queries in LucidDB I could not run all queries against InfiniDB and I met some [...]

Air traffic queries in LucidDB

After my first post Analyzing air traffic performance with InfoBright and MonetDB where I was not able to finish task with LucidDB, John Sichi contacted me with help to setup. You can see instruction how to load data on LucidDB Wiki page You can find the description of benchmark in original post, there I will [...]