July 30, 2014

The case for getting rid of duplicate “sets”

The most useful feature of the relational database is that it allows us to easily process data in sets, which can be much faster than processing it serially. When the relational database was first implemented, write-ahead-logging and other technologies did not exist. This made it difficult to implement the database in a way that matched […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]

Flexviews – part 3 – improving query performance using materialized views

Combating “data drift” In my first post in this series, I described materialized views (MVs). An MV is essentially a cached result set at one point in time. The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called drift. This is conceptually […]

Estimating Replication Capacity

It is easy for MySQL replication to become bottleneck when Master server is not seriously loaded and the more cores and hard drives the get the larger the difference becomes, as long as replication remains single thread process. At the same time it is a lot easier to optimize your system when your replication runs […]

Data mart or data warehouse?

This is part two in my six part series on business intelligence, with a focus on OLAP analysis. Part 1 – Intro to OLAP Identifying the differences between a data warehouse and a data mart. (this post) Introduction to MDX and the kind of SQL which a ROLAP tool must generate to answer those queries. […]

A workaround for the performance problems of TEMPTABLE views

MySQL supports two different algorithms for views: the MERGE algorithm and the TEMPTABLE algorithm. These two algorithms differ greatly. A view which uses the MERGE algorithm can merge filter conditions into the view query itself. This has significant performance advantages over TEMPTABLE views. A view which uses the TEMPTABLE algorithm will have to compute the […]

Goal driven performance optimization

When your goal is to optimize application performance it is very important to understand what goal do you really have. If you do not have a good understanding of the goal your performance optimization effort may well still bring its results but you may waste a lot of time before you reach same results as […]

How to load large files safely into InnoDB with LOAD DATA INFILE

Recently I had a customer ask me about loading two huge files into InnoDB with LOAD DATA INFILE. The goal was to load this data on many servers without putting it into the binary log. While this is generally a fast way to load data (especially if you disable unique key checks and foreign key […]

Enum Fields VS Varchar VS Int + Joined table: What is Faster?

Really often in customers’ application we can see a huge tables with varchar/char fields, with small sets of possible values. These are “state”, “gender”, “status”, “weapon_type”, etc, etc. Frequently we suggest to change such fields to use ENUM column type, but is it really necessary (from performance standpoint)? In this post I’d like to present […]

Heikki Tuuri Innodb answers – Part I

Its almost a month since I promised Heikki Tuuri to answer Innodb Questions. Heikki is a busy man so I got answers to only some of the questions but as people still poking me about this I decided to publish the answers I have so far. Plus we may get some interesting follow up questions […]