May 23, 2013

Benchmarking Percona Server TokuDB vs InnoDB

After compiling Percona Server with TokuDB, of course I wanted to compare InnoDB performance vs TokuDB. I have a particular workload I’m interested in testing – it is an insert-intensive workload (which is TokuDB’s strong suit) with some roll-up aggregation, which should produce updates in-place (I will use INSERT .. ON DUPLICATE KEY UPDATE statements [...]

InnoDB Full-text Search in MySQL 5.6 (part 1)

I’ve never been a very big fan of MyISAM; I would argue that in most situations, any possible advantages to using MyISAM are far outweighed by the potential disadvantages and the strengths of InnoDB. However, up until MySQL 5.6, MyISAM was the only storage engine with support for full-text search (FTS). And I’ve encountered many [...]

Edge-case behavior of INSERT…ODKU

A few weeks back, I was working on a customer issue wherein they were observing database performance that dropped through the floor (to the point of an outage) roughly every 4 weeks or so. Nothing special about the environment, the hardware, or the queries; really, the majority of the database was a single table with [...]

MySQL Indexing Best Practices: Webinar Questions Followup

I had a lot of questions on my MySQL Indexing: Best Practices Webinar (both recording and slides are available now) We had lots of questions. I did not have time to answer some and others are better answered in writing anyway. Q: One developer on our team wants to replace longish (25-30) indexed varchars with [...]

The case for getting rid of duplicate “sets”

The most useful feature of the relational database is that it allows us to easily process data in sets, which can be much faster than processing it serially. When the relational database was first implemented, write-ahead-logging and other technologies did not exist. This made it difficult to implement the database in a way that matched [...]

Checking the subset sum set problem with set processing

Hi, Here is an easy way to run the subset sum check from SQL, which you can then distribute with Shard-Query:

Notice there is no 16 in the list. We did not pass the check. There are enough 15s though. The distinct value count for each item in the output set, must at least [...]

Flexviews – part 3 – improving query performance using materialized views

Combating “data drift” In my first post in this series, I described materialized views (MVs). An MV is essentially a cached result set at one point in time. The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called drift. This is conceptually [...]

MySQL Partitioning – can save you or kill you

I wanted for a while to write about using MySQL Partitioning for Performance Optimization and I just got a relevant customer case to illustrate it. First you need to understand how partitions work internally. Partitions are on the low level are separate table. This means when you’re doing lookup by partitioned key you will look [...]

Getting History of Table Sizes in MySQL

One data point which is very helpful but surprisingly few people have is the history of the table sizes. Projection of data growth is very important component for capacity planning and simply watching the growth of space used on partition is not very helpful. Now as MySQL 5.0+ has information schema collecting and keeping this [...]

A workaround for the performance problems of TEMPTABLE views

MySQL supports two different algorithms for views: the MERGE algorithm and the TEMPTABLE algorithm. These two algorithms differ greatly. A view which uses the MERGE algorithm can merge filter conditions into the view query itself. This has significant performance advantages over TEMPTABLE views. A view which uses the TEMPTABLE algorithm will have to compute the [...]