April 24, 2014

Talking MySQL to Sphinx

In the recently released Sphinx version 0.9.9-rc2 there is a support for MySQL wire protocol and SphinxQL – SQL-like language to query Sphinx indexes. This support is currently in its early preview stage but it is still fun to play with. A thing to mention – unlike MySQL Storage Engines, some of which as InfoBright […]

Beware of MySQL Data Truncation

Here is nice gotcha which I’ve seen many times and which can cause just a minefield for many reasons. Lets say you had a system storing articles and you use article_id as unsigned int. As the time goes and you see you may get over 4 billions of articles you change the type for article_id […]

Limited disk space? Compact backups with Percona Xtrabackup 2.1

One very interesting feature, “Compact Backup,” is introduced in Percona XtraBackup 2.1. You can run “compact backups” with the  –compact option, which is very useful for those who have limited disk space to keep the database backup. Now let’s first understand how it works. When we are using –compact option with Innobackupex, it will omit the secondary […]

Benchmarking Percona Server TokuDB vs InnoDB

After compiling Percona Server with TokuDB, of course I wanted to compare InnoDB performance vs TokuDB. I have a particular workload I’m interested in testing – it is an insert-intensive workload (which is TokuDB’s strong suit) with some roll-up aggregation, which should produce updates in-place (I will use INSERT .. ON DUPLICATE KEY UPDATE statements […]

Edge-case behavior of INSERT…ODKU

A few weeks back, I was working on a customer issue wherein they were observing database performance that dropped through the floor (to the point of an outage) roughly every 4 weeks or so. Nothing special about the environment, the hardware, or the queries; really, the majority of the database was a single table with […]

Logging Deadlock errors

The principal source of information for InnoDB diagnostics is the output of SHOW ENGINE INNODB STATUS but there are some sections that are not very useful. For example, LATEST DETECTED DEADLOCK only shows, as the name implies, the latest error detected. If you have 100 deadlocks per minute you will be able to see only […]

The case for getting rid of duplicate “sets”

The most useful feature of the relational database is that it allows us to easily process data in sets, which can be much faster than processing it serially. When the relational database was first implemented, write-ahead-logging and other technologies did not exist. This made it difficult to implement the database in a way that matched […]

Checking the subset sum set problem with set processing

Hi, Here is an easy way to run the subset sum check from SQL, which you can then distribute with Shard-Query:

Notice there is no 16 in the list. We did not pass the check. There are enough 15s though. The distinct value count for each item in the output set, must at least […]

Using any general purpose computer as a special purpose SIMD computer

Often times, from a computing perspective, one must run a function on a large amount of input. Often times, the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set based processing, which on the surface appears […]

Connecting orphaned .ibd files

There are two ways InnoDB can organize tablespaces. First is when all data, indexes and system buffers are stored in a single tablespace. This is typicaly one or several ibdata files. A well known innodb_file_per_table option brings the second one. Tables and system areas are split into different files. Usually system tablespace is located in […]