April 20, 2014

Multi Column indexes vs Index Merge

The mistake I commonly see among MySQL users is how indexes are created. Quite commonly people just index individual columns as they are referenced in where clause thinking this is the optimal indexing strategy. For example if I would have something like AGE=18 AND STATE=’CA’ they would create 2 separate indexes on AGE and STATE […]

High-Performance Click Analysis with MySQL

We have a lot of customers who do click analysis, site analytics, search engine marketing, online advertising, user behavior analysis, and many similar types of work.  The first thing these have in common is that they’re generally some kind of loggable event. The next characteristic of a lot of these systems (real or planned) is […]

Stored Function to generate Sequences

Today a customer asked me to help them to convert their sequence generation process to the stored procedure and even though I have already seen it somewhere I did not find it with two minutes of googling so I wrote a simple one myself and posting it here for public benefit or my later use

MySQL 6.0 vs 5.1 in TPC-H queries

Last week I played with queries from TPC-H benchmarks, particularly comparing MySQL 6.0.4-alpha with 5.1. MySQL 6.0 is interesting here, as there is a lot of new changes in optimizer, which should affect execution plan of TPC-H queries. In reality only two queries (from 22) have significantly better execution time (about them in next posts), […]

Enum Fields VS Varchar VS Int + Joined table: What is Faster?

Really often in customers’ application we can see a huge tables with varchar/char fields, with small sets of possible values. These are “state”, “gender”, “status”, “weapon_type”, etc, etc. Frequently we suggest to change such fields to use ENUM column type, but is it really necessary (from performance standpoint)? In this post I’d like to present […]

To SQL_CALC_FOUND_ROWS or not to SQL_CALC_FOUND_ROWS?

When we optimize clients’ SQL queries I pretty often see a queries with SQL_CALC_FOUND_ROWS option used. Many people think, that it is faster to use this option than run two separate queries: one – to get a result set, another – to count total number of rows. In this post I’ll try to check, is […]

Do you always need index on WHERE column ?

I believe we wrote about this before, but this topic popups again and again. Today I’ve read opinion that if we have clause WHERE has_something=1 we should have index on column has_something (the column has two values 0 and 1). In reality the right answer is not so simple.

Redundant index is not always bad

About year ago Peter wrote about redundant indexes and mentioned sometimes it is good to leave two indexes, even one is first part of another. I’m speaking about BTREE indexes, for example, KEY (A), and KEY (A,B). From SQL point of view KEY(A) is not needed, as for queries like WHERE A=5 the index (A,B) […]

Memory allocation in Stored Function

UPDATE : Post is not actual anymore Not so long time ago I had task to update string column in table with 10mil+ rows, and, as the manipulation was non-trivial, I decided this task is good to try Stored Function. Function written – go ahead. Since 5 min I got totally frozen box with no […]

What exactly is read_rnd_buffer_size

Looking for documentation for read_rnd_buffer_size you would find descriptions such as “The read_rnd_buffer_size is used after a sort, when reading rows in sorted order. If you use many queries with ORDER BY, upping this can improve performance” which is cool but it does not really tell you how exactly read_rnd_buffer_size works as well as which […]