April 23, 2014

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

Generating test data for MySQL tables

One of the common tasks requested by our support customers is to optimize slow queries. We normally ask for the table structure(s), the problematic query and sample data to be able to reproduce the problem and resolve it by modifying the query, table structure, or global/session variables. Sometimes, we are given access to the server […]

QA: Advanced Option Combinatorics (Pairwise Testing): Combinatorial mysqld Option Test Case Generation

How do we ensure that, when we have 35+ testable option combinations for mysqld, we test each and every combination of them? For example: will a different innodb_log_file_size combined with more innodb_log_files_in_group and a modified innodb_fast_shutdown setting truly not affect Percona’s log archiving feature? Most option-related bugs are caused by the setting of 1 or […]

TokuDB vs InnoDB in timeseries INSERT benchmark

This post is a continuation of my research of TokuDB’s  storage engine to understand if it is suitable for timeseries workloads. While inserting LOAD DATA INFILE into an empty table shows great results for TokuDB, what’s more interesting is seeing some realistic workloads. So this time let’s take a look at the INSERT benchmark.

Crash-resistant replication: How to avoid MySQL replication errors

Percona Server’s “crash-resistant replication” feature is useful in versions 5.1 through 5.5. However, in Percona Server 5.6 it’s replaced with Oracle MySQL 5.6′s “crash safe replication” feature, which has it’s own implementation (you can read more about it here). A MySQL slave normally stores its position in files master.info and relay-log.info which are updated by […]

When it’s faster to use SQL in MySQL NDB Cluster over memcache API

Memcache access for MySQL Cluster (or NDBCluster) provides faster access to the data because it avoids the SQL parsing overhead for simple lookups – which is a great feature. But what happens if I try to get multiple records via memcache API (multi-GET) and via SQL (SELECT with IN())? I’ve encountered this a few times […]

Analyzing Slow Query Table in MySQL 5.6

Next week I’m teaching an online Percona Training class, called Analyzing SQL Queries with Percona Toolkit.  This is a guided tour of best practices for pt-query-digest, the best tool for evaluating where your database response time is being spent. This month we saw the GA release of MySQL 5.6, and I wanted to check if any […]

MySQL 5.6.7-RC in tpcc-mysql benchmark

MySQL 5.6.7 RC is there, so I decided to test how it performs in tpcc-mysql workload from both performance and stability standpoints. I can’t say that my experience was totally flawless, I bumped into two bugs: MySQL 5.6.7 locks itself on CREATE INDEX MySQL 5.6.7-rc crashed under tpcc-mysql workload But at the end, is not […]

Benchmarking single-row insert performance on Amazon EC2

I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, […]

Side load may massively impact your MySQL Performance

When we’re looking at benchmarks we typically run some stable workload and we run it in isolation – nothing else is happening on the system. This is not however how things happen in real world when we have significant variance in the load and many things can be happening concurrently. It is very typical to […]