July 31, 2014

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

Increasing slow query performance with the parallel query execution

MySQL and Scaling-up (using more powerful hardware) was always a hot topic. Originally MySQL did not scale well with multiple CPUs; there were times when InnoDB performed poorer with more  CPU cores than with less CPU cores. MySQL 5.6 can scale significantly better; however there is still 1 big limitation: 1 SQL query will eventually use only […]

Sphinx search performance optimization: attribute-based filters

One of the most common causes of a poor Sphinx search performance I find our customers face is misuse of search filters. In this article I will cover how Sphinx attributes (which are normally used for filtering) work, when they are a good idea to use and what to do when they are not, but […]

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to […]

Shard-Query adds parallelism to queries

Preamble: On performance, workload and scalability: MySQL has always been focused on OLTP workloads. In fact, both Percona Server and MySQL 5.5.7rc have numerous performance improvements which benefit workloads that have high concurrency. Typical OLTP workloads feature numerous clients (perhaps hundreds or thousands) each reading and writing small chunks of data. The recent improvements to […]

New OLAP Wikistat benchmark: Introduction and call for feedbacks

I’ve seen my posts on Ontime Air traffic and Star Schema Benchmark got a lot of interest (links: http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/ http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/ http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/ http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/ ). However benchmarks by itself did not cover all cases I would want, so I was thinking about better scenario. The biggest problem is to get real big enough dataset, and I thank […]

Air traffic queries in MyISAM and Tokutek (TokuDB)

This is next post in series Analyzing air traffic performance with InfoBright and MonetDB Air traffic queries in LucidDB Air traffic queries in InfiniDB: early alpha Let me explain the reason of choosing these engines. After initial three posts I am often asked “What is baseline ? Can we compare results with standard MySQL engines […]

Air traffic queries in InfiniDB: early alpha

As Calpont announced availability of InfiniDB I surely couldn’t miss a chance to compare it with previously tested databases in the same environment. See my previous posts on this topic: Analyzing air traffic performance with InfoBright and MonetDB Air traffic queries in LucidDB I could not run all queries against InfiniDB and I met some […]

State of the art: Galera – synchronous replication for InnoDB

First time I heard about Galera on Percona Performance Conference 2009, Seppo Jaakola was presenting “Galera: Multi-Master Synchronous MySQL Replication Clusters”. It was impressed as I personally always wanted it for InnoDB, but we had it in plans at the bottom of the list, as this is very hard to implement properly. The idea by […]