August 23, 2014

MySQL: Data Storage or Data Processing

I was thinking today of how people tend to use MySQL in modern applications and it stroke me in many cases MySQL is not used to process the data, at least not on the large scale – instead it is used for data storage and light duty data retrieval. Even in this case however the […]

A closer look at the MySQL ibdata1 disk space issue and big tables

A recurring and very common customer issue seen here at the Percona Support team involves how to make the ibdata1 file “shrink” within MySQL. I can only imagine there’s a degree of regret by some of the InnoDB architects on their design decisions regarding disk-space management by the shared tablespace* because this has been a big […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

How can we bring query to the data?

Baron recently wrote about sending the query to the data looking at distributed systems like Cassandra. I want to take a look at more simple systems like MySQL and see how we’re doing in this space. It is obvious getting computations as closer to the data as possible is the most efficient as we will […]

Join my Oct. 2 webinar: ‘Implementing MySQL and Hadoop for Big Data’

MySQL DBAs know that integrating MySQL and a big data solution can be challenging. That’s why I invite you to join me this Wednesday (Oct. 2) at 10 a.m. Pacific time for a free webinar in which I’ll walk you through how to implement a successful big data strategy with Apache Hadoop and MySQL. This […]

Why is the ibdata1 file continuously growing in MySQL?

We receive this question about the ibdata1 file in MySQL very often in Percona Support. The panic starts when the monitoring server sends an alert about the storage of the MySQL server – saying that the disk is about to get filled. After some research you realize that most of the disk space is used […]

Big Data with MySQL and Hadoop at MySQL Connect 2013

I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco as well as at Percona University at Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution and we can nowadays easily integrate it with MySQL. I will start with a brief introduction […]

Distributed set processing performance analysis with ICE 3.5.2pl1 at 20 nodes.

Demonstrating distributed set processing performance Shard-Query + ICE scales very well up to at least 20 nodes This post is a detailed performance analysis of what I’ve coined “distributed set processing”. Please also read this post’s “sister post” which describes the distributed set processing technique. Also, remember that Percona can help you get up and […]

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: […]

Data mart or data warehouse?

This is part two in my six part series on business intelligence, with a focus on OLAP analysis. Part 1 – Intro to OLAP Identifying the differences between a data warehouse and a data mart. (this post) Introduction to MDX and the kind of SQL which a ROLAP tool must generate to answer those queries. […]