July 28, 2014

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

Beware of MySQL 5.6 server UUID when cloning slaves

The other day I was working on an issue where one of the slaves was showing unexpected lag. Interestingly with only the IO thread running the slave was doing significantly more IO as compared to the rate at which the IO thread was fetching the binary log events from the master. I found this out […]

Identifying the load with the help of pt-query-digest and Percona Server

Overview Profiling, analyzing and then fixing queries is likely the most oft-repeated part of a job of a DBA and one that keeps evolving, as new features are added to the application new queries pop up that need to be analyzed and fixed. And there are not too many tools out there that can make […]

Impact of the number of idle connections in MySQL

Be careful with my findings, I appear to have compile in debug mode, I am redoing the benchmarks. Updated version here. I recently had to work with many customers having large number of connections opened in MySQL and although I told them this was not optimal, I had no solid arguments to present. More than […]

Star Schema Bechmark: InfoBright, InfiniDB and LucidDB

In my previous rounds with DataWarehouse oriented engines I used single table without joins, and with small (as for DW) datasize (see http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/, http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/). Addressing these issues, I took Star Schema Benchmark, which is TPC-H modification, and tried run queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB, will […]

How to generate per-database traffic statistics using mk-query-digest

We often encounter customers who have partitioned their applications among a number of databases within the same instance of MySQL (think application service providers who have a separate database per customer organization … or wordpress-mu type of apps). For example, take the following single MySQL instance with multiple (identical) databases:

Missing Data – rows used to generate result set

As Baron writes it is not the number of rows returned by the query but number of rows accessed by the query will most likely be defining query performance. Of course not all row accessed are created equal (such as full table scan row accesses may be much faster than random index lookups row accesses […]

Stored Function to generate Sequences

Today a customer asked me to help them to convert their sequence generation process to the stored procedure and even though I have already seen it somewhere I did not find it with two minutes of googling so I wrote a simple one myself and posting it here for public benefit or my later use

Progress with ClickAider project

About three months ago I announced ClickAider to become available to general public. And I think it is about the time to write about the progress we have with this project for those who interested. The project generates decent interest and we have about 3000 sites Registered over this time, which I consider decent number […]

ClickAider – Track Adsense Clicks and much more

Let me announce ClickAider – another projects we were working on in stealth mode for last several Months. ClickAider is Hosted Web Statistics system but it tracks Clicks rather than page views as most web counters do. And by clicks I mean not just clicks on the urls and images but clicks on many sophisticated […]