April 25, 2014

Percona XtraDB Cluster – installation and setup webinar follow up Q&A

Thanks for all, who attended my webinar, I got many questions and I wanted to take this opportunity to answer them. Q: Even ntp has a delay of 0.3-0.4 between servers does that mean a 0.25 as from logs can be an issue ? A: My demo vms were running for a few hours before […]

High availability for MySQL on Amazon EC2 – Part 2 – Setting up the initial instances

This post is the second of a series that started here. The first step to build the HA solution is to create two working instances, configure them to be EBS based and create a security group for them. A third instance, the client, will be discussed in part 7. Since this will be a proof […]

Missing Data – rows used to generate result set

As Baron writes it is not the number of rows returned by the query but number of rows accessed by the query will most likely be defining query performance. Of course not all row accessed are created equal (such as full table scan row accesses may be much faster than random index lookups row accesses […]

Speeding up GROUP BY if you want aproximate results

Doing performance analyzes today I wanted to count how many hits come to the pages which get more than couple of visits per day. We had SQL logs in the database so It was pretty simple query:

Unfortunately this query ran for over half an hour badly overloaded server and I had to kill […]

Using GROUP BY WITH ROLLUP for Reporting Performance Optimization

Quite typical query for reporting applications is to find top X values. If you analyze Web Site logs you would look at most popular web pages or search engine keywords which bring you most of the traffic. If you’re looking at ecommerce reporting you may be interested in best selling product or top sales people. […]

PHP Large result sets and summary tables.

We’re working with web site preparing for massive growth. To make sure it handles large data sets as part of the process we work on generation test database of significant size as testing your application on table with 1000 rows may well give you very dangerous false sense of security. One of the process web […]

Are larger buffers always better ?

Sometimes I see people thinking about buffers as “larger is always better” so if “large” MySQL sample configuration is designed for 2GB and they happen to have 16, they would simply multiply all/most values by 10 and hope it will work well. Obviously it does not. The least problem would be wasting memory, allocating a […]

Group commit and real fsync

During the recent months I’ve seen few cases of customers upgrading to MySQL 5.0 and having serious performance slow downs, up to 10 times in certain cases. What was the most surprising for them is the problem was hardware and even OS specific – it could show up with one OS version but not in […]

Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from  MySQL to Hadoop, load the data to Cloudera Impala (columnar format) and run a reporting […]

A conversation with 5 Facebook MySQL gurus

Facebook, the undisputed king of online social networks, has 1.23 billion monthly active users collectively contributing to an ocean of data-intensive tasks – making the company one of the world’s top MySQL users. A small army of Facebook MySQL experts will be converging on Santa Clara, Calif. next week where several of them are leading […]