July 29, 2014

Speeding up GROUP BY if you want aproximate results

Doing performance analyzes today I wanted to count how many hits come to the pages which get more than couple of visits per day. We had SQL logs in the database so It was pretty simple query:

Unfortunately this query ran for over half an hour badly overloaded server and I had to kill […]

Getting real life query speeds with MySQL

To check for query performance improvements followed indexing/query changes or MySQL configuration changes our customers often decide to run the query and see if there is any significant improvement. Leaving aside question of checking single query alone might not be the best way to see real improvement for your application, the problem they usually run […]

Reference architecture for a write-intensive MySQL deployment

We designed Percona Cloud Tools (both hardware and software setup) to handle a very high-intensive MySQL write workload. For example, we already observe inserts of 1bln+ datapoints per day. So I wanted to share what kind of hardware we use to achieve this result. Let me describe what we use, and later I will explain […]

Using UDFs for geo-distance search in MySQL

In my previous post about geo-spatial search in MySQL I described (along with other things) how to use geo-distance functions. In this post I will describe the geo-spatial distance functions in more details. If you need to calculate an exact distance between 2 points on Earth in MySQL (very common for geo-enabled applications) you have at […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

Why ALTER TABLE runs faster on Percona Server 5.5 vs. MySQL 5.5

Some of us Perconians are at OpenStack summit this week in Atlanta. Matt Griffin, our director of product management, tweeted about the turbo-hipster CI talk about their experience of ALTER TABLEs running faster on Percona Server. Oracle’s Morgan Tocker then tweeted in response, asking why this was the case. I decided that the simplest way to […]

Exploring message brokers

Message brokers are not regularly covered here but are, nonetheless, important web-related technologies. Some time ago, I was asked by one of our customer to review a selection of OSS message brokers and propose a couple of good candidates. The requirements were fairly simple: behave well when there’s a large backlog of messages, be able […]

Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the […]

Schema changes – what’s new in MySQL 5.6?

Among many of the improvements you can enjoy in MySQL 5.6, there is one that addresses a huge operational problem that most DBAs and System Administrators encounter in their life: schema changes. While it is usually not a problem for small tables or those in early stages of product life cycle, schema changes become a […]

When it’s faster to use SQL in MySQL NDB Cluster over memcache API

Memcache access for MySQL Cluster (or NDBCluster) provides faster access to the data because it avoids the SQL parsing overhead for simple lookups – which is a great feature. But what happens if I try to get multiple records via memcache API (multi-GET) and via SQL (SELECT with IN())? I’ve encountered this a few times […]