August 21, 2014

How can we bring query to the data?

Baron recently wrote about sending the query to the data looking at distributed systems like Cassandra. I want to take a look at more simple systems like MySQL and see how we’re doing in this space. It is obvious getting computations as closer to the data as possible is the most efficient as we will […]

Joining many tables in MySQL – optimizer_search_depth

Working on customer case today I ran into interesting problem – query joining about 20 tables (thank you ORM by joining all tables connected with foreign keys just in case) which would take 5 seconds even though in the read less than 1000 rows and doing it completely in memory. The plan optimizer picked was […]

Shard-Query turbo charges Infobright community edition (ICE)

Shard-Query is an open source tool kit which helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide and conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism which allows it to […]

Should we give a MySQL Query Cache a second chance ?

Over last few years I’ve been suggesting more people to disable Query Cache than to enable it. It can cause contention problems as well as stalls and due to coarse invalidation is not as efficient as it could be. These are however mostly due to neglect Query Cache received over almost 10 years, with very […]

Recovering Innodb table Corruption

Assume you’re running MySQL with Innodb tables and you’ve got crappy hardware, driver bug, kernel bug, unlucky power failure or some rare MySQL bug and some pages in Innodb tablespace got corrupted. In such cases Innodb will typically print something like this: InnoDB: Database page corruption on disk or a failed InnoDB: file read of […]

Performance impact of complex queries

What is often underestimated is impact of MySQL Performance by complex queries on large data sets(ie some large aggregate queries) and batch jobs. It is not rare to see queries which were taking milliseconds to stall for few seconds, especially in certain OS configurations, and on low profile servers (ie having only one disk drive) […]

MySQL Query Cache

MySQL has a great feature called “Query Cache” which is quite helpful for MySQL Performance optimization tasks but there are number of things you need to know. First let me clarify what MySQL Query Cache is – I’ve seen number of people being confused, thinking MySQL Query Cache is the same as Oracle Query Cache […]

Why MySQL could be slow with large tables ?

If you’ve been reading enough database related forums, mailing lists or blogs you probably heard complains about MySQL being unable to handle more than 1.000.000 (or select any other number) rows by some of the users. On other hand it is well known with customers like Google, Yahoo, LiveJournal,Technocarati MySQL has installations with many billions […]

Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala. This time I’ve tried InfiniDB for Hadoop (open-source version) on the modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InifiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the […]

High Availability with MySQL Fabric: Part II

This is the third post in our MySQL Fabric series. If you missed the previous two, we started with an overall introduction, and then a discussion of MySQL Fabric’s high-availability (HA) features. MySQL Fabric was RC when we started this series, but it went GA recently. You can read the press release here, and see this blog post from Oracle’s Mats […]