May 22, 2013

The case for getting rid of duplicate “sets”

The most useful feature of the relational database is that it allows us to easily process data in sets, which can be much faster than processing it serially. When the relational database was first implemented, write-ahead-logging and other technologies did not exist. This made it difficult to implement the database in a way that matched [...]

Optimizing slow web pages with mk-query-digest

I don’t use many tools in my consulting practice but for the ones I do, I try to know them as best as I can. I’ve been using mk-query-digest for almost as long as it exists but it continues to surprise me in ways I couldn’t imagine it would. This time I’d like to share [...]

MySQL on Amazon RDS part 2: Determining Peak Throughput

This is a continuation of my series of benchmark posts comparing Amazon RDS to a server running on Amazon EC2. Upcoming posts (probably 6 or 8 in total) will extend the scope of the benchmark to include data on our Dell r900 with traditional hard drives in RAID10, and a server in the Joyent cloud. [...]

Modeling MySQL Capacity by Measuring Resource Consumptions

There are many angles you can look at the system to predict in performance, the model baron has published for example is good for measuring scalability of the system as concurrency growths. In many cases however we’re facing a need to answer a question how much load a given system can handle when load is [...]

Shard-Query adds parallelism to queries

Preamble: On performance, workload and scalability: MySQL has always been focused on OLTP workloads. In fact, both Percona Server and MySQL 5.5.7rc have numerous performance improvements which benefit workloads that have high concurrency. Typical OLTP workloads feature numerous clients (perhaps hundreds or thousands) each reading and writing small chunks of data. The recent improvements to [...]

FlashCache: more benchmarks

Previously I covered simple case with FlashCache, when data fits into cache partitions, now I am trying to test when data is bigger than cache. But before test setup let me address some concern (which I also had). Intel X25-M has a write cache which is not battery backuped, so there is suspect you may [...]

FlashCache: first experiments

I wrote about FlashCache there, and since that I run couple benchmarks, to see what performance benefits we can expect. For initial tries I took sysbench oltp tests ( read-only and read-write) and case when data fully fits into L2 cache. I made binaries for FlashCache for CentOS 5.4, kernel 2.6.18-164.15, you can download it [...]

MySQL 5.5.4 in tpcc-like workload

MySQL-5.5.4 ® is the great release with performance improvements, let’s see how it performs in tpcc-like workload. The full details are on Wiki page http://www.percona.com/docs/wiki/benchmark:mysql:554-tpcc:start I took MySQL-5.5.4 with InnoDB-1.1, tpcc-mysql benchmark with 200W ( about 18GB worth of data), InnoDB log files are 3.8GB size, and run with different buffer pools from 20GB to [...]

RAID throughput on FusionIO

Along with maximal possible fsync/sec it is interesting how different software RAID modes affects throughput on FusionIO cards. In short conclusion, RAID10 modes really disappoint me, the detailed numbers to follow. To get numbers I run

test with 16KB page size, random read and writes, 1 and 16 threads, O_DIRECT mode. FusionIO cards are [...]

Getting around optimizer limitations with an IN() list

There was a discussion on LinkedIn one month ago that caught my eye: Database search by “within x number of miles” radius? Anyone out there created a zipcode database and created a “search within x numer of miles” function ? Thankful for any tips you can throw my way.. J A few people commented that [...]