There are many ways to slice and aggregate metrics of activity on a system such as MySQL. In the best case, we want to know everything about the system’s activity: we want to know how many things happened, how big they were, and how long they took. We want to know precisely when they happened. [...]
MySQL on Amazon RDS part 2: Determining Peak Throughput
This is a continuation of my series of benchmark posts comparing Amazon RDS to a server running on Amazon EC2. Upcoming posts (probably 6 or 8 in total) will extend the scope of the benchmark to include data on our Dell r900 with traditional hard drives in RAID10, and a server in the Joyent cloud. [...]
Percona white paper: Forecasting MySQL Scalability
Ewen and I have just published Percona’s latest white paper, Forecasting MySQL Scalability with the Universal Scalability Law. This is essentially a streamlined walk-through of Dr. Neil J. Gunther’s book Guerrilla Capacity Planning, with examples to show how you can apply it to MySQL servers. One thing alluded to in the paper is extracting the [...]
The perils of uniform hardware and RAID auto-learn cycles
Last night a customer had an emergency in selected machines on a large cluster of quite uniform database servers. Some of the servers were slowing down in a very puzzling way over a short time span (a couple of hours). Queries were taking multiple seconds to execute instead of being practically instantaneous. But nothing seemed [...]
More on dangers of the caches
I wrote couple of weeks ago on dangers of bad cache design. Today I’ve been troubleshooting the production down case which had fair amount of issues related to how cache was used. The deal was as following. The update to the codebase was performed and it caused performance issues, so it was rolled back but [...]
Introducing tcprstat, a TCP response time tool
Ignacio Nin and I (mostly Ignacio) have worked together to create tcprstat[1], a new tool that times TCP requests and prints out statistics on them. The output looks somewhat like vmstat or iostat, but we’ve chosen the statistics carefully so you can compute meaningful things about your TCP traffic. What is this good for? In [...]
Estimating Replication Capacity
It is easy for MySQL replication to become bottleneck when Master server is not seriously loaded and the more cores and hard drives the get the larger the difference becomes, as long as replication remains single thread process. At the same time it is a lot easier to optimize your system when your replication runs [...]
On Good Instrumentation
In so many cases troubleshooting applications I keep thinking how much more efficient things could be going if only there would be a good instrumentation available. Most of applications out there have very little code to help understand what is going on and if it is there it is frequently looking at some metrics which [...]
Why you should ignore MySQL’s key cache hit ratio
I have not caused a fist fight in a while, so it’s time to take off the gloves. I claim that somewhere around of 99% of advice about tuning MySQL’s key cache hit ratio is wrong, even when you hear it from experts. There are two major problems with the key buffer hit ratio, and [...]
Watch out for your CRON jobs
Resolving extreme database overload for the customer recently I have found about 80 copies of same cron job running hammering the database. This number is rather extreme typically the affect is noticed and fixed well before that but the problem with run away cron jobs is way to frequent. If slow down happens on the [...]

