A new Percona Toolkit series has been released: Percona Toolkit 2.2 for MySQL 5.6. Several months in the making and many new features, changes, and improvements make this a great new series. It replaces the 2.1 series for which we plan to do only one more bug fix release (as 2.1.10) then retire. 2.2 is [...]
Read/Write Splitting with PHP Webinar Questions Followup
Today I gave a presentation on “Read/Write Splitting with PHP” for Percona Webinars. If you missed it, you can still register to view the recording and my slides. Thanks to everyone who attended, and especially to folks who asked the great questions. I answered as many as I could during the session, but here are [...]
MySQL Indexing Best Practices: Webinar Questions Followup
I had a lot of questions on my MySQL Indexing: Best Practices Webinar (both recording and slides are available now) We had lots of questions. I did not have time to answer some and others are better answered in writing anyway. Q: One developer on our team wants to replace longish (25-30) indexed varchars with [...]
Find and remove duplicate indexes
Having duplicate keys in our schemas can hurt the performance of our database: They make the optimizer phase slower because MySQL needs to examine more query plans. The storage engine needs to maintain, calculate and update more index statistics DML and even read queries can be slower because MySQL needs update fetch more data to [...]
Using any general purpose computer as a special purpose SIMD computer
Often times, from a computing perspective, one must run a function on a large amount of input. Often times, the same function must be run on many pieces of input, and this is a very expensive process unless the work can be done in parallel. Shard-Query introduces set based processing, which on the surface appears [...]
Distributed Set Processing with Shard-Query
Can Shard-Query scale to 20 nodes? Peter asked this question in comments to to my previous Shard-Query benchmark. Actually he asked if it could scale to 50, but testing 20 was all I could due to to EC2 and time limits. I think the results at 20 nodes are very useful to understand the performance: [...]
Introducing our Percona Live speakers
We have mostly finalized the Percona Live schedule at this point, and I thought I’d take a few minutes to introduce who’s going to be speaking and what they’ll cover. A brief explanation first: we’ve personally recruited the speakers, which is why it has been a slow process to finalize and get abstracts on the [...]
How to Identify Bad Queries in MySQL
Finding bad queries is a big part of optimization. A scientific optimization process can be simplified to “can anything be improved for less than it costs not to improve it? – if not, we’re done.” In databases, we care most about the work the database is doing. That is, queries. There are other things we [...]
Data mart or data warehouse?
This is part two in my six part series on business intelligence, with a focus on OLAP analysis. Part 1 – Intro to OLAP Identifying the differences between a data warehouse and a data mart. (this post) Introduction to MDX and the kind of SQL which a ROLAP tool must generate to answer those queries. [...]
“Shard early, shard often”
I wrote a post a while back that said why you don’t want to shard. In that post that I tried to explain that hardware advances such as 128G of RAM being so cheap is changing the point at which you need to shard, and that the (often omitted) operational issues created by sharding can [...]

