April 19, 2014

Database problems in MySQL/PHP Applications

An article about database design problems is being discussed by Kristian. Both the article itself and the response left me with mixed feelings, so I decided it was worth commenting: 1. Using mysql_* functions directly. This is probably bad, but I do not like the solutions proposed by the original article either. PEAR is slow, as are other complex connectors. I […]

Why MySQL could be slow with large tables?

If you’ve been reading enough database-related forums, mailing lists, or blogs, you have probably heard complaints from some users about MySQL being unable to handle more than 1,000,000 (or pick any other number) rows. On the other hand, it is well known that, with customers like Google, Yahoo, LiveJournal, and Technorati, MySQL has installations with many billions […]

Read/Write Splitting with PHP Webinar Questions Followup

Today I gave a presentation on “Read/Write Splitting with PHP” for Percona Webinars. If you missed it, you can still register to view the recording and my slides. Thanks to everyone who attended, and especially to the folks who asked great questions. I answered as many as I could during the session, but here are […]
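
The core of read/write splitting is a per-statement routing decision. Here is a minimal sketch of that decision, written in Python for illustration rather than PHP; the ReadWriteRouter class and the host names are hypothetical, and the statement classification is deliberately naive (it looks only at the leading keyword and a couple of locking clauses). Reads that must see a write just made on the primary may also need to go there because of replication lag; that policy is not modeled here.

```python
import itertools
from typing import Iterator, List, Optional


class ReadWriteRouter:
    """Hypothetical router that decides which server runs each statement.

    Plain SELECTs outside a transaction rotate across replicas; everything
    else (writes, DDL, locking reads, anything inside a transaction) goes
    to the primary so the session sees its own changes.
    """

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Round-robin over replicas; None means "no replicas, use the primary".
        self._replicas: Optional[Iterator[str]] = (
            itertools.cycle(replicas) if replicas else None
        )
        self.in_transaction = False

    def route(self, sql: str) -> str:
        words = sql.upper().split()
        verb = words[0] if words else ""
        normalized = " ".join(words)
        if verb in ("BEGIN", "START"):
            # BEGIN / START TRANSACTION: pin the session to the primary.
            self.in_transaction = True
        elif verb in ("COMMIT", "ROLLBACK"):
            # The COMMIT/ROLLBACK itself must still run on the primary.
            self.in_transaction = False
            return self.primary
        locking = "FOR UPDATE" in normalized or "LOCK IN SHARE MODE" in normalized
        if (verb == "SELECT" and not locking and not self.in_transaction
                and self._replicas is not None):
            return next(self._replicas)
        return self.primary
```

Keeping the router stateful is what lets it pin a whole transaction to the primary; a stateless "SELECT goes to a replica" rule would split a transaction across servers.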

Find and remove duplicate indexes

Having duplicate keys in our schemas can hurt the performance of our database: they make the optimizer phase slower because MySQL needs to examine more query plans; the storage engine needs to maintain, calculate, and update more index statistics; and DML and even read queries can be slower because MySQL needs to update or fetch more data to […]
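
As a rough illustration of what counts as "duplicate", here is a minimal Python sketch that flags, for a single table, indexes whose column list is identical to or a leftmost prefix of another index. The input shape (index name mapped to its ordered column tuple) and the function name are assumptions; in practice the mapping could be built from information_schema.STATISTICS. It skips PRIMARY but otherwise ignores uniqueness, index type, and InnoDB's implicit primary key suffix, so treat its output as candidates to review. Percona Toolkit's pt-duplicate-key-checker performs a more complete version of this check.

```python
from typing import Dict, List, Tuple


def find_redundant_indexes(indexes: Dict[str, Tuple[str, ...]]) -> List[Tuple[str, str]]:
    """Return (redundant_index, covering_index) pairs for one table.

    An index counts as redundant when its column list is identical to, or a
    leftmost prefix of, another index's column list: a B-tree index on (a, b)
    can already serve lookups that only use column a.
    """
    names = sorted(indexes)
    pairs: List[Tuple[str, str]] = []
    for name in names:
        if name == "PRIMARY":
            continue  # never suggest dropping the primary key
        cols = indexes[name]
        for other in names:
            if other == name:
                continue
            other_cols = indexes[other]
            if other_cols[: len(cols)] != cols:
                continue  # not a leftmost prefix of this other index
            if other_cols == cols and other < name:
                # Exact duplicates: report each pair once, keeping the
                # alphabetically last name as the survivor.
                continue
            pairs.append((name, other))
            break  # one covering index is enough
    return pairs
```

For example, with indexes on (a), (a, b), and a second (a, b), the sketch reports the single-column index as covered by (a, b) and one of the two identical (a, b) indexes as a duplicate of the other.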

Using any general purpose computer as a special purpose SIMD computer

From a computing perspective, one often has to run a function on a large amount of input. Frequently the same function must be run on many pieces of input, which is very expensive unless the work can be done in parallel. Shard-Query introduces set-based processing, which on the surface appears […]
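
To make the general idea concrete, here is a minimal Python sketch (not Shard-Query's code) of the scatter/gather pattern: the same function runs on every partition of the input concurrently, and the partial results are combined at the end, much as a per-shard partial aggregate is merged into a final one. The function names are hypothetical. The sketch uses a ThreadPoolExecutor so it runs anywhere; for CPU-bound Python work, a ProcessPoolExecutor would be needed for real parallelism, with the worker function defined at module level so it can be pickled.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(items: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Split items into `parts` (> 0) contiguous, roughly equal slices."""
    size, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def scatter_gather(func: Callable[[Sequence[T]], R],
                   combine: Callable[[List[R]], R],
                   items: Sequence[T],
                   parts: int,
                   executor: Executor) -> R:
    """Run the same function on every partition concurrently (the
    'single instruction, multiple data' step), then merge the partials."""
    partials = list(executor.map(func, partition(items, parts)))
    return combine(partials)


def partial_sum_of_squares(chunk: Sequence[int]) -> int:
    """Hypothetical per-partition work: a partial aggregate."""
    return sum(x * x for x in chunk)


if __name__ == "__main__":
    data = list(range(1, 1001))
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(scatter_gather(partial_sum_of_squares, sum, data, 4, pool))
```

The combine step only works this simply because a sum of partial sums equals the total; aggregates such as AVG would need each partition to return a (sum, count) pair instead.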

Distributed Set Processing with Shard-Query

Can Shard-Query scale to 20 nodes? Peter asked this question in the comments on my previous Shard-Query benchmark. Actually, he asked if it could scale to 50, but testing 20 was all I could do given EC2 and time limits. I think the results at 20 nodes are very useful for understanding the performance: […]

Introducing our Percona Live speakers

We have mostly finalized the Percona Live schedule at this point, and I thought I’d take a few minutes to introduce who’s going to be speaking and what they’ll cover. A brief explanation first: we’ve personally recruited the speakers, which is why it has been a slow process to finalize and get abstracts on the […]

Data mart or data warehouse?

This is part two of my six-part series on business intelligence, with a focus on OLAP analysis. Part 1 – Intro to OLAP. Part 2 – Identifying the differences between a data warehouse and a data mart (this post). Part 3 – Introduction to MDX and the kind of SQL which a ROLAP tool must generate to answer those queries. […]

“Shard early, shard often”

A while back I wrote a post explaining why you don’t want to shard. In that post I tried to explain that hardware advances, such as 128GB of RAM being so cheap, are changing the point at which you need to shard, and that the (often omitted) operational issues created by sharding can […]

Should you move from MyISAM to InnoDB?

A significant portion of customers are still using MyISAM when they come to us, so one of the big questions is when it is feasible to move to InnoDB and when staying on MyISAM is preferable. I generally prefer to see InnoDB as the main storage engine because it makes life much […]
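
If the answer is to convert, one mechanical step can be sketched in Python: given rows shaped like the output of `SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES`, emit `ALTER TABLE ... ENGINE=InnoDB` statements for MyISAM tables outside the system schemas. The helper names are hypothetical, and the statements are only generated, not executed: each ALTER rebuilds the table, which can take a long time on large tables, so whether and when to run them remains the judgment call discussed above.

```python
from typing import Iterable, List, Optional, Tuple

# Schemas holding server metadata; these are not meant to be converted.
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema", "sys"}


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def innodb_conversion_statements(
        rows: Iterable[Tuple[str, str, Optional[str]]]) -> List[str]:
    """Build ALTER TABLE statements for every MyISAM table in `rows`.

    Each row is (TABLE_SCHEMA, TABLE_NAME, ENGINE). ENGINE is NULL (None)
    for views, which are skipped along with non-MyISAM tables.
    """
    statements = []
    for schema, table, engine in rows:
        if schema.lower() in SYSTEM_SCHEMAS:
            continue
        if (engine or "").upper() != "MYISAM":
            continue
        statements.append(
            f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} ENGINE=InnoDB;"
        )
    return statements
```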