August 11, 2006

Database problems in MySQL/PHP Applications

Posted by peter |

Article about database design problems is being discussed by Kristian.

Both article itself and responce cause mixed feellings so I decided it is worth commenting:

1. Using mysql_* functions directly This is probably bad but I do not like solutions proposed by original article ether. PEAR is slow as well as other complex conectors. I have not yet tested PDO but would not expect it to beat MySQLi in speed. It is however bad idea to use mysql_ functions directly as well - I would go for using mysqli object approach. The great things about objects is you can easily overload methods and get debugging and profiling tools, as well as have tools which protect you from SQL Injections.

For example I have little wrapper which allows to do $dbcon->query("Select email from user where name=%s",$name) - wrapper will detect query is being called with multiple parameters and will perform needed checks and query rewriting. You also can use pretty much direct path to mysqli extension to performance critical queries if you need.

I would also note for many PHP applications abstraction layer is not the main performance problem, also benefit from persistent connections can be much more modest. DVD Store was special type of application which was designed to have very simple logic besides database - in most cases you would have beautiful page rendering as well as much more queries per page which will make performance improvement much smaller. Notable exception being AJAX applications which may do very little work and formating, so database connection may become the issue. Caching should be good help in this case though.

About Consulting - it is worth to mention it was my group which was Dell DVD Store optimization, and I'm now on my own, offering MySQL and LAMP Consuilting Services.

2. Not using auto_increment functionality This is right. With some exception however. For example Innodb tables do internal full table lock if auto_increment is used so using values generated elseware might be faster.

3. Using multiple databases Honestly I do not see application using one database per table that often. I however often see applications using multiple databases to group tables by certain logic, such as you do with directories to group files. I think this makes a lot of sense. Sometimes grouping is done so a lot of databases are needed - for example if grouping is done by user. This might be a bit extreem if you have thousands of users - I would rather do many to many relationship between users and tables but it also might work.

Regarding if you use many tables you're doing something wrong it is frequently told by people with traditional database background. Things are different with MySQL.

There are many successful applications, using tens of thousands of tables per host and archiving great performance by doing so.
Using multiple tables gives some very important benefits - your data becomes managable, your ALTER TABLE or OPTIMIZE TABLE now locks small table for few seconds rather than giant 100GB table for few hours so can be done pretty much online. You also get good data clustering so table becomes hot very quickly due to data locality once this user starts his queries. It is also much easier to do backup and restore if you need only portion of your data recovered.

There are some performance problems with many tables some are OS and File System dependent, others correspond to Innodb storage engine or using innodb_file_per_table option in particular.

4. Not using relations This one is right one but also with the catch. It is very traditional recommendation to normalize your data however it does not always bring good performance. Joins are expensive and you can often do much better with denormalized data. You may wish to use denormalized data as cached lookup table however so you do not have all these problems with loosing data etc. Read more in my Why MySQL Could be slow with Large Tables article.

5. The n+1 pattern This probably should rather be called Not using Join. This is typical error. On other hand in MySQL you might be better of using several queries than doing complicated ones. Of course you would rather use IN() than do 100 of queries in this case. This most applies to subqueries Where Subselects with IN() become corellated even if they are not, and so using IN() list of values derived by previous query. For example you can do:

SQL:
  1. SELECT id FROM users WHERE featured=1;
  2.  
  3. Now populate List FOR IN ON your PHP application:
  4.  
  5. SELECT * FROM articles WHERE user_id IN(23,545,654,34)
  6.  
  7. instead of:
  8.  
  9. SELECT * FROM articles WHERE user_id IN (SELECT id FROM users WHERE featured=1)

Some day this should be fixed however but do not expect it soon.

Use Indexes This item was not in original article, however I think this is the most common mistake and it is very important to fix it. Most applications I have to fix have number of indexing missing which requires queries to do full table scans. Funny enough this is often not the problem in the beginning - if application is bought or custom ordered it frequently can pass customer QA - it will work quite fast with almost empty database. With database growth it will however start to crawl.
So developing you PHP applications use test database with reasonable amount of data in it. And do run EXPLAIN for your queries, especially if you see them in slow query log. If you have trouble understanding EXPLAIN or optimizing your queries remember
we're here to help.

MySQL Server Upgrade

Posted by peter |

Today I've upgraded MySQL Server on the host running MySQL Performance Blog. MySQL 4.1.12 was running here for well over a year before that.

Why Have not I upgraded before ? Well because it just worked fine. Yes I know there were some security fixes but I have dedicated server with remote MySQL access closed by firewall and only trusted people having local access so I was not worried too much about it.

Why did I upgrade now ? Because new forum application was exposing one of the bugs in MySQL 4.1.12 so there was a need for upgrade.

Which Version did I upgrade to ? I Upgraded to MySQL 4.1.21. Why Did not I go with MySQL 5.0 ? There are three reasons. First there are number of changes in 5.0 some of which are incompatible with MySQL 4.1, for example regarding join syntax. I can't be 100% sure none of my applications will be affected and I see no reasons to spend time testing it carefully or downgrading if problem finally discovered a week later in some rare circumstances. Not to mention there is the chance new application I might be installing does not yet works with MySQL 5.0.

Second - I know my applications designed to work with MySQL 4.1, they avoid queries which MySQL 4.1 does not optimize well and so MySQL 5.0 will likely reduce performance. It is probably just 5-10% for most queries but this site runs on old Celeron box which does not have too much CPU cycles to waste.

Third - My sense of MySQL 5.0 stability is still not at MySQL 4.1 level. This is not exactly problem in this case - this site uses rather simple workload so MySQL 5.0 probably would do as well as MySQL 4.1 which even in its ancient 4.1.12 version never crashed on this site. However new applications which bring new workloads have higher chance of problems on MySQL 5.0 than on 4.1

This is upgrade strategy I follow for this server which has mix of various third party PHP/MySQL applications. In different situations upgrade strategies may be different. The main point I'm making is - you do not have to rush and use latest MySQL version neither it always makes sense to upgrade to each minor release if there are no fixes which affect you.