Both article itself and responce cause mixed feellings so I decided it is worth commenting:
1. Using mysql_* functions directly This is probably bad but I do not like solutions proposed by original article ether. PEAR is slow as well as other complex conectors. I have not yet tested PDO but would not expect it to beat MySQLi in speed. It is however bad idea to use mysql_ functions directly as well – I would go for using mysqli object approach. The great things about objects is you can easily overload methods and get debugging and profiling tools, as well as have tools which protect you from SQL Injections.
For example I have little wrapper which allows to do $dbcon->query(“Select email from user where name=%s”,$name) – wrapper will detect query is being called with multiple parameters and will perform needed checks and query rewriting. You also can use pretty much direct path to mysqli extension to performance critical queries if you need.
I would also note for many PHP applications abstraction layer is not the main performance problem, also benefit from persistent connections can be much more modest. DVD Store was special type of application which was designed to have very simple logic besides database – in most cases you would have beautiful page rendering as well as much more queries per page which will make performance improvement much smaller. Notable exception being AJAX applications which may do very little work and formating, so database connection may become the issue. Caching should be good help in this case though.
About Consulting – it is worth to mention it was my group which was Dell DVD Store optimization, and I’m now on my own, offering MySQL and LAMP Consuilting Services.
2. Not using auto_increment functionality This is right. With some exception however. For example Innodb tables do internal full table lock if auto_increment is used so using values generated elseware might be faster.
3. Using multiple databases Honestly I do not see application using one database per table that often. I however often see applications using multiple databases to group tables by certain logic, such as you do with directories to group files. I think this makes a lot of sense. Sometimes grouping is done so a lot of databases are needed – for example if grouping is done by user. This might be a bit extreem if you have thousands of users – I would rather do many to many relationship between users and tables but it also might work.
Regarding if you use many tables you’re doing something wrong it is frequently told by people with traditional database background. Things are different with MySQL.
There are many successful applications, using tens of thousands of tables per host and archiving great performance by doing so.
Using multiple tables gives some very important benefits – your data becomes managable, your ALTER TABLE or OPTIMIZE TABLE now locks small table for few seconds rather than giant 100GB table for few hours so can be done pretty much online. You also get good data clustering so table becomes hot very quickly due to data locality once this user starts his queries. It is also much easier to do backup and restore if you need only portion of your data recovered.
There are some performance problems with many tables some are OS and File System dependent, others correspond to Innodb storage engine or using innodb_file_per_table option in particular.
4. Not using relations This one is right one but also with the catch. It is very traditional recommendation to normalize your data however it does not always bring good performance. Joins are expensive and you can often do much better with denormalized data. You may wish to use denormalized data as cached lookup table however so you do not have all these problems with loosing data etc. Read more in my Why MySQL Could be slow with Large Tables article.
5. The n+1 pattern This probably should rather be called Not using Join. This is typical error. On other hand in MySQL you might be better of using several queries than doing complicated ones. Of course you would rather use IN() than do 100 of queries in this case. This most applies to subqueries Where Subselects with IN() become corellated even if they are not, and so using IN() list of values derived by previous query. For example you can do:
SELECT id FROM users WHERE featured=1; Now populate List for IN on your PHP application: SELECT * FROM articles WHERE user_id IN(23,545,654,34) instead of: SELECT * FROM articles WHERE user_id IN (SELECT id FROM users WHERE featured=1)
Some day this should be fixed however but do not expect it soon.
Use Indexes This item was not in original article, however I think this is the most common mistake and it is very important to fix it. Most applications I have to fix have number of indexing missing which requires queries to do full table scans. Funny enough this is often not the problem in the beginning – if application is bought or custom ordered it frequently can pass customer QA – it will work quite fast with almost empty database. With database growth it will however start to crawl.
So developing you PHP applications use test database with reasonable amount of data in it. And do run EXPLAIN for your queries, especially if you see them in slow query log. If you have trouble understanding EXPLAIN or optimizing your queries remember
we’re here to help.