November 30, 2006

Interesting MySQL and PostgreSQL Benchmarks

Posted by peter |

I had found pile of MySQL and PostgreSQL benchmarks on various platforms which I have not seen before. Very interesting reading.
It does not share too much information about how MySQL or PostgreSQL was configured or about queries. Furthermore MySQL and PostgreSQL has a bit different implementations (ie SubQueries avoided for MySQL) so do not just compare it directly.

It also does not mention if Innodb or MyISAM tables are used – it turns out Both are used in the benchmark. This is CPU bound benchmark with working set fitting in memory.

MySQL and PostrgreSQL Scalability on Xeon Woodcrest, Opteron and Niagra
Pretty interesting to see how PostgreSQL scales just as systems should scale in theory – gradually goes up with number of threads about matches number of Cores/Threads and stays at this level at higher concurrency. MySQL with Innodb shows its ugly face and drops pretty quickly as concurrency growths with peak at about number of CPUs. I guess this is lucky case as Innodb may well start to slow down before concurrency reaches number of CPUs.

Yes, Innodb Team has provided the fix for this scalability problem and it is merged into MySQL 5.0.30 “Enterprise” but according to the tests I’ve done so far it is far from full solution yet.

It is also interesting to see CPU comparison in this test. Woodcrest has best performance in this test (and in many other MySQL tests), Opteron comes second and older Intel Xeons as well as Niagra being outsiders.

Niagra scalability is one more interesting story. As you can see MySQL 4.1 actually scaled pretty well with Niagra, suffering slow regression with increased concurrency rather than quick drop. In MySQL 5.0 it is changed dramatically – it climbs to higher peak but it drops down very quickly as well as concurrency growths. It is seen much better on this picture

Linux vs Solaris comparison is also pretty interesting. With MySQL Linux has higher peak but Solaris suffers less with increased concurrency.

Note: I have not validated these benchmarks and as I already mentioned they do not have full disclosure. They however do match my own experience with MySQL so I tend to trust PostgreSQL data points as well.

November 29, 2006

BoardReader – Forum Search Engine

Posted by peter |

One may have notice we were not blogging too much recently, this is because we were quite busy, mainly building BoardReader.com – Search Engine which indexes tens of thousands of forums from all over the world. This project was built by us as consulting project so too bad we do not own it completely but we’re still quite excited it is live now. We did not work on crawler in this project only on database Backend and full text search engine implementation. In this part it is standard LAMPS application. I guess you know what LAMP is and S Stands for Sphinx – Full Text Search Engine which we love to use where large scale search is needed. At this point we have over 300 millions of posts indexed with only 3 search servers and still counting. I guess we’ll have half a billion of forum posts soon.
[read more...]

November 27, 2006

Trailing spaces in MySQL

Posted by peter |

In the past life was easy in MySQL. Both CHAR and VARCHAR types meant the same, only being difference in the sense of fixed or dynamic row length used. Trailing spaces were removed in both cases.

With MySQL 5.0 however things changed so now VARCHAR keeps trailing spaces while CHAR columns do not any more. Well in reality CHAR columns are padded to full length with spaces but it is invisible as those trailing spaces are removed upon retrieval. This is something you need to watch both upgrading to MySQL 5.0 as well as designing your applications – you should keep into account if you mind trailing spaces stored choosing VARCHAR vs CHAR in addition to fixed length vs dynamic level rows and space spent for column size counter.

There is more fun stuff with trailing spaces. When comparison is done trailing spaces are always removed, even if VARCHAR column is used which is pretty counterintuitive. So “a “=”a”=”a ” for all textual column types – CHAR, VARCHAR, TEXT. BLOB is exception it will preserve trailing spaces and use them in comparison.

November 23, 2006

Covering index and prefix indexes

Posted by peter |

I already wrote in the blog as well mentioned in presentation there is often a choice you have to make between having prefix index – which can be significantly smaller in size and having index being covering index, which means query can be executed using only data from the index without reading the row itself.

Today I had a chance to do couple of experiments to see when exactly it works or does not work:
[read more...]

November 21, 2006

Opening Tables scalability

Posted by peter |

I was restarting MySQL on box with 50.000 of Innodb tables and again it took couple of hours to reach decent performance because of “Opening Tables” stage was taking long.

Part of the problem is Innodb is updating stats on each table open which is possibly expensive operation, but really it is only great test case for general MySQL problem.

During warmup process I noticed I get very low CPU usage and disk Usage and IOWait about 25% (4CPU box) which indicates table opens and stats updates are serialized rather than performed in parallel.

I’ve checked with Heikki and he confirmed MySQL has global table cache mutex (LOCK_open) which is held for all open table operation so table opens are serialized.

Do not think however it is only Innodb problem. I’ve seen similar problems with MyISAM – these also take few IO operations to open and could take quite a while to close if there were unflushed key blocks and we have no idea what other storage engines may do to perform table open – some may need network operation etc.

In general I think it is extremely poor design choice to have global mutexes for anything which may require blocking physical IO or network operation if you care about scalability. MySQL has same problem with key buffer but that was fixed in MySQL 4.1, this one still remains.

I do not blame Monty – implementing first MySQL version in 3 months he had to take shortcuts and implement most simple solutions for many things. Now years later and having 50+ developers this should have been fixed.

Until this is fixed it is especially important to keep your table_cache large enough so table opens will be rare (I use 1/sec as a number to worry about) and also worry about possible limited performance while table cache is being warmed up in addition to all other caches :)

November 17, 2006

Feature Idea: Finding columns which query needs to access

Posted by peter |

In query examinations it is often interesting which columns query needs to access to provide result set as it gives you ideas if you can use covering indexes to speed things up or even cache some data by denormalizing tables. So far it has to be done manually – look at SELECT clause, WHERE clause, ORDER BY GROUP BY and HAVING to sum it up, not to mention more complex questions of subselects.

It would be great tool for MySQL Performance Optimization if MySQL Server could show this table and I know it already has this data as it is used by the optimizer.

It would be also good if such tool would show mapping of WHERE clauses to the tables as well as other clauses such as GROUP BY and ORDER BY. For complex queries especially if they do not refer to the columns with table prefix it may take a while to figure our where these columns come from and so how they can be optimized.

November 16, 2006

Rebuilding MySQL Binary

Posted by peter |

Running RPM based or other packaged MySQL Binary you may have a problem if you would like to rebuild binary for some reason – change some build settings, apply some third party patches or simply try latest snapshot (This time I was both applying patches for Innodb scalability and Vadim’s patches to get proper slow query log). There are number of things as default paths and GCC build and link options you would like to keep the same as in original binary to minimize the change and finding this was not pleasant.

This time however Vadim advised me better solution – run mysqlbug script and get configure options from it. This is so easy.

After build is completed I simply take mysqld from sql directory and try it out as this is the only piece which I’m interested in. Quick to try, quick to rollback. You may run make install though if you need to but better to be ready to reinstall rpm after it :)

November 14, 2006

Bug fix of InnoDB scalability problem

Posted by Vadim |

I was pretty busy last month with project which will be annonced very soon (I hope), but I can’t miss bug fix
of my favorite bug 15815. I wrote about this problem before and also investigated in my presentation.
Finally bug fix was pushed into 5.0-bk tree and now I have it in my hands.

Let me refresh results with MySQL 5.0.27 (without bugfix):

select sql_calc_found_rows * from b limit 5;
executes for 20 sec (table b contains ~2 mil rows)

the same query but in 4 concurrent threads:
98 sec
101 sec
103 sec
103 sec

(I am still using 4-CPU box and expecting the same time of execution for each thread).

The results with 5.0.30-bk:

1 thread:
19 sec

4 threads:
28 sec
29 sec
39 sec
41 sec

Great improvement! But scalablity factor is not such perfect as I’d expect, because with my own patch
I was able to reach better performance. Also as you see there is a thread starvation – last thread executes 1.4 times slower. So I’d want InnoDB team continue to work on performance improvement.

Anyway I think it’s good news.

November 12, 2006

Trying Archive Storage Engine

Posted by peter |

Today I noticed one of server used for web request profiling stats logging is taking about 2GB per day for logs, which are written in MyISAM table without indexes. So I thought it is great to try how much archive storage engine could help me in this case.
[read more...]

Are PHP persistent connections evil ?

Posted by peter |

As you probably know PHP “mysql” extension supported persistent connections but they were disabled in new “mysqli” extension, which is probably one of the reasons some people delay migration to this extension.

The reason behind using persistent connections is of course reducing number of connects which are rather expensive, even though they are much faster with MySQL than with most other databases.
[read more...]