We were doing MySQL Performance evaluation on TPC-H queries for the client and they kindly allowed us to publish results which are very interesting.
This is obviously not audited TPC-H run, and it can’t be because we used MyISAM tables which are not ACID complaint. Plus we only measured Power to keep things simple.

We tested 10G and 100G data sets which was CPU bound and IO bound box on the Dell 2950 box w 16G of RAM which we used for testing. Even though box had 8 cores it is little use for MySQL as only one query is ran concurrently, same can be told about 8 hard drives which this box had.
MySQL Also was very slow running some queries so we changed scripts a bit to kill extremely long running queries to get results for others this means we can’t really get a valid TPC-H result from MySQL,though at least we get to see performance of individual queries.
We also packaged the toolset we used for benchmarks so you can repeat them if you like. It can be downloaded here

So let us first take a look at MySQL 5.1.23 vs 6.0.4 results for 10GB data set which “fits in memory”. The cut-off time for this test was 1 hour so query taking over 1 hour has NA. Times are given in seconds.
Ratio is MySQL 6.0 time divided by MySQL 5.1 time so if it is less than 1 MySQL 6.0 is faster if more than 1 slower:

Query MySQL 5.0.23 MySQL 6.0.4Ratio
Query1370.50372.161.00
Query2724.63623.680.86
Query3328.25354.621.08
Query47.958.081.02
Query5154.36161.651.05
Query629.3538.511.31
Query7322.04331.051.03
Query8469.31457.180.97
Query9123.75121.330.98
Query10343.37341.230.99
Query112.322.341.01
Query1228.6838.611.35
Query1346.0047.841.04
Query148.838.230.93
Query1521.5023.611.10
Query1655.4861.421.11
Query179.009.331.04
Query18N/A1962.96N/A
Query195.035.301.05
Query202.060.260.13
Query2133.1632.980.99
Query227.848.061.03

As you can see for 10G results there is only one query which does not complete within an hour in 5.1 and MySQL 6.0 can complete all queries within an hour.
We can also see MySQL 6.0 improving query 20 speed dramatically, the rest of the queries is however close and MySQL 6.0 is even significantly slower than 5.1 for
number of queries. Honestly I expected more from MySQL 6.0 optimizer improvements effort.

100GB Results are more interesting because this is database size for which you can find results published on TPC Web site. For example These Are results recently published for Microsoft SQL Server.
Of course this is much more powerful system but still you can get an idea – This system has geomean of 7.7 sec for Power test which is good order of magnitude better compared to MySQL results on 10G database.
TPC does discourage from comparing results against different database sizes so let’s see 100G data set results.
In this case we set cut-off time to 3 hours to give MySQL more time to complete the queries:

Query MySQL 5.0.23 MySQL 6.0.4Ratio
Query13784.453737.540.99
Query2NANANA
Query3NANANA
Query4NANANA
Query5NANANA
Query6NANANA
Query74328.224533.621.05
Query88947.724122.620.46
Query9NANANA
Query10NANANA
Query112726.203395.021.25
Query12NANANA
Query13NANANA
Query142345.68?NA
Query15NANANA
Query16725.31693.560.96
Query171895.55NANA
Query18NANANA
Query194896.274682.190.96
Query203117.451450.420.47
Query21NANANA
Query22108.56100.720.93

As you can see on 100G data set MySQL 5.1 could only complete 10 out of 22 queries within 3hours run time allowed for each query and MySQL 6.0 has similar number of queries it can execute in reasonable time frame. There 2 queries (Query8 and Query20) which MySQL 6.0 does better but there is also Query11 in which significant regression is observed. Vadim has already Wrote about it in his MySQL 5.1 vs 6.0 in TPC-H Queries post

As a Summary: We can see MySQL capabilities to run complex analytics queries, in particular those presented in TPC-H benchmark are still subpar even with changes which are currently seen in MySQL 6.0 tree. There is a long way till Release and may be MySQL 6.0 performance will improve. Though considering a lot of talks about optimizer improvements in MySQL 6.0 I expected drastically better results.
Though may we should not be as surprised by these results – MySQL 6.0 still does not have hash or merge join which are frequently used for these kind of queries, and there is also no query parallelization which would allow MySQL to use CPUs and hard drives this system has more efficiently. Parallel query will be growing issue for MySQL because CPUs just continue to add more cores. Having 2 CPUs and being able to use only half of system CPU resources for query processing was not too bad, however with 16,32,64 cores it is just too bad.

16 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Matthew Montgomery

In the 3rd paragraph you probably meant this:

“if it is less than 1 MySQL 6.0 is faster if [larger] than 1 slower:”

Ivan Evtuhovich

Hello, Peter.

At this time I can’t post bug on bugs.mysql.com. When i press Add Comment on existing bug, i see “An error occured when trying to verify your login credentials. Please log out and try again.” I relogin, but problem still exists (sorry for my english).

On site i don’t find any email address, where to post this problem.

Igor

Given that Kickfire published 100GB TPC-H results (audited) where none of the queries took more than 37 SECONDS – so for example on Q1 it’s 100 TIMES faster than result above (http://www.tpc.org/results/individual_results/Kickfire/Kickfire_2300_ES_TPCH_100G_041408.pdf) it appears MySQL/Sun have some real challenges ahead in this area (unless they’ll buy Kickfire or sue it out of existence etc).
I’m not affiliated with Kickfire in any way, shape or form no do I know anyone working there.

Igor

Q1 in TPC-H is one of the most basic queries there is – and it’s 100x slower using ~30K Kickfire appliance (so presumably no tuning is required – is it really the case?) vs what looks like 4-6K$ Dell server you’ve used. Time difference on other queries (except Q22) is even more dramatic.

Noah Freire

Hi Peter,

It would be great to know which MySQL settings you used for the 10GB test and 100GB test. What about posting your my.cnf? 😉

Thank you,

-Noah

martin kersten

Hi Peter,

Indeed, understanding the implications of a DBMS architecture on the application requirements is one of the hardest DBA tasks. The TPC-H and similar applications are meant to aid in this process, albeit a little. Ideally, a user does not require an extensive DBA course to handle reasonably sized database and its application.

The development in the area of column-stores and main-memory databases is moving at great pace.
Some throw hardware at the problem (Kickfire), others exploit different ways of software layering (Vertica, MonetDB). From an software architecture point of view understanding the pro-s and con-s are highly desirable and benchmark comparisons are a step into this direction.

From your comments on april 22, should i conclude that you don’t consider MySQL the premiere choice for datawarehouses?
If it is, did the performance team produce better results on the MySQL versions as presented by Peter at the beginning of this thread, including those beyond the 100GB. What are the tricks and their impact?
I am eager to learn those from the insiders, because the performance of MySQL does not stand out against the fullscale (OLTP +DW) version of MonetDB. Any reference to independent and public available numbers, albeit non-official, are appreciated.

regards, and keep up the good work on database kernels
Martin

ps http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/ for an indication. Numbers are confirmed recently, trends are the same.

Teratzul

Hi. I’m trying to execute the tpc-h queries on a mysql 5.0.27 system. Some of my results are somewhat similar to what you posted while others largely differ. I was wondering if I made a mistake while defining an index or something. Could I have a look at your create table scripts for the tpc-h standard ?

Thanks in advance.

Joel

Peter,
Excellent Job on your dilligent work, I love reading your blogs. I am working on a system that I beleive will put MySQL at the top of the TPC-H charts. My dilemma is that I do not see the data that is to be loaded as referrenced in the file: load_mysql.sh – PATH_DATA=/usb/rawdata/dbgen.10G

Is that information publically available for download?

Thanks in advance and keep up the great work!

luca

Hi Peter,
i was trying to repeat some tests but dbgen won’t to complete the writing of a 10GB dataset.
I have a “file size exceeded” when the size of lineitem table reaches 2GB. I can manage big files in my system (ext3+ related libraries)… i can’t actually understand what the problem is.
Any ideas?

Eddie

Hi Peter,

Re utilising multi cores, MySQL does claim to be capable of utilising multiple CPU cores. (http://dev.mysql.com/doc/refman/5.5/en/faqs-general.html#qandaitem-B-1-1-8).

Is that a different kind of assessment or different benchmarking?
Cheers.

AfEf

Hi Peter,

Can you help me to import data from TCPH benchmark to HBase nosql Database via sqoop and Wamp Server (MySql 5.1) ?

Thanks in advance,