<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MySQL Performance Blog &#187; OLAP</title>
	<atom:link href="http://www.mysqlperformanceblog.com/category/olap/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com</link>
	<description>Everything about MySQL Performance</description>
	<lastBuildDate>Sat, 24 Jul 2010 21:39:04 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=6348</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Intro to OLAP</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/12/intro-to-olap/</link>
		<comments>http://www.mysqlperformanceblog.com/2010/07/12/intro-to-olap/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 19:26:27 +0000</pubDate>
		<dc:creator>Justin Swanhart</dc:creator>
				<category><![CDATA[Innodb]]></category>
		<category><![CDATA[OLAP]]></category>
		<category><![CDATA[dw]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[BI]]></category>
		<category><![CDATA[business intelligence]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3050</guid>
		<description><![CDATA[This is the first of a series of posts about business intelligence tools, particularly OLAP (or online analytical processing) tools using MySQL and other free open source software.  OLAP tools are a part of the larger topic of business intelligence, a topic that has not had a lot of coverage on MPB.  Because [...]]]></description>
			<content:encoded><![CDATA[<p>This is the first of a series of posts about business intelligence tools, particularly OLAP (or online analytical processing) tools using MySQL and other free open source software.  OLAP tools are a part of the larger topic of business intelligence, a topic that has not had a lot of coverage on MPB.  Because of this, I am going to start out talking about these topics in general, rather than getting right to the gritty details of their performance. </p>
<p>I plan on covering the following topics:</p>
<ol>
<li>Introduction to <a href="http://en.wikipedia.org/wiki/OLAP">OLAP</a> and business intelligence. (this post)</li>
<li>Identifying the differences between a data warehouse and a data mart.</li>
<li>Introduction to <a href="http://en.wikipedia.org/wiki/Multidimensional_Expressions">MDX</a> queries and the kind of SQL which a ROLAP tool must generate to answer those queries.</li>
<li>Performance challenges with larger databases, and some ways to help performance using aggregation.</li>
<li>Using materialized views to automate that aggregation process.</li>
<li>Comparing the performance of OLAP with and without aggregation over multiple MySQL storage engines at various data scales.</li>
</ol>
<p><strong>What is BI?</strong><br />
Chances are that you have heard the term business intelligence.  Business intelligence (or BI) is a term which encompasses many different tools and methods for analyzing data, usually presenting it in a way that is easily consumed by upper management.  This analysis is often used to determine how effective the business has been at meeting certain performance goals, and to forecast how it will do in the future.  To put it another way, the tools are designed to provide insight about the business process, hence the name.  Probably the most popular BI activity for web sites is click analysis. </p>
<p>As far as BI is concerned, this series of posts focuses on OLAP analysis and, to a lesser extent, on data warehousing.  Data warehouses often provide the information upon which OLAP analysis is performed, but more on this in post #2.</p>
<p><strong>OLAP?  What is that?</strong><br />
OLAP is an acronym which stands for online analytical processing.  OLAP analysis, which is really just another name for multidimensional analysis, consists of displaying summary aggregations of the data broken down into different groups.  A typical OLAP analysis might show &#8220;sale total,  by year,  by sales rep, by product category&#8221;.   OLAP analysis is usually used for reporting on current data, looking at historical trends and trying to make predictions about future trends.  </p>
<p><strong>Multidimensional Analysis</strong><br />
Multidimensional analysis is a form of statistical analysis.  In multidimensional analysis samples representing a particular <i>measure</i> are compared or broken down into different <i>dimensions</i>.  For example, in a sales analysis, the &#8220;sale amount&#8221; is a <i>measure</i>.   Measures are always aggregated values.  That is, total sales might be expressed as SUM(sale_amt).  This is because the SUM of the individual sales will be grouped along different dimensions, such as by year or by product.  I&#8217;m getting a little ahead of myself.  Before we talk about measures and dimensions, we should talk about the two ways in which this information can be stored.</p>
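<p>To make the idea of a measure aggregated along dimensions concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL.  The <code>sales</code> table, its columns and the sample rows are hypothetical and exist only for illustration; the point is that the same SUM(sale_amt) measure is grouped by one dimension and then by two.</p>

```python
import sqlite3

# Hypothetical sample rows: (sale_year, product, sale_amt)
rows = [
    (2009, "widget", 10.0),
    (2009, "gadget", 5.0),
    (2010, "widget", 7.5),
    (2010, "widget", 2.5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_year INTEGER, product TEXT, sale_amt REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# The measure SUM(sale_amt) broken down by one dimension (year)
by_year = conn.execute(
    "SELECT sale_year, SUM(sale_amt) FROM sales "
    "GROUP BY sale_year ORDER BY sale_year"
).fetchall()

# The same measure broken down by two dimensions (year and product)
by_year_product = conn.execute(
    "SELECT sale_year, product, SUM(sale_amt) FROM sales "
    "GROUP BY sale_year, product ORDER BY sale_year, product"
).fetchall()

print(by_year)
print(by_year_product)
```

<p>Adding a column to the GROUP BY list is all it takes to break the measure down along another dimension.</p>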
<p><strong>There are two main ways to store multidimensional data for OLAP analysis</strong><br />
OLAP servers typically come in two basic flavors.  Some servers have specialized data stores which store data in a form which is highly effective for multidimensional analysis.  These servers are termed MOLAP and they tend to have exceptional performance due to their specialized data store. Almost all MOLAP solutions pre-compute many (or even all) of the possible answers to multi-dimensional queries.  <a href="https://sourceforge.net/projects/palo/">Palo</a> is an example of an open source version of this technology.  <a href="http://en.wikipedia.org/wiki/Essbase">Essbase</a> is an example of a closed source product.  MOLAP servers often feature extensive compression of data which can improve performance.  Loading data into a MOLAP server usually takes a very long time because many of the answers in the cube must be calculated.  The extra time spent during the load is usually called &#8220;processing&#8221; time.</p>
<p>A relational OLAP (or ROLAP) server uses data stored in an RDBMS.   These systems trade the performance of a multidimensional store for the convenience of an RDBMS.  These servers almost always query over a database which is structured as a <a href="http://en.wikipedia.org/wiki/Star_schema">STAR</a> or <a href="http://en.wikipedia.org/wiki/Snowflake_schema">snowflake</a> type schema.   To go back to the sales analysis example above, in a STAR schema the facts about the sales would be stored in the fact table, and the list of customers and products would be stored in separate dimension tables.  Some ROLAP servers support the aggregation of data into additional tables, and can use the tables automatically.  These servers can approach the performance of MOLAP with the convenience of ROLAP, but there are still challenges with this approach.  The biggest challenges are the amount of time that it takes to keep the tables updated and the complexity of the many scripts or jobs which might be necessary to keep the tables in sync.  Part five of my series will introduce materialized views which attempt to address these challenges in a manageable way.</p>
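<p>The aggregate-table idea, and the sync problem that comes with it, can be sketched in a few lines.  This is only an illustration with Python's built-in sqlite3 module; the table names <code>fact_sales</code> and <code>agg_sales_by_year</code> and the data are hypothetical, and a real ROLAP server would manage such tables in its own way.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (sale_year INTEGER, product_id INTEGER, sale_amt REAL);
INSERT INTO fact_sales VALUES
    (2009, 1, 10.0), (2009, 1, 5.0), (2009, 2, 1.0), (2010, 1, 4.0);

-- Hypothetical aggregate table: one pre-computed row per year
CREATE TABLE agg_sales_by_year AS
    SELECT sale_year, SUM(sale_amt) AS total_amt, COUNT(*) AS sale_cnt
      FROM fact_sales
     GROUP BY sale_year;
""")

# The same question answered from the detail rows and from the aggregate
from_fact = conn.execute(
    "SELECT sale_year, SUM(sale_amt) FROM fact_sales "
    "GROUP BY sale_year ORDER BY sale_year"
).fetchall()
from_agg = conn.execute(
    "SELECT sale_year, total_amt FROM agg_sales_by_year ORDER BY sale_year"
).fetchall()

# A new fact row makes the aggregate stale until it is refreshed
conn.execute("INSERT INTO fact_sales VALUES (2010, 2, 6.0)")
stale_total = conn.execute(
    "SELECT total_amt FROM agg_sales_by_year WHERE sale_year = 2010"
).fetchone()[0]
fresh_total = conn.execute(
    "SELECT SUM(sale_amt) FROM fact_sales WHERE sale_year = 2010"
).fetchone()[0]

print(from_fact == from_agg, stale_total, fresh_total)
```

<p>The aggregate answers the query from far fewer rows, but after the extra insert it no longer agrees with the fact table, which is exactly the maintenance burden described above.</p>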
<p><strong>What makes a ROLAP so great?</strong><br />
An OLAP server usually returns information to the user as a &#8216;<a href="http://en.wikipedia.org/wiki/Pivot_table">pivot table</a>&#8216; or &#8216;pivot report&#8217;.  While you could create such a report in a spreadsheet, the ROLAP tool is designed to deal with millions or even billions of rows of data, much more than a spreadsheet can usually handle.  MOLAP servers usually require that all, or almost all, of the data fit in memory.  Another difference is the ease with which this analysis is constructed.  You don&#8217;t necessarily have to write queries or drag and drop a report together in order to analyze multidimensional data using an OLAP tool.  </p>
<p><strong>Data before pivoting:</strong><br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2010/07/Pivottable-Flatdata.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2010/07/Pivottable-Flatdata.png" alt="Example image from Wikimedia commons showing detail data for sales" title="Pivottable-Flatdata" width="475" height="222" class="aligncenter size-full wp-image-3310" /></a></p>
<p><strong>Data summarized in pivot form:</strong><br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2010/07/Pivottable-Pivoted.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2010/07/Pivottable-Pivoted.png" alt="Wikimedia commons image showing data summarized in pivot format" title="Pivottable-Pivoted" width="558" height="137" class="aligncenter size-full wp-image-3311" /></a></p>
<p><strong>ROLAP tools use star schema</strong><br />
As I said before, a sale amount would be considered a measure, and it would usually be aggregated with SUM.  The other information about the sale, such as the product, when it was sold and to whom it was sold would be defined in dimension tables.  The fact table contains columns which are joined to the dimension tables, such as product_id and customer_id.  These are often defined as foreign keys from the fact table to the dimension tables.  </p>
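<p>A minimal sketch of such a star schema, again using Python's built-in sqlite3 module for illustration.  The table and column names (<code>fact_sales</code>, <code>dim_product</code>, <code>dim_customer</code>) and the rows are assumptions made up for this example, not the schema shown in the diagram below.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "by what" of the analysis
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region   TEXT);

-- The fact table holds the measure plus foreign keys to the dimensions
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    sale_amt    REAL
);

INSERT INTO dim_product  VALUES (1, 'books'), (2, 'music');
INSERT INTO dim_customer VALUES (10, 'east'), (20, 'west');
INSERT INTO fact_sales   VALUES (1, 10, 20.0), (1, 20, 5.0), (2, 10, 8.0), (2, 10, 2.0);
""")

# Join the fact table to each dimension and aggregate the measure
sales_by_category_region = conn.execute("""
    SELECT p.category, c.region, SUM(f.sale_amt)
      FROM fact_sales f
      JOIN dim_product  p ON p.product_id  = f.product_id
      JOIN dim_customer c ON c.customer_id = f.customer_id
     GROUP BY p.category, c.region
     ORDER BY p.category, c.region
""").fetchall()

print(sales_by_category_region)
```

<p>Every dimension used in the report costs one join against the fact table, which is why the shape of the schema matters so much for performance.</p>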
<p>A note about <em>degenerate dimensions</em>:<br />
Any values in the fact table that don&#8217;t join to dimensions are either considered degenerate dimensions or measures.  In the example below the status of the order is a degenerate dimension.  A degenerate dimension is stored as an ENUM in many cases.  In the example below there is no actual dimension table which includes the two different order statuses.  Such a dimension would add an extra join, which is expensive.  Any yes/no field and/or fields with a very low cardinality (such as gender or order status) will probably be stored in the fact table instead of in a dedicated dimension.  In the &#8220;pivot data&#8221; example above, all the dimensions are degenerate: gender, region, style, date.</p>
<div id="attachment_3264" class="wp-caption aligncenter" style="width: 303px"><a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2010/07/star_schema_with_degenerate.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2010/07/star_schema_with_degenerate.png" alt="Star schema with degenerate dimension" title="Star schema with degenerate dimension" width="293" height="444" class="size-full wp-image-3264" /></a><p class="wp-caption-text">Example star schema about sales.  </p></div>
<p>Often a dimension will include redundant information to make reporting easier, a process called &#8220;denormalization&#8221;.  Hierarchical information may be stored in a single dimension.  For example, a dimension for products may include both the category AND a sub-category.  A time dimension includes year, month and quarter.  You can create multiple different hierarchies from a single dimension.  This allows &#8216;drill down&#8217; into the dimension.  By default the data would be summarized by year, but you can drill down to quarter or month level aggregation.<br />
<a href="http://www.mysqlperformanceblog.com/wp-content/uploads/2010/07/date_hierarchy.png"><img src="http://www.mysqlperformanceblog.com/wp-content/uploads/2010/07/date_hierarchy.png" alt="Sample date hierarchy, showing quarter, month, year and day hierarchies." title="date_hierarchy" width="304" height="422" class="aligncenter size-full wp-image-3269" /></a></p>
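<p>Drill down over a denormalized date dimension can be sketched as the same query run at successively finer levels of the hierarchy.  This example uses Python's built-in sqlite3 module; the <code>dim_date</code> and <code>fact_sales</code> tables, their columns, the data and the <code>rollup</code> helper are all hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized date dimension: year, quarter and month live in one row
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    cal_year INTEGER, cal_quarter INTEGER, cal_month INTEGER
);
CREATE TABLE fact_sales (date_id INTEGER, sale_amt REAL);

INSERT INTO dim_date VALUES
    (1, 2009, 1, 2), (2, 2009, 2, 5), (3, 2009, 2, 6), (4, 2010, 1, 1);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 4.0), (3, 6.0), (4, 3.0);
""")

def rollup(levels):
    """Aggregate sale_amt at the given hierarchy levels (trusted column names)."""
    cols = ", ".join(levels)
    sql = (
        f"SELECT {cols}, SUM(f.sale_amt) FROM fact_sales f "
        f"JOIN dim_date d ON d.date_id = f.date_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()

by_year = rollup(["cal_year"])                     # default summary level
by_quarter = rollup(["cal_year", "cal_quarter"])   # drill down one level
by_month = rollup(["cal_year", "cal_quarter", "cal_month"])

print(by_year)
print(by_quarter)
print(by_month)
```

<p>Because the whole hierarchy is stored in one dimension row, each drill-down step only extends the GROUP BY list and needs no additional join.</p>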
<p>The screenshots here <a href="http://jpivot.sourceforge.net/temp-N101F1.html">in the jPivot (an OLAP cube browser) documentation</a> can give you a better idea about the display of data.  The examples break down sales by product, by category, and by region.</p>
<p>The information is presented in such a fashion that it can be &#8220;drilled into&#8221; and &#8220;filtered on&#8221; to provide an easy to use interface to the underlying data.  Graphical display of the data as pie, line or bar charts is possible.</p>
<p><strong>Focusing on ROLAP.</strong><br />
This is the MySQL performance blog, and as such an in-depth discussion of MOLAP technology is not particularly warranted here.  Our discussion will focus on <a href="http://en.wikipedia.org/wiki/Mondrian_OLAP_server">Mondrian</a>.  Mondrian is an open source ROLAP server featuring an in-memory OLAP cache.  Mondrian is part of the <a href="http://en.wikipedia.org/wiki/Pentaho">Pentaho</a> open source business intelligence suite.  Mondrian is also used by other projects such as <a href="http://code.google.com/p/wabit/">Wabit</a> and <a href="http://jasperforge.org/">Jaspersoft</a>.  If you are using open source BI then you are probably already using Mondrian.  Closed source ROLAP servers include <a href="http://en.wikipedia.org/wiki/Microstrategy">Microstrategy</a>, <a href="http://en.wikipedia.org/wiki/Microsoft_Analysis_Services">Microsoft Analysis Services</a> and <a href="http://en.wikipedia.org/wiki/Oracle_Business_Intelligence_Suite_Enterprise_Edition">Oracle BI</a>.  </p>
<p>Mondrian speaks <a href="http://en.wikipedia.org/wiki/Multidimensional_Expressions">MDX</a>, <a href="http://www.olap4j.org/">olap4j</a> and <a href="http://en.wikipedia.org/wiki/XML_for_Analysis">XML for Analysis</a>.  This means that there is a very high chance that your existing BI tools (if you have them) will work with it.  MDX is a query language that looks similar to SQL but is actually very different.  Olap4j is an OLAP interface for Java applications.  XML for Analysis (XMLA) is an industry standard analytical interface originally created by Microsoft, SAS and Hyperion.</p>
<p><strong>What&#8217;s next?</strong><br />
Next we&#8217;ll talk about the difference between data marts and data warehouses.  The former are usually used for OLAP analysis, but they can be fundamentally related to a warehouse. </p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Justin Swanhart |
      <a href="http://www.mysqlperformanceblog.com/2010/07/12/intro-to-olap/#comments">No comment</a></p>
    ]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2010/07/12/intro-to-olap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New OLAP Wikistat benchmark: Introduction and call for feedbacks</title>
		<link>http://www.mysqlperformanceblog.com/2010/01/28/new-olap-wikistat-benchmark-introduction-and-call-for-feedbacks/</link>
		<comments>http://www.mysqlperformanceblog.com/2010/01/28/new-olap-wikistat-benchmark-introduction-and-call-for-feedbacks/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 03:08:47 +0000</pubDate>
		<dc:creator>Vadim</dc:creator>
				<category><![CDATA[OLAP]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=2160</guid>
		<description><![CDATA[My posts on Ontime Air Traffic and the Star Schema Benchmark got a lot of interest
(links:

http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/
http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/
http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/
http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/

).
However, those benchmarks by themselves did not cover all the cases I wanted, so I was thinking about a better scenario. The biggest problem is getting a real dataset that is big enough, and I thank Bradley C. Kuszmaul, who pointed me [...]]]></description>
			<content:encoded><![CDATA[<p>My posts on Ontime Air Traffic and the Star Schema Benchmark got a lot of interest<br />
(links:</p>
<ul>
<li><a href="http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/">http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/</a></li>
<li><a href="http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/">http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/</a></li>
<li><a href="http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/">http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/</a></li>
<li><a href="http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/">http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/</a></li>
</ul>
<p>).<br />
However, those benchmarks by themselves did not cover all the cases I wanted, so I was thinking about a better scenario. The biggest problem is getting a real dataset that is big enough. I thank Bradley C. Kuszmaul, who pointed me to the Wikipedia statistics on access to Wikipedia pages, and Domas, who made the stats accessible. Link to the archives: <a href="http://dammit.lt/wikistats/archive/">http://dammit.lt/wikistats/archive/</a> or the original <a href="http://lists.wikimedia.org/pipermail/wikitech-l/2007-December/035435.html">Domas's announcement</a>. </p>
<p>Although the table does not hold a wide variety of information, I think it is good enough to represent cases you can face in a Web application (log processing, page visits, clickstream, etc.).</p>
<p>I made some effort to normalize the data into a classic star schema model and prepared queries that can be run on the proposed dataset (John Sichi, lead of LucidDB, helped me to draft some queries).<br />
You can see details on our Percona <a href="http://www.percona.com/docs/wiki/benchmark:wikistat:start">Wikistat benchmark Wiki</a>.</p>
<p>I have the following goals for the proposed benchmark:</p>
<ul>
<li>Compare engines on OLAP queries for planning, predicting growth, analyzing access patterns to wiki pages, and drawing trends.</li>
<li>Compare engines on statistical queries for end users that can be executed in real time, e.g. how many times a given page was accessed yesterday vs. today.</li>
<li>Understand the specific features and characteristics of each engine.</li>
<li>Compare throughput on simple queries (queries and scenario are yet to be drafted)</li>
<li>Check the ability to load data and serve queries at the same time (availability during data load) (queries and scenario are yet to be drafted)</li>
</ul>
<p>So in the proposed schema I have four tables:<br />
<code>pagestat</code> (fact table), and <code>pages</code>, <code>datesinfo</code>, <code>projects</code> (dimension tables).</p>
<p>The dimension tables are supposed to be static and unchanged, and we can change the data size by varying the number of months loaded into the fact table (so this is the scale factor).</p>
<p>EER diagram<br />
<img src="http://www.percona.com/docs/wiki/_media/benchmark:wikistat:wikistat.png"><br />
(  made with MySQL Workbench  )</p>
<p>In the current dataset, which you can download from an Amazon snapshot (name: “percona-wikistat”, ID:snap-a5f9bacc), we have:</p>
<ul>
<li>Table <code>pages</code>: 724,550,811 rows, data size: 40476M</li>
<li>Table <code>datesinfo</code>: 9624 rows, one entry represents 1 hour</li>
<li>Table <code>projects</code>: 2025 rows</li>
<li>Table <code>pagestat</code><br />
Data for 2009-06: 3,453,013,109 rows / size 68352M<br />
Data for 2009-07: 3,442,375,618 rows / size 68152M
</li>
</ul>
<p>So with two months of stats we have about 172GB of data, with about 7 billion rows in the fact table.</p>
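<p>As a quick check of those totals, the arithmetic can be reproduced from the per-table figures listed above.  This sketch assumes that the "M" sizes are megabytes with 1024 MB per GB, and that the 172GB figure includes the <code>pages</code> table (the two small dimension tables are ignored); both are my assumptions, not something stated with the numbers.</p>

```python
MB_PER_GB = 1024  # assumption: sizes are reported in binary megabytes

pages_mb = 40476
pagestat_mb = {"2009-06": 68352, "2009-07": 68152}
pagestat_rows = {"2009-06": 3_453_013_109, "2009-07": 3_442_375_618}

# Total on-disk size: the large dimension table plus two months of facts
total_mb = pages_mb + sum(pagestat_mb.values())
total_gb = total_mb / MB_PER_GB

# Total number of rows in the fact table
fact_rows = sum(pagestat_rows.values())

print(f"{total_gb:.1f} GB, {fact_rows / 1e9:.2f} billion fact rows")
```

<p>Under those assumptions the sum comes out just under 173GB and just under 6.9 billion fact rows, consistent with the rounded figures quoted.</p>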
<p>Example of a query (again, the full list is on the <a href="http://www.percona.com/docs/wiki/benchmark:wikistat:start">Benchmark Wiki</a>):</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-2">
<pre class="sql">SELECT project, SUM(page_count) sm
  FROM pagestat
  JOIN datesinfo di ON (di.id = date_id)
  JOIN projects p ON (p.id = project_id)
 WHERE di.calmonth = 7 AND di.calyear = 2009
 GROUP BY project
 ORDER BY sm DESC
 LIMIT 20;</pre>
</div>
</div>
<p></p>
<p>I am going to load the data and run queries against the available engines:</p>
<ul>
<li>MySQL MyISAM / InnoDB (to have reference results)</li>
<li>InfoBright</li>
<li>InfiniDB</li>
<li>MonetDB</li>
<li>LucidDB</li>
<li>Greenplum</li>
</ul>
<p>and I will report my results ( so stay with MySQLPerformanceBlog <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> )</p>
<p>I'd also like to test the Paraccel, Vertica and KickFire systems, but I do not have access to them.</p>
<p>I welcome your feedback on the  benchmark, and what else you would like to see here.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Vadim |
      <a href="http://www.mysqlperformanceblog.com/2010/01/28/new-olap-wikistat-benchmark-introduction-and-call-for-feedbacks/#comments">16 comments</a></p>
    ]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2010/01/28/new-olap-wikistat-benchmark-introduction-and-call-for-feedbacks/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Star Schema Benchmark: InfoBright, InfiniDB and LucidDB</title>
		<link>http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/</link>
		<comments>http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/#comments</comments>
		<pubDate>Fri, 08 Jan 2010 05:51:20 +0000</pubDate>
		<dc:creator>Vadim</dc:creator>
				<category><![CDATA[OLAP]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1955</guid>
		<description><![CDATA[In my previous rounds with data warehouse oriented engines I used a single table without joins, and a small (for DW) data size (see http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/, http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/).  Addressing these issues, I took the Star Schema Benchmark, which is a modification of TPC-H, and tried to run queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB, [...]]]></description>
			<content:encoded><![CDATA[<p>In my previous rounds with data warehouse oriented engines I used a single table without joins, and a small (for DW) data size (see <a href="http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/">http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/</a>, <a href="http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/">http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/</a>, <a href="http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/">http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/</a>).  Addressing these issues, I took the Star Schema Benchmark, which is a modification of TPC-H, and tried to run queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB; I will explain why later. Again, the primary goal of the test was not just to get numbers, but to understand the specifics of each engine and its ability to handle this amount of data and execute queries.</p>
<p>All the details I have are available on our Wiki <a href="http://www.percona.com/docs/wiki/benchmark:ssb:start">http://www.percona.com/docs/wiki/benchmark:ssb:start</a>, and you can get the benchmark specification here: <a href="http://www.percona.com/docs/wiki/_media/benchmark:ssb:starschemab.pdf">http://www.percona.com/docs/wiki/_media/benchmark:ssb:starschemab.pdf</a>.</p>
<p>I generated data with scale factor = 1000, which gave me 610GB of data in raw format, and loaded it into each engine.</p>
<p>This is where the differences between the engines come into play. While InfoBright and InfiniDB do not need indexes at all (you actually can't create indexes there), they are needed for LucidDB and MonetDB, and this changes the load time and the data size after load significantly. The numbers I put in the results do not include indexing time, but it should also be considered.</p>
<p>Indexes are exactly the reason why I could not get results for MonetDB; there I faced an issue I was not prepared for. MonetDB requires that the index fit into memory during the indexing procedure, and for 610GB of data the index may grow to 120GB, and I did not have that amount of memory (the box has only 16GB of RAM).  MonetDB experts recommended that I extend the swap partition to 128GB, but my partition layout was not prepared for that; I just did not expect to need a big swap partition.</p>
<p><strong>Loading</strong><br />
So, load time.<br />
InfiniDB can really utilize all available cores/CPUs in the system (I ran the benchmark on an 8-core box), which allowed it to load data faster than the other engines. LucidDB and MonetDB also have multi-threaded loaders; only InfoBright ICE used a single core.</p>
<p>InfiniDB: <strong>24 010 sec</strong><br />
MonetDB: <strong>42 608 sec</strong> (without indexes)<br />
InfoBright: <strong>51 779 sec</strong><br />
LucidDB: <strong>140 736 sec</strong> (without indexes)</p>
<p>I should note that the time to create indexes in LucidDB was also significant and exceeded the loading time. The full report on indexes is available here <a href="http://www.percona.com/docs/wiki/benchmark:ssb:luciddb:start">http://www.percona.com/docs/wiki/benchmark:ssb:luciddb:start</a></p>
<p><strong>Data size</strong></p>
<p>Size after load is also an interesting factor. InfoBright is traditionally good at compression, though the compression rate is lower than it was for the AirTraffic table. I was told this is because the lineorder table does not come in sorted order, which one would expect in real life. Actually, I heard the same complaint from InfiniDB experts: if the lineorder data is put in sorted order, loading time can decrease significantly.</p>
<p>Data size after load:<br />
InfoBright: <strong>112GB</strong><br />
LucidDB: <strong>120GB</strong> (without indexes)<br />
InfiniDB: <strong>626GB</strong><br />
MonetDB: <strong>650GB</strong> (without indexes)</p>
<p><strong>Query times</strong></p>
<p>Now on to query times.<br />
Full results can be found on the page <a href="http://www.percona.com/docs/wiki/benchmark:ssb:start">http://www.percona.com/docs/wiki/benchmark:ssb:start</a>,<br />
and the graph is below. Here are a couple of comments from me.</p>
<p>InfoBright was fully bound by a single CPU during all queries. I think the fact that the engine can use only a single CPU/core is becoming a significant limitation for it. For query 3.1 I got a surprising result: after 36 hours of work I got an error saying that the query can't be resolved by the InfoBright optimizer and that I need to enable the MySQL optimizer.</p>
<p>InfiniDB, by contrast, was IO-bound; it processed data fully utilizing sequential reads, reading at 120MB/s. I think this allowed InfiniDB to get the best time in most queries.</p>
<p>LucidDB at this stage can also utilize only a single thread, with results sometimes better and sometimes worse than InfoBright.</p>
<p>Results:</p>
<table class="inline" border="1">
<tr class="row0">
<th class="col0">Query</th>
<th class="col1">InfoBright</th>
<th class="col2">InfiniDB</th>
<th class="col3"> LucidDB</th>
</tr>
<tr class="row1">
<td class="col0">Q1.1 </td>
<td class="col1"> 48 min 21.67 sec (2901.67 sec) </td>
<td class="col2"> 24 min 26.05 sec (1466.05 sec) </td>
<td class="col3"> 3503.792 sec </td>
</tr>
<tr class="row2">
<td class="col0">Q1.2 </td>
<td class="col1"> 44 min 55.37 sec (2695.37 sec) </td>
<td class="col2"> 24 min 25.83 sec (1465.83 sec) </td>
<td class="col3"> 2889.903 sec </td>
</tr>
<tr class="row3">
<td class="col0">Q1.3 </td>
<td class="col1"> 45 min 53.49 sec (2753.49 sec) </td>
<td class="col2"> 24 min 27.25 sec (1467.25 sec) </td>
<td class="col3"> 2763.464 sec </td>
</tr>
<tr class="row4">
<td class="col0">Q2.1 </td>
<td class="col1"> 1 hour 54 min 27.74 sec (6867.74 sec) </td>
<td class="col2"> 19 min 44.35 sec (1184.35 sec) </td>
<td class="col3"> 9694.534 sec </td>
</tr>
<tr class="row5">
<td class="col0">Q2.2 </td>
<td class="col1"> 1 hour 13 min 33.15 sec (4413.15 sec) </td>
<td class="col2"> 19 min 49.56 sec (1189.56 sec) </td>
<td class="col3"> 9399.965 sec </td>
</tr>
<tr class="row6">
<td class="col0">Q2.3 </td>
<td class="col1"> 1 hour 8 min 23.41 sec (4103.41 sec) </td>
<td class="col2"> 19 min 52.27 sec (1192.25 sec) </td>
<td class="col3"> 8875.349 sec </td>
</tr>
<tr class="row7">
<td class="col0">Q3.1 </td>
<td class="col1"> NA </td>
<td class="col2"> 19 min 11.23 sec (1151.23 sec) </td>
<td class="col3"> 16376.93 sec </td>
</tr>
<tr class="row8">
<td class="col0">Q3.2 </td>
<td class="col1"> 3 hours 30 min 17.64 sec (12617.64 sec) </td>
<td class="col2"> 19 min 28.55 sec (1168.55 sec) </td>
<td class="col3"> 5560.977 sec </td>
</tr>
<tr class="row9">
<td class="col0">Q3.3 </td>
<td class="col1"> 2 hours 58 min 18.87 sec (10698.87 sec) </td>
<td class="col2"> 19 min 58.29 sec (1198.29 sec) </td>
<td class="col3"> 2517.621 sec </td>
</tr>
<tr class="row10">
<td class="col0">Q3.4 </td>
<td class="col1"> 1 hour 41 min 41.29 sec (6101.29 sec) </td>
<td class="col2"> 12 min 57.96 sec (777.96 sec) </td>
<td class="col3"> 686.202 sec </td>
</tr>
<tr class="row11">
<td class="col0">Q4.1 </td>
<td class="col1"> 8 hours 53 min 52.55 sec (32032.55 sec) </td>
<td class="col2"> 32 min 57.49 sec (1977.49 sec) </td>
<td class="col3"> 19843.213 sec </td>
</tr>
<tr class="row12">
<td class="col0">Q4.2 </td>
<td class="col1"> 5 hours 38 min 7.60 sec / 5 hours 36 min 35.69 sec (20195.69 sec) </td>
<td class="col2"> 33 min 35.45 sec (2015.45 sec) </td>
<td class="col3"> 15292.648 sec </td>
</tr>
<tr class="row13">
<td class="col0">Q4.3 </td>
<td class="col1"> 12 hours 58 min 4.27 sec (46684.27 sec) </td>
<td class="col2"> 33 min 47.32 sec (2027.32 sec) </td>
<td class="col3"> 7241.791 sec </td>
</tr>
</table>
<p>Graph with results (time in sec, less time is better)<br />
<img src="https://spreadsheets.google.com/a/percona.com/oimg?key=0AjsVX7AnrCYwdGhQM1dkTnJjYWVTY3pNaHVGSzh2VkE&#038;oid=1&#038;v=1262848088265" /></p>
<p><strong>Conclusions</strong></p>
<ul>
<li>InfiniDB is doing just great, using the available CPU cores and the full IO bandwidth when reading from disk. You can see more details on InfiniDB scalability on InfiniDB's blog <a href="http://infinidb.org/infinidb-blog/mysql-parallel-query-processing-of-ssb-queries-via-infinidb-.html">http://infinidb.org/infinidb-blog/mysql-parallel-query-processing-of-ssb-queries-via-infinidb-.html</a></li>
<li>The SSB benchmark may not be a good fit for InfoBright; the synthetic nature of the benchmark does not allow InfoBright to show better results. But I hope InfoBright will be able to use multiple cores / multiple disks soon.</li>
<li>I would like MonetDB to be able to use disk to build indexes, rather than relying only on available memory.</li>
<li>Taking the complaints about SSB into account, I am looking to get another, more realistic dataset and compare a bigger set of available DW solutions.</li>
</ul>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Vadim |
      <a href="http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/#comments">43 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/feed/</wfw:commentRss>
		<slash:comments>43</slash:comments>
		</item>
		<item>
		<title>Air traffic queries in MyISAM and Tokutek (TokuDB)</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/</link>
		<comments>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 06:21:03 +0000</pubDate>
		<dc:creator>Vadim</dc:creator>
				<category><![CDATA[OLAP]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[dw]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641</guid>
		<description><![CDATA[This is the next post in the series
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB
Air traffic queries in InfiniDB: early alpha
Let me explain the reason for choosing these engines.  After the initial three posts I was often asked "What is the baseline? Can we compare the results with standard MySQL engines?" So there [...]]]></description>
			<content:encoded><![CDATA[<p>This is the next post in the series:<br />
<a href="http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/">Analyzing air traffic performance with InfoBright and MonetDB</a><br />
<a href="http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/">Air traffic queries in LucidDB</a><br />
<a href="http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/">Air traffic queries in InfiniDB: early alpha</a></p>
<p>Let me explain the reason for choosing these engines.  After the initial three posts I was often asked: "What is the baseline? Can we compare the results with standard MySQL engines?" So here comes MyISAM, as a base point to see how much better column-oriented analytic engines are here. </p>
<p>However, take into account that for MyISAM we need to choose proper indexes to execute queries effectively, and indexes come with pain: loading data gets slower, and designing proper indexes takes additional research, especially since the MySQL optimizer is not always smart about picking the best one.</p>
<p>The really nice thing about MonetDB, InfoBright and InfiniDB is that they do not need indexes, so you do not have to worry about maintaining them and picking the best one. I am not sure about LucidDB; I was told indexes are needed, but creating a new index was really fast even on the full database, so I guess they are not B-Tree indexes. This reflection on indexes turned me in the direction of TokuDB.</p>
<p>What is so special about TokuDB? There are two things. First, its indexes have a special structure and are "cheap"; by "cheap" I mean that the maintenance cost is constant and independent of data size. With regular B-Tree indexes the cost grows exponentially with data size (Bradley Kuszmaul from Tokutek will correct me if I am wrong in this statement). Second, TokuDB uses compression, so I expect a smaller size of loaded data and fewer IO operations during query execution.</p>
<p>So what indexes do we need for the queries? To remind you of the details, the schema is available in this post<br />
<a href="http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/">http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/</a>, and<br />
the queries are posted on the "Queries" sheet in my summary <a href="https://spreadsheets.google.com/a/percona.com/ccc?key=0AjsVX7AnrCYwdERIZFVqakRrcXplM0g0UktaUkRwenc&#038;hl=en#">Spreadsheet</a>.</p>
<p>With Bradley's help we chose the following indexes:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-4">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">KEY `Year` <span style="color:#006600; font-weight:bold;">&#40;</span>`Year`,`Month`<span style="color:#006600; font-weight:bold;">&#41;</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; KEY `Year_2` <span style="color:#006600; font-weight:bold;">&#40;</span>`Year`,`DayOfWeek`<span style="color:#006600; font-weight:bold;">&#41;</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; KEY `DayOfWeek` <span style="color:#006600; font-weight:bold;">&#40;</span>`DayOfWeek`,`Year`,`DepDelay`<span style="color:#006600; font-weight:bold;">&#41;</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; KEY `DestCityName` <span style="color:#006600; font-weight:bold;">&#40;</span>`DestCityName`,`OriginCityName`,`Year`<span style="color:#006600; font-weight:bold;">&#41;</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; KEY `Year_3` <span style="color:#006600; font-weight:bold;">&#40;</span>`Year`,`DestCityName`,`OriginCityName`<span style="color:#006600; font-weight:bold;">&#41;</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; KEY `Year_4` <span style="color:#006600; font-weight:bold;">&#40;</span>`Year`,`Carrier`,`DepDelay`<span style="color:#006600; font-weight:bold;">&#41;</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; KEY `Origin` <span style="color:#006600; font-weight:bold;">&#40;</span>`Origin`,`Year`,`DepDelay`<span style="color:#006600; font-weight:bold;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
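<p>As a rough illustration of the optimizer concern, one way to check which of these keys MySQL picks for a given query is <code>EXPLAIN</code>, optionally combined with an index hint. This is only a sketch: the query is a per-carrier delay count for 2007 of the kind used in this series, and hinting <code>Year_4</code> is an assumption made for illustration, not something measured in this post.</p>
<pre><code>-- Show the plan the optimizer chooses on its own
EXPLAIN SELECT Carrier, COUNT(*) AS c
FROM ontime
WHERE Year = 2007 AND DepDelay &gt; 10
GROUP BY Carrier;

-- Same query, restricted to the (Year, Carrier, DepDelay) key for comparison
EXPLAIN SELECT Carrier, COUNT(*) AS c
FROM ontime FORCE INDEX (Year_4)
WHERE Year = 2007 AND DepDelay &gt; 10
GROUP BY Carrier;</code></pre>
<p>Comparing the <code>key</code> and <code>rows</code> columns of the two plans is a quick way to see whether the hint changes anything.</p>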
<p>I measured load time for both MyISAM and TokuDB, loading into an empty table with the indexes already created.</p>
<p>Load time for MyISAM: <strong>16608 sec</strong><br />
For TokuDB: <strong>19131 sec</strong></p>
<p>Data size (including indexes):</p>
<p>MyISAM: <strong>36.7GB</strong><br />
TokuDB: <strong>6.7GB</strong></p>
<p>I am a bit surprised that TokuDB is slower at loading data, but my guess is that this is related to compression, and I expect that with a bigger amount of data TokuDB will be faster than MyISAM.</p>
<p>Now to the queries. Bradley pointed out to me that query Q5 <code>SELECT t.carrier, c, c2, c*1000/c2 as c3 FROM (SELECT carrier,<br />
count(*) AS c FROM ontime WHERE DepDelay>10 AND Year=2007 GROUP BY<br />
carrier) t JOIN (SELECT carrier, count(*) AS c2 FROM ontime WHERE<br />
Year=2007 GROUP BY carrier) t2 ON (t.Carrier=t2.Carrier) ORDER BY c3</code> can be rewritten as<br />
<code>SELECT carrier,totalflights,ndelayed,ndelayed*1000/totalflights as c3 FROM (SELECT carrier,count(*) as totalflights,sum(if(depdelay>10,1,0)) as ndelayed from ontime where year=2007 group by carrier) t order by c3 desc;</code> (I call it query Q5i).</p>
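<p>The idea behind Q5i is that both counts come from a single pass over the 2007 rows, instead of two derived tables that are then joined. A reformatted sketch of the same query follows; the layout and comments are added here only for readability, and no separate timing was taken for this form.</p>
<pre><code>SELECT carrier,
       totalflights,
       ndelayed,
       ndelayed * 1000 / totalflights AS c3
FROM (
    SELECT carrier,
           COUNT(*) AS totalflights,                 -- all 2007 flights per carrier
           SUM(IF(depdelay &gt; 10, 1, 0)) AS ndelayed  -- only flights with DepDelay above 10
    FROM ontime
    WHERE year = 2007
    GROUP BY carrier
) t
ORDER BY c3 DESC;</code></pre>
<p>Because the conditional sum is computed in the same <code>GROUP BY</code> as the total, the second scan and the join on carrier are no longer needed.</p>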
<p>The summary table with query execution times (in sec, less is better):</p>
<table border=1>
<tr>
<td>Query</td>
<td>MyISAM</td>
<td>TokuDB</td>
</tr>
<tr>
<td>Q0</td>
<td>72.84</td>
<td>50.25</td>
</tr>
<tr>
<td>Q1</td>
<td>61.03</td>
<td>55.01</td>
</tr>
<tr>
<td>Q2</td>
<td>98.12</td>
<td>58.36</td>
</tr>
<tr>
<td>Q3</td>
<td>123.04</td>
<td>66.87</td>
</tr>
<tr>
<td>Q4</td>
<td>6.92</td>
<td>6.91</td>
</tr>
<tr>
<td>Q5</td>
<td>13.61</td>
<td>11.86</td>
</tr>
<tr>
<td>Q5i</td>
<td>7.68</td>
<td>6.96</td>
</tr>
<tr>
<td>Q6</td>
<td>123.84</td>
<td>69.03</td>
</tr>
<tr>
<td>Q7</td>
<td>187.22</td>
<td>159.62</td>
</tr>
<tr>
<td>Q8 (1y)</td>
<td>8.75</td>
<td>7.59</td>
</tr>
<tr>
<td>Q8 (2y)</td>
<td>102.17</td>
<td>64.95</td>
</tr>
<tr>
<td>Q8 (3y)</td>
<td>104.7</td>
<td>69.76</td>
</tr>
<tr>
<td>Q8 (4y)</td>
<td>107.05</td>
<td>70.46</td>
</tr>
<tr>
<td>Q8 (10y)</td>
<td>119.54</td>
<td>84.64</td>
</tr>
<tr>
<td>Q9</td>
<td>69.05</td>
<td>47.67</td>
</tr>
</table>
<p>For reference I used 5.1.36-Tokutek-2.1.0 for both MyISAM and TokuDB tests.</p>
<p>And if you are interested in comparing MyISAM with the previous engines:</p>
<table border=1>
<tr>
<td>Query</td>
<td>MyISAM</td>
<td>MonetDB</td>
<td>InfoBright</td>
<td>LucidDB</td>
<td>InfiniDB</td>
</tr>
<tr>
<td>Q0</td>
<td>72.84</td>
<td>29.9</td>
<td>4.19</td>
<td>103.21</td>
<td>NA</td>
</tr>
<tr>
<td>Q1</td>
<td>61.03</td>
<td>7.9</td>
<td>12.13</td>
<td>49.17</td>
<td>6.79</td>
</tr>
<tr>
<td>Q2</td>
<td>98.12</td>
<td>0.9</td>
<td>6.73</td>
<td>27.13</td>
<td>4.59</td>
</tr>
<tr>
<td>Q3</td>
<td>123.04</td>
<td>1.7</td>
<td>7.29</td>
<td>27.66</td>
<td>4.96</td>
</tr>
<tr>
<td>Q4</td>
<td>6.92</td>
<td>0.27</td>
<td>0.99</td>
<td>2.34</td>
<td>0.75</td>
</tr>
<tr>
<td>Q5</td>
<td>13.61</td>
<td>0.5</td>
<td>2.92</td>
<td>7.35</td>
<td>NA</td>
</tr>
<tr>
<td>Q6</td>
<td>123.84</td>
<td>12.5</td>
<td>21.83</td>
<td>78.42</td>
<td>NA</td>
</tr>
<tr>
<td>Q7</td>
<td>187.22</td>
<td>27.9</td>
<td>8.59</td>
<td>106.37</td>
<td>NA</td>
</tr>
<tr>
<td>Q8 (1y)</td>
<td>8.75</td>
<td>0.55</td>
<td>1.74</td>
<td>6.76</td>
<td>8.13</td>
</tr>
<tr>
<td>Q8 (2y)</td>
<td>102.17</td>
<td>1.1</td>
<td>3.68</td>
<td>28.82</td>
<td>16.54</td>
</tr>
<tr>
<td>Q8 (3y)</td>
<td>104.7</td>
<td>1.69</td>
<td>5.44</td>
<td>35.37</td>
<td>24.46</td>
</tr>
<tr>
<td>Q8 (4y)</td>
<td>107.05</td>
<td>2.12</td>
<td>7.22</td>
<td>41.66</td>
<td>32.49</td>
</tr>
<tr>
<td>Q8 (10y)</td>
<td>119.54</td>
<td>29.14</td>
<td>17.42</td>
<td>72.67</td>
<td>70.35</td>
</tr>
<tr>
<td>Q9</td>
<td>69.05</td>
<td>6.3</td>
<td>0.31</td>
<td>76.12</td>
<td>9.54</td>
</tr>
</table>
<p>All results are available in the <a href="https://spreadsheets.google.com/a/percona.com/ccc?key=0AjsVX7AnrCYwdERIZFVqakRrcXplM0g0UktaUkRwenc&#038;hl=en#">summary Spreadsheet</a>.</p>
<p>I deliberately did not put TokuDB in the same table with the analytic-oriented databases, to highlight that TokuDB is a general-purpose OLTP engine.<br />
As you can see, it does better than MyISAM in all queries.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Vadim |
      <a href="http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/#comments">26 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Air traffic queries in InfiniDB: early alpha</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/</link>
		<comments>http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 21:29:28 +0000</pubDate>
		<dc:creator>Vadim</dc:creator>
				<category><![CDATA[OLAP]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[dw]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1593</guid>
		<description><![CDATA[As Calpont announced availability of InfiniDB I surely couldn't miss a chance to compare it with previously tested databases in the same environment.
See my previous posts on this topic:
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB
I could not run all queries against InfiniDB and I met some hiccups during my experiment, [...]]]></description>
			<content:encoded><![CDATA[<p>As Calpont announced the availability of <a href="http://infinidb.org/">InfiniDB</a>, I surely couldn't miss a chance to compare it with the previously tested databases in the same environment.<br />
See my previous posts on this topic:<br />
<a href="http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/">Analyzing air traffic performance with InfoBright and MonetDB</a><br />
<a href="http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/">Air traffic queries in LucidDB</a></p>
<p>I could not run all queries against InfiniDB and I hit some hiccups during my experiment, so it was a less smooth experience than with the other databases.</p>
<p>So let's go through the same steps:</p>
<p><strong>Load data</strong></p>
<p>InfiniDB supports MySQL's <code>LOAD DATA</code> statement and its own <code>colxml / cpimport</code> utilities. As <code>LOAD DATA</code> is more familiar to me, I started with that; however, after I issued LOAD DATA on a 180MB file (for the 1st month of 1989), it very soon caused extensive swapping (my box has 4GB of RAM) and the statement failed with<br />
<code>ERROR 1 (HY000) at line 1: CAL0001: Insert Failed:  St9bad_alloc</code></p>
<p>Alright, <code>colxml / cpimport</code> was more successful; however, its syntax is less flexible than that of <code>LOAD DATA</code>, so I had to transform the input files into a format that <code>cpimport</code> could understand.</p>
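<p>For reference, a <code>LOAD DATA</code> statement for one monthly file could look like the sketch below. It uses standard MySQL syntax; the file path, the field and line terminators and the header-skipping clause are assumptions for illustration, the exact statement used in the test is not shown in this post, and whether InfiniDB accepts every clause is not confirmed here.</p>
<pre><code>-- Hypothetical path and CSV options; adjust to the actual input files
LOAD DATA INFILE '/data/ontime/1989_1.csv'
INTO TABLE ontime
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;</code></pre>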
<p>Total load time was <strong>9747 sec</strong> or  <strong>2.7h</strong> (not counting the time spent on file transformation)</p>
<p>I put summary data on load time, data size and query time into a <a href="https://spreadsheets.google.com/ccc?key=0AjsVX7AnrCYwdERIZFVqakRrcXplM0g0UktaUkRwenc&#038;hl=en">Google Spreadsheet</a> so you can easily compare with previous results. There are different sheets for queries, data size and load time.</p>
<p><strong>Datasize</strong></p>
<p>The size of the database after loading is another confusing point. The InfiniDB data directory has a complex structure like</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-9">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">233</span>.<span style="">dir</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">233</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">233</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/FILE000.<span style="">cdf</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">241</span>.<span style="">dir</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">241</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">241</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/FILE000.<span style="">cdf</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">238</span>.<span style="">dir</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">238</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">238</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/FILE000.<span style="">cdf</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">235</span>.<span style="">dir</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">235</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">./<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">003</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">235</span>.<span style="">dir</span>/<span style="color:#800000;color:#800000;">000</span>.<span style="">dir</span>/FILE000.<span style="">cdf</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>so it's hard to say which files are related to the table. But after load, the size of 000.dir is <strong>114GB</strong>, which is twice as big as the original data files. <strong>SHOW TABLE STATUS</strong> does not really help there; it shows</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-10">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Name: ontime</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Engine: InfiniDB</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; Version: <span style="color:#800000;color:#800000;">10</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp;Row_format: Dynamic</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rows: <span style="color:#800000;color:#800000;">2000</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;Avg_row_length: <span style="color:#800000;color:#800000;">0</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; Data_length: <span style="color:#800000;color:#800000;">0</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">Max_data_length: <span style="color:#800000;color:#800000;">0</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;Index_length: <span style="color:#800000;color:#800000;">0</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; Data_free: <span style="color:#800000;color:#800000;">0</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;Auto_increment: NULL</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; Create_time: NULL</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; Update_time: NULL</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp;Check_time: NULL</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; Collation: latin1_swedish_ci</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp;Checksum: NULL</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;Create_options: </div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; Comment: </div>
</li>
</ol>
</div>
</div>
</div>
<p>
with totally misleading information.</p>
<p>So I use <strong>114GB</strong> as the size of the data after load, until someone shows me how to get the real size and explains what takes so much space.</p>
<p><strong>Queries</strong></p>
<p>First, the "count star" query <code>SELECT count(*) FROM ontime</code> took <strong>2.67 sec</strong>, which suggests that InfiniDB does not store a row counter but calculates the count pretty fast.</p>
<p>Q0:<br />
<code>select avg(c1) from (select year,month,count(*) as c1 from ontime group by YEAR,month) t;</code></p>
<p>Another bummer: on this query InfiniDB complains<br />
<code><br />
ERROR 138 (HY000):<br />
The query includes syntax that is not supported by InfiniDB. Use 'show warnings;' to get more information. Review the Calpont InfiniDB Syntax guide for additional information on supported distributed syntax or consider changing the InfiniDB Operating Mode (infinidb_vtable_mode).<br />
mysql> show warnings;<br />
+-------+------+------------------------------------------------------------+<br />
| Level | Code | Message                                                    |<br />
+-------+------+------------------------------------------------------------+<br />
| Error | 9999 | Subselect in From clause is not supported in this release. |<br />
+-------+------+------------------------------------------------------------+<br />
</code></p>
<p>OK, so InfiniDB does not support derived tables (subqueries in the FROM clause), which is a big limitation from my point of view.<br />
As a workaround I tried to create a temporary table, but got another error:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-11">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">mysql&gt; create temporary table tq2 as <span style="color:#006600; font-weight:bold;">&#40;</span>select Year,Month,count<span style="color:#006600; font-weight:bold;">&#40;</span>*<span style="color:#006600; font-weight:bold;">&#41;</span> as c1 from ontime group by Year, Month<span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">ERROR <span style="color:#800000;color:#800000;">122</span> <span style="color:#006600; font-weight:bold;">&#40;</span>HY000<span style="color:#006600; font-weight:bold;">&#41;</span>: Cannot open table handle for ontime. </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>As the error message suggests, I set <code>infinidb_vtable_mode = 2</code>, which is:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-12">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#800000;color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#41;</span> auto-switch mode: InfiniDB will attempt to process the query internally, if it </div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">cannot, it will automatically switch the query to run in row-by-row mode. </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>but the query took <strong>667 sec</strong>,</p>
<p>so I skip queries Q5, Q6 and Q7, which are also based on derived tables, as not supported by InfiniDB.</p>
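<p>One possible way around the derived-table limitation for Q0 is to run only the inner GROUP BY on the server and average the per-month counts in the client. Below is a minimal Python sketch of that idea. It assumes the rows of <code>select year,month,count(*) as c1 from ontime group by YEAR,month</code> have already been fetched as tuples; the function name and the sample rows are hypothetical, and this approach was not measured in this test.</p>

```python
# Client-side stand-in for the outer AVG(c1) of Q0.
# The sample rows are made-up illustrative values, not benchmark results.
def average_group_count(rows):
    """rows: iterable of (year, month, c1) tuples from the inner GROUP BY."""
    counts = [c1 for _year, _month, c1 in rows]
    if not counts:
        return None  # AVG over an empty set is NULL in SQL
    return sum(counts) / len(counts)

sample_rows = [(2007, 1, 100), (2007, 2, 200), (2007, 3, 300)]
print(average_group_count(sample_rows))
```

<p>This moves only one row per (year, month) group to the client, so the cost of the final average is negligible next to the GROUP BY itself.</p>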
<p>Other queries (again, see the comparison with other engines in the <a href="https://spreadsheets.google.com/ccc?key=0AjsVX7AnrCYwdERIZFVqakRrcXplM0g0UktaUkRwenc&#038;hl=en">Google Spreadsheet</a> or in the summary table at the bottom):</p>
<p>Query Q1:<br />
<code>mysql> SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year BETWEEN 2000 AND 2008 GROUP BY DayOfWeek ORDER BY c DESC;</code><br />
7 rows in set (<strong>6.79 sec</strong>)</p>
<p>Query Q2:<br />
<code>mysql> SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008 GROUP BY DayOfWeek ORDER BY c DESC;<br />
</code><br />
7 rows in set (<strong>4.59 sec</strong>)</p>
<p>Query Q3:<br />
<code>SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;<br />
</code><br />
<strong>4.96 sec</strong></p>
<p>Query Q4:<br />
<code>mysql> SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND YearD=2007 GROUP BY Carrier ORDER BY 2 DESC;<br />
</code></p>
<p>I had another surprise with this query: after 15 min it had not returned results. I checked the system and it was totally idle, but the query was stuck. I killed the query and restarted mysqld, but could not connect to mysqld anymore. In the process list I see that InfiniDB starts a couple of external processes: <code>ExeMgr, DDLProc, PrimProc, controllernode fg, workernode DBRM_Worker1 fg</code>, which cooperate with each other using IPC shared memory and semaphores. To clean up the system I rebooted the server, and only after that was mysqld able to start.</p>
<p>After that, query Q4 took <strong>0.75 sec</strong>.</p>
<p>Queries Q5-Q7 skipped.</p>
<p>Query Q8:</p>
<p><code>SELECT DestCityName, COUNT( DISTINCT OriginCityName) FROM ontime WHERE YearD BETWEEN 2008 and 2008 GROUP BY DestCityName ORDER BY 2 DESC LIMIT 10;<br />
</code></p>
<p>And times for InfiniDB:</p>
<p><strong>1y:  8.13 sec<br />
2y:  16.54 sec<br />
3y:  24.46 sec<br />
4y: 32.49 sec<br />
10y: 1 min 10.35 sec</strong></p>
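<p>The Q8 times grow roughly linearly with the number of years scanned, at around 8 sec per year. A small Python sketch of that arithmetic follows. It uses only the times listed above, with 1 min 10.35 sec written as 70.35 sec; the helper name is made up for illustration.</p>

```python
# Q8 timings for InfiniDB as listed above: years covered -> seconds.
q8_seconds = {1: 8.13, 2: 16.54, 3: 24.46, 4: 32.49, 10: 70.35}

def seconds_per_year(timings):
    """Return elapsed seconds divided by the number of years scanned."""
    return {years: round(sec / years, 2) for years, sec in timings.items()}

for years, per_year in seconds_per_year(q8_seconds).items():
    print(f"{years}y: {per_year} sec per year")
```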
<p>Query Q9:<br />
<code>select Year ,count(*) as c1 from ontime group by Year;<br />
</code><br />
Time: <strong>9.54 sec</strong></p>
<p>OK, so here is the summary table with query times (in sec, lower is better):</p>
<table border=1>
<tr>
<td>Query</td>
<td>MonetDB</td>
<td>InfoBright</td>
<td>LucidDB</td>
<td>InfiniDB</td>
</tr>
<tr>
<td>Q0</td>
<td>29.9</td>
<td><strong>4.19</strong></td>
<td>103.21</td>
<td>NA</td>
</tr>
<tr>
<td>Q1</td>
<td>7.9</td>
<td>12.13</td>
<td>49.17</td>
<td><strong>6.79</strong></td>
</tr>
<tr>
<td>Q2</td>
<td><strong>0.9</strong></td>
<td>6.73</td>
<td>27.13</td>
<td>4.59</td>
</tr>
<tr>
<td>Q3</td>
<td><strong>1.7</strong></td>
<td>7.29</td>
<td>27.66</td>
<td>4.96</td>
</tr>
<tr>
<td>Q4</td>
<td><strong>0.27</strong></td>
<td>0.99</td>
<td>2.34</td>
<td>0.75</td>
</tr>
<tr>
<td>Q5</td>
<td><strong>0.5</strong></td>
<td>2.92</td>
<td>7.35</td>
<td>NA</td>
</tr>
<tr>
<td>Q6</td>
<td><strong>12.5</strong></td>
<td>21.83</td>
<td>78.42</td>
<td>NA</td>
</tr>
<tr>
<td>Q7</td>
<td>27.9</td>
<td><strong>8.59</strong></td>
<td>106.37</td>
<td>NA</td>
</tr>
<tr>
<td>Q8 (1y)</td>
<td><strong>0.55</strong></td>
<td>1.74</td>
<td>6.76</td>
<td>8.13</td>
</tr>
<tr>
<td>Q8 (2y)</td>
<td><strong>1.1</strong></td>
<td>3.68</td>
<td>28.82</td>
<td>16.54</td>
</tr>
<tr>
<td>Q8 (3y)</td>
<td><strong>1.69</strong></td>
<td>5.44</td>
<td>35.37</td>
<td>24.46</td>
</tr>
<tr>
<td>Q8 (4y)</td>
<td><strong>2.12</strong></td>
<td>7.22</td>
<td>41.66</td>
<td>32.49</td>
</tr>
<tr>
<td>Q8 (10y)</td>
<td>29.14</td>
<td><strong>17.42</strong></td>
<td>72.67</td>
<td>70.35</td>
</tr>
<tr>
<td>Q9</td>
<td>6.3</td>
<td><strong>0.31</strong></td>
<td>76.12</td>
<td>9.54</td>
</tr>
</table>
<p><strong>Conclusions</strong></p>
<ul>
<li>The InfiniDB server version shows <code>Server version: 5.1.39-community InfiniDB Community Edition 0.9.4.0-5-alpha (GPL)</code>, so I consider it an alpha release, and it is doing OK for an alpha. I will wait for a more stable release for further tests, as it took a good amount of time to deal with the various glitches.</li>
<li>InfiniDB shows really good times for the queries it can handle, quite often better than InfoBright.</li>
<li>The inability to handle derived tables is a significant drawback for me; I hope it will be fixed.</li>
</ul>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Vadim |
      <a href="http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/#comments">19 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Air traffic queries in LucidDB</title>
		<link>http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/</link>
		<comments>http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 17:10:31 +0000</pubDate>
		<dc:creator>Vadim</dc:creator>
				<category><![CDATA[OLAP]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[dw]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1537</guid>
		<description><![CDATA[After my first post Analyzing air traffic performance with InfoBright and MonetDB where I was not able to finish task with LucidDB, John Sichi contacted me with help to setup. You can see instruction how to load data on LucidDB Wiki page
You can find the description of benchmark in original post, there I will show [...]]]></description>
			<content:encoded><![CDATA[<p>After my first post, <a href="http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/">Analyzing air traffic performance with InfoBright and MonetDB</a>, where I was not able to finish the task with LucidDB, John Sichi contacted me and helped with the setup. You can see instructions on how to load the data on the <a href="http://pub.eigenbase.org/wiki/LucidDbOtp">LucidDB Wiki page</a>.</p>
<p>You can find the description of the benchmark in the original post; here I will show the numbers I have for LucidDB vs the previous systems.</p>
<p><strong>Load time</strong><br />
Loading the data into LucidDB in a single thread took 15273 sec, or <strong>4.24h</strong>. Unlike the other systems, LucidDB supports multi-threaded load; with concurrency 2 (as I have only 2 cores on that box), the load time is 9955 sec, or <strong>2.76h</strong>. For comparison,<br />
the load time for InfoBright is <strong>2.45h</strong> and for MonetDB it is <strong>2.6h</strong>.</p>
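<p>To put the two-thread result in perspective, the speedup over the single-thread load can be computed directly from the two measurements above. A short Python sketch of that arithmetic; the function names are illustrative only:</p>

```python
# Load times in seconds, as measured above.
SINGLE_THREAD_SEC = 15273
TWO_THREADS_SEC = 9955

def speedup(serial_sec, parallel_sec):
    """Ratio of serial to parallel elapsed time."""
    return serial_sec / parallel_sec

def efficiency(serial_sec, parallel_sec, threads):
    """Speedup divided by thread count; 1.0 would be perfect scaling."""
    return speedup(serial_sec, parallel_sec) / threads

print(f"speedup: {speedup(SINGLE_THREAD_SEC, TWO_THREADS_SEC):.2f}x")
print(f"efficiency: {efficiency(SINGLE_THREAD_SEC, TWO_THREADS_SEC, 2):.0%}")
```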
<p><strong>DataSize</strong><br />
Another interesting metric is the data size after load. In LucidDB the db file takes <strong>9.3GB</strong> after load.<br />
<strong>UPDATE 27-Oct-2009</strong> According to the metadata table, the actual size of the data is <strong>4.5GB</strong>; the 9.3GB is the size of the physical file db.dat, which probably was not truncated after several loads of data.</p>
<p>For InfoBright it is <strong>1.6GB</strong>, and for MonetDB it is <strong>65GB</strong>. Obviously LucidDB uses some compression, but it is not as aggressive as in the InfoBright case. As the original dataset is 55GB, the compression ratio for LucidDB is roughly <strong>1:12</strong> (55GB against the 4.5GB of actual data).</p>
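<p>The 1:12 figure comes from dividing the original 55GB by the 4.5GB reported in the metadata table. A quick Python check of the same arithmetic for all the sizes quoted above; the dictionary labels and the function name are illustrative, not anything the engines report:</p>

```python
# Sizes in GB as quoted in this post; the original dataset is 55GB.
ORIGINAL_GB = 55.0
size_after_load_gb = {
    "LucidDB (metadata)": 4.5,
    "LucidDB (db.dat file)": 9.3,
    "InfoBright": 1.6,
    "MonetDB": 65.0,
}

def compression_ratio(original_gb, stored_gb):
    """How many GB of source data correspond to one GB of stored data."""
    return original_gb / stored_gb

for engine, stored in size_after_load_gb.items():
    print(f"{engine}: 1:{compression_ratio(ORIGINAL_GB, stored):.1f}")
```

<p>A ratio below 1, as for MonetDB, means the loaded data takes more space than the source files.</p>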
<p><strong>Query times</strong></p>
<p>Here is the list of queries and times for all systems.</p>
<p>- Lame query "count star"<br />
LucidDB:<br />
<code>SELECT count(*) FROM otp."ontime";<br />
</code>1 row selected (55.165 seconds)</p>
<p>Both InfoBright and MonetDB returned the result immediately.<br />
It seems LucidDB has to scan the whole table to get the result.</p>
<p>- Q0:<br />
<code>select avg(c1) from (select "Year","Month",count(*) as c1 from otp."ontime" group by "Year","Month") t;</code><br />
LucidDB: <strong>103.205 seconds</strong><br />
InfoBright: <strong>4.19 sec</strong><br />
MonetDB: <strong>29.9 sec</strong></p>
<p>- Q1:<br />
<code>SELECT "DayOfWeek", count(*) AS c FROM OTP."ontime" WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER BY c DESC;</code><br />
LucidDB: <strong>49.17 seconds</strong><br />
InfoBright: <strong>12.13 sec</strong><br />
MonetDB: <strong>7.9 sec</strong></p>
<p>- Q2:<br />
<code>SELECT "DayOfWeek", count(*) AS c FROM otp."ontime" WHERE "DepDelay">10 AND "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER BY c DESC;</code><br />
LucidDB: <strong>27.131 seconds</strong><br />
InfoBright: <strong>6.37 sec</strong><br />
MonetDB: <strong>0.9 sec</strong></p>
<p>- Q3:<br />
<code>!set rowlimit 10<br />
SELECT "Origin", count(*) AS c FROM otp."ontime" WHERE "DepDelay">10 AND "Year" BETWEEN 2000 AND 2008 GROUP BY "Origin" ORDER BY c DESC;</code><br />
LucidDB: <strong>27.664 seconds</strong><br />
InfoBright: <strong>7.29 sec</strong><br />
MonetDB: <strong>1.7 sec</strong></p>
<p>- Q4:<br />
<code>SELECT "Carrier", count(*) FROM otp."ontime" WHERE "DepDelay">10 AND "Year"=2007 GROUP BY "Carrier" ORDER BY 2 DESC;</code><br />
LucidDB: <strong>2.338 seconds</strong><br />
InfoBright: <strong>0.99 sec</strong><br />
MonetDB: <strong>0.27 sec</strong></p>
<p>- Q5:<br />
<code>SELECT t."Carrier", c, c2, c*1000/c2 as c3 FROM (SELECT "Carrier", count(*) AS c FROM OTP."ontime" WHERE "DepDelay">10 AND "Year"=2007 GROUP BY "Carrier") t JOIN (SELECT "Carrier", count(*) AS c2 FROM OTP."ontime" WHERE "Year"=2007 GROUP BY "Carrier") t2 ON (t."Carrier"=t2."Carrier") ORDER BY c3 DESC;</code><br />
LucidDB: <strong>7.351 seconds</strong><br />
InfoBright: <strong>2.92 sec</strong><br />
MonetDB: <strong>0.5 sec</strong></p>
<p>- Q6:<br />
<code>SELECT t."Carrier", c, c2, c*1000/c2 as c3 FROM (SELECT "Carrier", count(*) AS c FROM OTP."ontime" WHERE "DepDelay">10 AND "Year" BETWEEN 2000 AND 2008 GROUP BY "Carrier") t JOIN (SELECT "Carrier", count(*) AS c2 FROM OTP."ontime" WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "Carrier") t2 ON (t."Carrier"=t2."Carrier") ORDER BY c3 DESC;</code><br />
LucidDB: <strong>78.423 seconds</strong><br />
InfoBright: <strong>21.83 sec</strong><br />
MonetDB: <strong>12.5 sec</strong></p>
<p>- Q7:<br />
<code>SELECT t."Year", c1/c2 FROM (select "Year", count(*)*1000 as c1 from OTP."ontime" WHERE "DepDelay">10 GROUP BY "Year") t JOIN (select "Year", count(*) as c2 from OTP."ontime" GROUP BY "Year") t2 ON (t."Year"=t2."Year");</code><br />
LucidDB: <strong>106.374 seconds</strong><br />
InfoBright: <strong>8.59 sec</strong><br />
MonetDB: <strong>27.9 sec</strong></p>
<p>- Q8:<br />
<code>SELECT "DestCityName", COUNT( DISTINCT "OriginCityName") FROM "ontime" WHERE "Year" BETWEEN 2008 and 2008 GROUP BY "DestCityName" ORDER BY 2 DESC;</code></p>
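<table border=1>
<tr>
<td>Years</td>
<td>LucidDB (sec)</td>
<td>InfoBright (sec)</td>
<td>MonetDB (sec)</td>
</tr>
<tr>
<td>1y</td>
<td>6.76</td>
<td>1.74</td>
<td>0.55</td>
</tr>
<tr>
<td>2y</td>
<td>28.82</td>
<td>3.68</td>
<td>1.10</td>
</tr>
<tr>
<td>3y</td>
<td>35.37</td>
<td>5.44</td>
<td>1.69</td>
</tr>
<tr>
<td>4y</td>
<td>41.66</td>
<td>7.22</td>
<td>2.12</td>
</tr>
<tr>
<td>10y</td>
<td>72.67</td>
<td>17.42</td>
<td>29.14</td>
</tr>
</table>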
<p>- Q9:<br />
<code>select "Year" ,count(*) as c1 from "ontime" group by "Year";</code><br />
LucidDB: <strong>76.121 seconds</strong><br />
InfoBright: <strong>0.31 sec</strong><br />
MonetDB: <strong>6.3 sec</strong></p>
<p>As you can see, LucidDB does not show the best results. On the good side, however, LucidDB is very feature-rich, with full support for DML statements. Its ETL features are also very impressive: you can extract, filter and transform external data (there is even access to MySQL via a JDBC driver) using just SQL queries (compare with the single LOAD DATA statement in InfoBright ICE edition). Also, I am not much of a Java person, but as I understand it LucidDB can be easily integrated with Java applications, which is important if your development is Java based.</p>
<p>It is worth mentioning that in LucidDB a single query execution takes 100% of user time on a single CPU, which may signal that there is some low-hanging fruit for optimization. OProfile could show clear places to fix.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Vadim |
      <a href="http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/#comments">20 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>
