<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Air traffic queries in MyISAM and Tokutek (TokuDB)</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/</link>
	<description>Percona&#039;s MySQL &#38; InnoDB performance and scalability blog</description>
	<lastBuildDate>Sat, 11 Feb 2012 16:45:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Sheeri K. Cabral (Pythian)</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-675380</link>
		<dc:creator>Sheeri K. Cabral (Pythian)</dc:creator>
		<pubDate>Tue, 10 Nov 2009 23:17:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-675380</guid>
		<description>Vadim -- great!  (in general if people are asking for something, even if it seems silly, I&#039;d rather just do it and show them how silly it is, if it doesn&#039;t take me too much time.)

And seeing how fast compiled-in InnoDB is compared to the InnoDB plugin is a win too -- most people only consider upgrading or changing big things like compiled vs. plugin if their db is slow :)</description>
		<content:encoded><![CDATA[<p>Vadim &#8212; great!  (in general if people are asking for something, even if it seems silly, I&#8217;d rather just do it and show them how silly it is, if it doesn&#8217;t take me too much time.)</p>
<p>And seeing how fast compiled-in InnoDB is compared to the InnoDB plugin is a win too &#8212; most people only consider upgrading or changing big things like compiled vs. plugin if their db is slow <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vadim</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-675362</link>
		<dc:creator>Vadim</dc:creator>
		<pubDate>Tue, 10 Nov 2009 22:32:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-675362</guid>
		<description>Sheeri,

OK, I am sold :) I will try to get an InnoDB setup done before Portland; if not, then after that.
What is really interesting for me here is how fast the indexes from the InnoDB plugin will perform.</description>
		<content:encoded><![CDATA[<p>Sheeri,</p>
<p>OK, I am sold <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I will try to get an InnoDB setup done before Portland; if not, then after that.<br />
What is really interesting for me here is how fast the indexes from the InnoDB plugin will perform.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sheeri K. Cabral (Pythian)</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-675361</link>
		<dc:creator>Sheeri K. Cabral (Pythian)</dc:creator>
		<pubDate>Tue, 10 Nov 2009 22:27:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-675361</guid>
		<description>I agree -- InnoDB should do much worse than MyISAM...but I&#039;d like to see how much worse.  It would be good to be able to compare TokuDB to InnoDB in the case of an OLAP workload....TokuDB is supposed to be good for OLTP too, and you benchmarked that.

I just don&#039;t see that it would take so much time to do -- seems simple, no?  (I have some benchmarks coming out on TokuDB very soon, so if you can&#039;t do it that&#039;s OK too.)</description>
		<content:encoded><![CDATA[<p>I agree &#8212; InnoDB should do much worse than MyISAM&#8230;but I&#8217;d like to see how much worse.  It would be good to be able to compare TokuDB to InnoDB in the case of an OLAP workload&#8230;.TokuDB is supposed to be good for OLTP too, and you benchmarked that.</p>
<p>I just don&#8217;t see that it would take so much time to do &#8212; seems simple, no?  (I have some benchmarks coming out on TokuDB very soon, so if you can&#8217;t do it that&#8217;s OK too.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vadim</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-675298</link>
		<dc:creator>Vadim</dc:creator>
		<pubDate>Tue, 10 Nov 2009 20:26:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-675298</guid>
		<description>Hmm, Sheeri, your comment only reached my mailbox today, but WordPress says it was posted on Nov 6.

Let me respond to it now.

I think I explained why I chose TokuDB. It has an interesting index structure that may make it more suitable than MyISAM and InnoDB, and you can actually see that in the query times.

Load time is not good for TokuDB, but as Bradley commented, it depends on how you do the load.

I made another run where I loaded the data one year per chunk (in contrast to all years at once, as in the original post). It took over 9h for MyISAM, but TokuDB took the same 5.5h.</description>
		<content:encoded><![CDATA[<p>Hmm, Sheeri, your comment only reached my mailbox today, but WordPress says it was posted on Nov 6.</p>
<p>Let me respond to it now.</p>
<p>I think I explained why I chose TokuDB. It has an interesting index structure that may make it more suitable than MyISAM and InnoDB, and you can actually see that in the query times.</p>
<p>Load time is not good for TokuDB, but as Bradley commented, it depends on how you do the load.</p>
<p>I made another run where I loaded the data one year per chunk (in contrast to all years at once, as in the original post). It took over 9h for MyISAM, but TokuDB took the same 5.5h.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vadim</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-675289</link>
		<dc:creator>Vadim</dc:creator>
		<pubDate>Tue, 10 Nov 2009 20:17:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-675289</guid>
		<description>Something is wrong with comments again.
I see some more comments on this post in my email, but not here.

There are comments from Sheeri K. Cabral (Pythian) and Alexander Mikhailian.

I will try to get them to appear here...</description>
		<content:encoded><![CDATA[<p>Something is wrong with comments again.<br />
I see some more comments on this post in my email, but not here.</p>
<p>There are comments from Sheeri K. Cabral (Pythian) and Alexander Mikhailian.</p>
<p>I will try to get them to appear here&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Smith</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-674945</link>
		<dc:creator>Greg Smith</dc:creator>
		<pubDate>Mon, 09 Nov 2009 22:47:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-674945</guid>
		<description>I don&#039;t really agree with Bradley&#039;s characterization that &quot;in production, many databases are performing close to worst-case insertion workloads&quot;.  Sure, there are some of those.  But a good percentage of the data I see in production comes from sources that are time based, and that data tends to be loaded in well clustered clumps that are closer to best-case rather than worst.  This is particularly true if you have enough RAM in the system that a decent chunk of index blocks can stay in there and not have to be updated on disk every time they&#039;re touched, which is how a performance-oriented database should be provisioned.  It&#039;s only the case where you have a workload with really random UPDATE/INSERTs to data already in the index that really tend toward the worst of the B-tree behaviors, which was the case in Vadim&#039;s extreme fragmentation example.  I suspect that the BTS air traffic data won&#039;t degrade anywhere near worst-case even if loaded in chunks, because that data is organized into blocks by time in its original files.  I&#039;d guess most of the work will be building new little index sub-trees and attaching them, rather than the worst-case behavior where you&#039;re touching things all across existing index blocks.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t really agree with Bradley&#8217;s characterization that &#8220;in production, many databases are performing close to worst-case insertion workloads&#8221;.  Sure, there are some of those.  But a good percentage of the data I see in production comes from sources that are time based, and that data tends to be loaded in well clustered clumps that are closer to best-case rather than worst.  This is particularly true if you have enough RAM in the system that a decent chunk of index blocks can stay in there and not have to be updated on disk every time they&#8217;re touched, which is how a performance-oriented database should be provisioned.  It&#8217;s only the case where you have a workload with really random UPDATE/INSERTs to data already in the index that really tend toward the worst of the B-tree behaviors, which was the case in Vadim&#8217;s extreme fragmentation example.  I suspect that the BTS air traffic data won&#8217;t degrade anywhere near worst-case even if loaded in chunks, because that data is organized into blocks by time in its original files.  I&#8217;d guess most of the work will be building new little index sub-trees and attaching them, rather than the worst-case behavior where you&#8217;re touching things all across existing index blocks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Martin Kersten</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-674140</link>
		<dc:creator>Martin Kersten</dc:creator>
		<pubDate>Sat, 07 Nov 2009 08:40:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-674140</guid>
		<description>Dear Vadim

Before looking into Greenplum, a run against PostgreSQL would be insightful.
It would later demonstrate the relative gains over Greenplum.

regards, Martin</description>
		<content:encoded><![CDATA[<p>Dear Vadim</p>
<p>Before looking into Greenplum, a run against PostgreSQL would be insightful.<br />
It would later demonstrate the relative gains over Greenplum.</p>
<p>regards, Martin</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vadim</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-674080</link>
		<dc:creator>Vadim</dc:creator>
		<pubDate>Sat, 07 Nov 2009 03:07:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-674080</guid>
		<description>Peter,

I am open to a bigger dataset, though I have a hard time finding one. The air traffic data is the biggest one available (well, there is data taken from telescopes, but I do not see how that fits into analytic databases).
Of course I can (and will) use synthetically generated data, but it is less interesting to me.</description>
		<content:encoded><![CDATA[<p>Peter,</p>
<p>I am open to a bigger dataset, though I have a hard time finding one. The air traffic data is the biggest one available (well, there is data taken from telescopes, but I do not see how that fits into analytic databases).<br />
Of course I can (and will) use synthetically generated data, but it is less interesting to me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-674063</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 07 Nov 2009 00:35:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-674063</guid>
		<description>Vadim,

I would note that you loaded the data for all engines all at once, so you need to compare apples to apples. I would expect things to change for everyone (to varying extents) if the data is loaded incrementally.

I would also suggest looking at a larger data set some time in the future. In the current run the compression has two effects: not only is there much less data to look at, but the well-compressed data may also fit in memory, making the workload virtually in-memory. On larger data sets, when you get, say, 10% instead of 1% of in-memory fit, things may be a bit different.</description>
		<content:encoded><![CDATA[<p>Vadim,</p>
<p>I would note that you loaded the data for all engines all at once, so you need to compare apples to apples. I would expect things to change for everyone (to varying extents) if the data is loaded incrementally.</p>
<p>I would also suggest looking at a larger data set some time in the future. In the current run the compression has two effects: not only is there much less data to look at, but the well-compressed data may also fit in memory, making the workload virtually in-memory. On larger data sets, when you get, say, 10% instead of 1% of in-memory fit, things may be a bit different.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vadim</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/comment-page-1/#comment-674048</link>
		<dc:creator>Vadim</dc:creator>
		<pubDate>Fri, 06 Nov 2009 22:54:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1641#comment-674048</guid>
		<description>Bradley,

I agree that this is the best-case scenario for MyISAM, and with fragmented indexes the results may be much worse for both load time and query execution. For queries, I posted about a similar problem with InnoDB yesterday: http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/

I will re-run the load one chunk per year to see how it affects load time.

That said, the scenario I show here is not entirely invalid. I&#039;ve seen DW solutions based on MyISAM where data is loaded periodically. If, say, we load data once per month and a full load takes only a couple of hours, it is worth reloading the data entirely to get an optimal index structure.</description>
		<content:encoded><![CDATA[<p>Bradley,</p>
<p>I agree that this is the best-case scenario for MyISAM, and with fragmented indexes the results may be much worse for both load time and query execution. For queries, I posted about a similar problem with InnoDB yesterday: <a href="http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/" rel="nofollow">http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/</a></p>
<p>I will re-run the load one chunk per year to see how it affects load time.</p>
<p>That said, the scenario I show here is not entirely invalid. I&#8217;ve seen DW solutions based on MyISAM where data is loaded periodically. If, say, we load data once per month and a full load takes only a couple of hours, it is worth reloading the data entirely to get an optimal index structure.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

