<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: InnoDB: look after fragmentation</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/</link>
	<description>Percona&#039;s MySQL &#38; InnoDB performance and scalability blog</description>
	<lastBuildDate>Sat, 11 Feb 2012 16:45:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Petrik_CZ</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-725335</link>
		<dc:creator>Petrik_CZ</dc:creator>
		<pubDate>Mon, 15 Feb 2010 13:43:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-725335</guid>
		<description>After several trials, I now &quot;defragment&quot; an InnoDB table of 10M+ rows by exporting it to a CSV file and then loading it into a new table. Then I swap the tables by renaming them, copy over all changes that occurred during the export/import, and delete the old table. It is quite fast compared to other methods.</description>
		<content:encoded><![CDATA[<p>After several trials, I now &#8220;defragment&#8221; an InnoDB table of 10M+ rows by exporting it to a CSV file and then loading it into a new table. Then I swap the tables by renaming them, copy over all changes that occurred during the export/import, and delete the old table. It is quite fast compared to other methods.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: zanzibar</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-712977</link>
		<dc:creator>zanzibar</dc:creator>
		<pubDate>Fri, 22 Jan 2010 07:33:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-712977</guid>
		<description>I just ran &#039;ALTER TABLE tracker ENGINE=InnoDB&#039;; it takes forever with 20 million rows. I wonder if there&#039;s a better way to defrag on production! :)</description>
		<content:encoded><![CDATA[<p>I just ran &#8216;ALTER TABLE tracker ENGINE=InnoDB&#8217;; it takes forever with 20 million rows. I wonder if there&#8217;s a better way to defrag on production! <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vadim</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-681513</link>
		<dc:creator>Vadim</dc:creator>
		<pubDate>Sat, 21 Nov 2009 19:04:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-681513</guid>
		<description>Frank,

I compared different queries because they use different indexes.

Q1 uses the primary key and Q2 uses block_id.

The idea is that if you run a query that you expect to be executed by PRIMARY KEY, you will not
use FORCE KEY (secondary_key) on it. Access by PRIMARY KEY is usually much faster for InnoDB.</description>
		<content:encoded><![CDATA[<p>Frank,</p>
<p>I compared different queries because they use different indexes.</p>
<p>Q1 uses the primary key and Q2 uses block_id.</p>
<p>The idea is that if you run a query that you expect to be executed by PRIMARY KEY, you will not<br />
use FORCE KEY (secondary_key) on it. Access by PRIMARY KEY is usually much faster for InnoDB.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Frank</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-674769</link>
		<dc:creator>Frank</dc:creator>
		<pubDate>Mon, 09 Nov 2009 13:29:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-674769</guid>
		<description>Vadim,
given that specific purpose, why do you compare the runtimes of different queries?
Q1=
SELECT count(distinct username) FROM tracker where TIME_ID &gt;= &#039;2009-07-20 00:00:00&#039; AND TIME_ID = &#039;2009-07-20 00:00:00&#039; AND TIME_ID &lt;= &#039;2009-10-21 00:00:00&#039;

At first glance one would use identical queries and FORCE INDEX usage, wouldn&#039;t one?</description>
		<content:encoded><![CDATA[<p>Vadim,<br />
given that specific purpose, why do you compare the runtimes of different queries?<br />
Q1=<br />
SELECT count(distinct username) FROM tracker where TIME_ID &gt;= &#039;2009-07-20 00:00:00&#039; AND TIME_ID = &#039;2009-07-20 00:00:00&#039; AND TIME_ID &lt;= &#039;2009-10-21 00:00:00&#039;</p>
<p>At first glance one would use identical queries and FORCE INDEX usage, wouldn&#039;t one?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-674211</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 07 Nov 2009 18:35:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-674211</guid>
		<description>Miguel,

Indeed, InnoDB clusters data by primary key. This clustering, however, is per page. With MyISAM, for example, we could get a &quot;disk seek&quot; and an I/O for each row in the worst case; with InnoDB that does not happen. The data is always clustered on a per-page basis, so at least 16K worth of data is read each time. The number of rows per page depends, of course, on row length and page fill factor; in this case we had a few hundred rows per page.

Now note that in the worst case no read-aheads will trigger and all I/O will be done by a single thread in 16K blocks. Assuming 200 I/Os per second for a legacy (non-flash) hard drive, you will be looking at about 3MB/sec of read speed (200 x 16K), which is 30-50 times slower than the sequential read speed of the same drive.</description>
		<content:encoded><![CDATA[<p>Miguel,</p>
<p>Indeed, InnoDB clusters data by primary key. This clustering, however, is per page. With MyISAM, for example, we could get a &#8220;disk seek&#8221; and an I/O for each row in the worst case; with InnoDB that does not happen. The data is always clustered on a per-page basis, so at least 16K worth of data is read each time. The number of rows per page depends, of course, on row length and page fill factor; in this case we had a few hundred rows per page.</p>
<p>Now note that in the worst case no read-aheads will trigger and all I/O will be done by a single thread in 16K blocks. Assuming 200 I/Os per second for a legacy (non-flash) hard drive, you will be looking at about 3MB/sec of read speed (200 x 16K), which is 30-50 times slower than the sequential read speed of the same drive.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arjen Lentz</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-674059</link>
		<dc:creator>Arjen Lentz</dc:creator>
		<pubDate>Sat, 07 Nov 2009 00:12:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-674059</guid>
		<description>Miguel, in InnoDB&#039;s architecture, it works out faster to insert in primary key order. Yes, the B+Tree gets rebalanced either way and a B+Tree is by definition sorted, but there&#039;s just less work in this particular scenario. That&#039;s why it&#039;s desirable for a dump of InnoDB tables to have the rows in PK order.</description>
		<content:encoded><![CDATA[<p>Miguel, in InnoDB&#8217;s architecture, it works out faster to insert in primary key order. Yes, the B+Tree gets rebalanced either way and a B+Tree is by definition sorted, but there&#8217;s just less work in this particular scenario. That&#8217;s why it&#8217;s desirable for a dump of InnoDB tables to have the rows in PK order.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steven Roussey</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-673870</link>
		<dc:creator>Steven Roussey</dc:creator>
		<pubDate>Fri, 06 Nov 2009 15:56:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-673870</guid>
		<description>One thing that was great about MyISAM was ALTER TABLE ORDER BY, so you could decide how to order your data...</description>
		<content:encoded><![CDATA[<p>One thing that was great about MyISAM was ALTER TABLE ORDER BY, so you could decide how to order your data&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Miguel DeAvila</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-673863</link>
		<dc:creator>Miguel DeAvila</dc:creator>
		<pubDate>Fri, 06 Nov 2009 15:21:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-673863</guid>
		<description>Can you say more about the nature of the fragmentation?

Even if the dump occurred in non-pk order, the pk order should have been restored during the import, no?
I thought that the price for the out-of-order restore would be excessive i/o during the restore as the
b-tree is continually re-organized to cope with the out-of-order inserts. After the restore the table must
be in pk order, no?</description>
		<content:encoded><![CDATA[<p>Can you say more about the nature of the fragmentation?</p>
<p>Even if the dump occurred in non-pk order, the pk order should have been restored during the import, no?<br />
I thought that the price for the out-of-order restore would be excessive i/o during the restore as the<br />
b-tree is continually re-organized to cope with the out-of-order inserts. After the restore the table must<br />
be in pk order, no?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-673590</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 06 Nov 2009 01:36:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-673590</guid>
		<description>Arjen,

Right. From a size perspective the indexes are about the same. The primary key is even a bit larger, as it has more information in it than the secondary key. However, it is still better to scan in primary key order in most cases because it is less fragmented. It is very common to see auto-increment primary keys or other sequential insert patterns.</description>
		<content:encoded><![CDATA[<p>Arjen,</p>
<p>Right. From a size perspective the indexes are about the same. The primary key is even a bit larger, as it has more information in it than the secondary key. However, it is still better to scan in primary key order in most cases because it is less fragmented. It is very common to see auto-increment primary keys or other sequential insert patterns.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arjen Lentz</title>
		<link>http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/comment-page-1/#comment-673562</link>
		<dc:creator>Arjen Lentz</dc:creator>
		<pubDate>Thu, 05 Nov 2009 23:20:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=1616#comment-673562</guid>
		<description>Interesting case.
Technically it was an equal choice in terms of index selection, since the indexed column (block_id) plus the composite primary key covers all the columns. So it was just as valid to pick that index as it was to just scan the primary key. And it would be just as fast.

Of course, when using the output from that query to insert into another table (via a dump or just INSERT ... SELECT), and the destination table is InnoDB, then it&#039;s more beneficial to have the select in primary key order. But hey, we all know that unless you specify an ORDER BY, result set order is undefined.
I don&#039;t think this is an optimiser bug, really. If you want the select ordered, use an ORDER BY clause.

And indeed, mysqldump offers the --order-by-primary option, and using it with InnoDB is good. For other engines, the situation can be quite different. For MyISAM, ordering by primary may in fact trigger a filesort (even go to disk) since a full table scan is likely not in primary key order. It&#039;s arbitrary (well, based on insert order + gap filling).</description>
		<content:encoded><![CDATA[<p>Interesting case.<br />
Technically it was an equal choice in terms of index selection, since the indexed column (block_id) plus the composite primary key covers all the columns. So it was just as valid to pick that index as it was to just scan the primary key. And it would be just as fast.</p>
<p>Of course, when using the output from that query to insert into another table (via a dump or just INSERT &#8230; SELECT), and the destination table is InnoDB, then it&#8217;s more beneficial to have the select in primary key order. But hey, we all know that unless you specify an ORDER BY, result set order is undefined.<br />
I don&#8217;t think this is an optimiser bug, really. If you want the select ordered, use an ORDER BY clause.</p>
<p>And indeed, mysqldump offers the --order-by-primary option, and using it with InnoDB is good. For other engines, the situation can be quite different. For MyISAM, ordering by primary may in fact trigger a filesort (even go to disk) since a full table scan is likely not in primary key order. It&#8217;s arbitrary (well, based on insert order + gap filling).</p>
]]></content:encoded>
	</item>
</channel>
</rss>

