<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Innodb Double Write</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/</link>
	<description>Percona&#039;s MySQL &#38; InnoDB performance and scalability blog</description>
	<lastBuildDate>Sat, 11 Feb 2012 16:45:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Baron Schwartz</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-804976</link>
		<dc:creator>Baron Schwartz</dc:creator>
		<pubDate>Mon, 18 Apr 2011 16:46:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-804976</guid>
		<description>Zonker, the doublewrite buffer works as a kind of two-phase commit.  It allows a failed write to be recovered.  The write either succeeded or not, and if it didn&#039;t, on recovery it will be replayed from either the doublewrite buffer or the redo logs.  It doesn&#039;t work the way you think, and the scenario you listed can&#039;t happen.</description>
		<content:encoded><![CDATA[<p>Zonker, the doublewrite buffer works as a kind of two-phase commit.  It allows a failed write to be recovered.  The write either succeeded or not, and if it didn&#8217;t, on recovery it will be replayed from either the doublewrite buffer or the redo logs.  It doesn&#8217;t work the way you think, and the scenario you listed can&#8217;t happen.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nick Peirson</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-804959</link>
		<dc:creator>Nick Peirson</dc:creator>
		<pubDate>Mon, 18 Apr 2011 12:27:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-804959</guid>
		<description>Zonker,

As I understand it there are two failure modes:

1. A write to the doublewrite buffer fails. In this case the data in the table hasn&#039;t changed, so the redo log can be applied to update the data.

2. If the write to the doublewrite buffer succeeds and the write to the table fails, the data can be read from the doublewrite buffer to update the table.

If we were writing directly to the table, the failed write would&#039;ve left the table in an inconsistent state where the redo log couldn&#039;t be applied and there&#039;s been no successful write to a buffer that we can read the data back from. The doublewrite buffer means that we always have a way of bringing the data up to date after a failed write, regardless of failure mode.

I haven&#039;t looked at the internals, so this is an educated guess based on the post and comments. I&#039;d be grateful if someone more knowledgeable could confirm.</description>
		<content:encoded><![CDATA[<p>Zonker,</p>
<p>As I understand it there are two failure modes:</p>
<p>1. A write to the doublewrite buffer fails. In this case the data in the table hasn&#8217;t changed, so the redo log can be applied to update the data.</p>
<p>2. If the write to the doublewrite buffer succeeds and the write to the table fails, the data can be read from the doublewrite buffer to update the table.</p>
<p>If we were writing directly to the table, the failed write would&#8217;ve left the table in an inconsistent state where the redo log couldn&#8217;t be applied and there&#8217;s been no successful write to a buffer that we can read the data back from. The doublewrite buffer means that we always have a way of bringing the data up to date after a failed write, regardless of failure mode.</p>
<p>I haven&#8217;t looked at the internals, so this is an educated guess based on the post and comments. I&#8217;d be grateful if someone more knowledgeable could confirm.</p>
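<p>A minimal sketch of the two failure modes above. It uses a toy model, not InnoDB&#8217;s code or on-disk format: a page is a 16-byte payload plus a CRC32, and a redo record is a hypothetical <code>(offset, bytes)</code> change that can only be applied to an internally consistent page. The helper names (<code>make_page</code>, <code>is_valid</code>, <code>torn</code>, <code>redo</code>, <code>recover</code>) are made up for illustration.</p>

```python
import zlib

PAGE_SIZE = 16  # toy payload size in bytes (real InnoDB pages default to 16 KB)

def make_page(payload: bytes) -> bytes:
    """Toy page: payload followed by a 4-byte CRC32 of the payload."""
    assert len(payload) == PAGE_SIZE
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def is_valid(page: bytes) -> bool:
    """True if the stored checksum matches, i.e. the page is not torn."""
    if len(page) != PAGE_SIZE + 4:
        return False
    return zlib.crc32(page[:PAGE_SIZE]).to_bytes(4, "big") == page[PAGE_SIZE:]

def torn(new: bytes, old: bytes, written: int) -> bytes:
    """Simulate a crash after only the first `written` bytes of `new` hit the disk."""
    return new[:written] + old[written:]

def redo(page: bytes, record: tuple) -> bytes:
    """Apply a hypothetical (offset, data) redo record to a consistent page."""
    assert is_valid(page), "redo needs an internally consistent page to start from"
    offset, data = record
    payload = bytearray(page[:PAGE_SIZE])
    payload[offset:offset + len(data)] = data
    return make_page(bytes(payload))

def recover(datafile_page: bytes, dblwr_page: bytes, record: tuple) -> bytes:
    if not is_valid(datafile_page):
        # Mode 2: the in-place write was torn. The doublewrite copy was written
        # and synced before the in-place write started, so it is complete.
        assert is_valid(dblwr_page)
        datafile_page = dblwr_page
    # Mode 1 needs no special case: a torn doublewrite copy is ignored, because
    # the in-place page was never touched and still holds the complete old
    # version. Redo then brings either page up to date (setting bytes is
    # idempotent in this toy, so re-applying it to the new page is harmless).
    return redo(datafile_page, record)

old = make_page(b"A" * PAGE_SIZE)
new = make_page(b"B" * PAGE_SIZE)
change = (0, b"B" * PAGE_SIZE)  # redo record, already durable in the log

# Mode 1: crash halfway through writing the doublewrite copy.
assert recover(old, torn(new, old, 8), change) == new
# Mode 2: doublewrite copy complete, crash halfway through the in-place write.
assert recover(torn(new, old, 8), new, change) == new
```

<p>In Zonker&#8217;s AAAA/BBBB scenario this is mode 1: discarding the torn doublewrite copy loses nothing, because the in-place page still holds the complete A version and the redo record, which was logged before the page write began, turns it into B. Real InnoDB decides whether to apply a redo record by comparing the record&#8217;s LSN with the page LSN, rather than relying on idempotent records as this toy does.</p>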
]]></content:encoded>
	</item>
	<item>
		<title>By: Zonker Harris</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-804863</link>
		<dc:creator>Zonker Harris</dc:creator>
		<pubDate>Sat, 16 Apr 2011 22:19:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-804863</guid>
		<description>I guess I&#039;m not buying it.

I write to the doublewrite buffer:  AAAAAAAAAA

then flush to the table:  AAAAAAAAAA

fine, all&#039;s fine.  But now I write again to the doublewrite buffer and the write fails halfway:

BBBBB.....

We &quot;recover&quot; and find that the doublewrite buffer is BBBBB....., with a bad checksum, or however we determine that it isn&#039;t complete, so we &quot;just discard it&quot; per the mysql docs.

Really?  This is good?

Now I&#039;ve got AAAAAAAAAA in the table where I expect to have BBBBBBBBBB.  AAAAAAAAAA may be &quot;consistent&quot;, but it isn&#039;t &quot;correct&quot;, and I still have a problem.  If that data is correlated to other data, I still have to recover from a backup to get back to true consistency, i.e. my data all being consistent with each other.

So I don&#039;t really see how this benefits me.

To take the example further, if doublewrite is good, why wouldn&#039;t triplewrite or fourplewrite be even better?  Answer: It isn&#039;t, for exactly the reasons I&#039;ve described above. You still have bad data.  &quot;Consistent&quot;, perhaps, but not correct.

And now the obligatory &quot;Or am I simply not understanding this?&quot;</description>
		<content:encoded><![CDATA[<p>I guess I&#8217;m not buying it.</p>
<p>I write to the doublewrite buffer:  AAAAAAAAAA</p>
<p>then flush to the table:  AAAAAAAAAA</p>
<p>fine, all&#8217;s fine.  But now I write again to the doublewrite buffer and the write fails halfway:</p>
<p>BBBBB&#8230;..</p>
<p>We &#8220;recover&#8221; and find that the doublewrite buffer is BBBBB&#8230;.., with a bad checksum, or however we determine that it isn&#8217;t complete, so we &#8220;just discard it&#8221; per the mysql docs.</p>
<p>Really?  This is good?</p>
<p>Now I&#8217;ve got AAAAAAAAAA in the table where I expect to have BBBBBBBBBB.  AAAAAAAAAA may be &#8220;consistent&#8221;, but it isn&#8217;t &#8220;correct&#8221;, and I still have a problem.  If that data is correlated to other data, I still have to recover from a backup to get back to true consistency, i.e. my data all being consistent with each other.</p>
<p>So I don&#8217;t really see how this benefits me.</p>
<p>To take the example further, if doublewrite is good, why wouldn&#8217;t triplewrite or fourplewrite be even better?  Answer: It isn&#8217;t, for exactly the reasons I&#8217;ve described above. You still have bad data.  &#8220;Consistent&#8221;, perhaps, but not correct.</p>
<p>And now the obligatory &#8220;Or am I simply not understanding this?&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vadim</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-803343</link>
		<dc:creator>Vadim</dc:creator>
		<pubDate>Fri, 01 Apr 2011 15:00:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-803343</guid>
		<description>markgoat,

Let me try to answer your question.

The problem comes from the fact that when we issue a pwrite and there is a crash during this operation, there is no way to check the state of the operation. It may happen that for a 16K write only 4K or 8K was actually written.

So we may end up in a situation where half of the page contains new information and the other half old information. InnoDB will of course detect the corruption using the checksum, but that doesn&#039;t help much, as the page is broken.

So the solution is to keep 2 copies of the page. If we crash while writing one of the copies, we always have another consistent copy we can work with.</description>
		<content:encoded><![CDATA[<p>markgoat,</p>
<p>Let me try to answer your question.</p>
<p>The problem comes from the fact that when we issue a pwrite and there is a crash during this operation, there is no way to check the state of the operation. It may happen that for a 16K write only 4K or 8K was actually written.</p>
<p>So we may end up in a situation where half of the page contains new information and the other half old information. InnoDB will of course detect the corruption using the checksum, but that doesn&#8217;t help much, as the page is broken.</p>
<p>So the solution is to keep 2 copies of the page. If we crash while writing one of the copies, we always have another consistent copy we can work with.</p>
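<p>A small simulation of the torn write described here. It is a toy, not InnoDB&#8217;s page format: a 16K page with a CRC32 header, written in 4K units. The 4K atomic write unit is an assumption, and the helper names <code>crashing_write</code> and <code>recover_page</code> are hypothetical. The page is first written completely to a doublewrite area; a crash after 4K, 8K or 12K of the in-place write leaves a page that fails its checksum, and the intact copy is used to restore it.</p>

```python
import zlib

PAGE = 16 * 1024   # InnoDB's default page size
SECTOR = 4 * 1024  # assumption: the device writes 4 KB units atomically

def with_checksum(payload: bytes) -> bytes:
    """Toy page layout: 4-byte CRC32 header + payload (not InnoDB's real format)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def checksum_ok(page: bytes) -> bool:
    return zlib.crc32(page[4:]).to_bytes(4, "big") == page[:4]

def crashing_write(disk: bytearray, offset: int, page: bytes, sectors_done: int) -> None:
    """Write `page` one sector at a time and 'crash' after `sectors_done` sectors."""
    for i in range(sectors_done):
        lo = i * SECTOR
        disk[offset + lo:offset + lo + SECTOR] = page[lo:lo + SECTOR]

def recover_page(disk: bytearray, dblwr_off: int, data_off: int) -> None:
    """On startup: if the in-place page fails its checksum, copy the doublewrite image over it."""
    page = bytes(disk[data_off:data_off + PAGE])
    if not checksum_ok(page):
        copy = bytes(disk[dblwr_off:dblwr_off + PAGE])
        assert checksum_ok(copy), "both copies torn: the write ordering was violated"
        disk[data_off:data_off + PAGE] = copy

# The doublewrite area sits at offset 0 and the real page at offset PAGE.
old = with_checksum(bytes([1]) * (PAGE - 4))
new = with_checksum(bytes([2]) * (PAGE - 4))
for k in (1, 2, 3):  # crash after 4K, 8K or 12K of the 16K in-place write
    disk = bytearray(2 * PAGE)
    disk[PAGE:2 * PAGE] = old
    crashing_write(disk, 0, new, PAGE // SECTOR)  # doublewrite copy completes
    crashing_write(disk, PAGE, new, k)            # in-place write is torn
    assert not checksum_ok(bytes(disk[PAGE:2 * PAGE]))
    recover_page(disk, 0, PAGE)
    assert bytes(disk[PAGE:2 * PAGE]) == new
```

<p>The essential rule is the ordering: the second copy must be complete and synced before the in-place write starts, so at most one of the two copies can be torn at any moment.</p>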
]]></content:encoded>
	</item>
	<item>
		<title>By: Baron Schwartz</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-803315</link>
		<dc:creator>Baron Schwartz</dc:creator>
		<pubDate>Fri, 01 Apr 2011 10:02:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-803315</guid>
		<description>The redo logs in InnoDB don&#039;t have complete page images.  This is different from some other database servers.</description>
		<content:encoded><![CDATA[<p>The redo logs in InnoDB don&#8217;t have complete page images.  This is different from some other database servers.</p>
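<p>A toy illustration of why that matters, using a made-up page layout and record type rather than InnoDB&#8217;s: the hypothetical redo record below says &#8220;append this row&#8221; and relies on the page header to find the next free slot, so it can only be replayed correctly onto a page that is internally consistent. Replayed onto a torn page whose header reached disk but whose row data did not, it produces a wrong page instead of repairing it; replayed onto a consistent version of the page (here, the previous version), it works.</p>

```python
ROW = 4                 # toy row size in bytes
PAGE_LEN = 1 + 4 * ROW  # 1-byte header (row count) plus room for 4 rows

def apply_append(page: bytearray, row: bytes) -> None:
    """Hypothetical redo record 'append row': a change, not a page image.

    It trusts the header to say where the next row goes, so it is only
    correct on a page that is internally consistent.
    """
    n = page[0]
    start = 1 + n * ROW
    page[start:start + ROW] = row
    page[0] = n + 1

def rows(page: bytes) -> list:
    """Decode the rows the header says are present."""
    return [bytes(page[1 + i * ROW:1 + (i + 1) * ROW]) for i in range(page[0])]

old = bytearray(PAGE_LEN)
apply_append(old, b"r1..")  # version already on disk
new = bytearray(old)
apply_append(new, b"r2..")  # version being flushed when the crash hits

# Torn write: the header and row 1 reached disk, the bytes of row 2 did not.
torn = new[:5] + old[5:]
print(rows(torn))           # [b'r1..', b'\x00\x00\x00\x00']

# Replaying the redo record onto the torn page does not repair it.
apply_append(torn, b"r2..")
print(rows(torn))           # [b'r1..', b'\x00\x00\x00\x00', b'r2..']

# Replaying it onto a consistent version of the page does.
good = bytearray(old)
apply_append(good, b"r2..")
print(rows(good))           # [b'r1..', b'r2..']
```

<p>This is the gap in the textbook argument: redo records that describe changes need a consistent page to start from, and the doublewrite buffer is what guarantees one exists after a torn write.</p>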
]]></content:encoded>
	</item>
	<item>
		<title>By: markgoat</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-803284</link>
		<dc:creator>markgoat</dc:creator>
		<pubDate>Fri, 01 Apr 2011 05:38:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-803284</guid>
		<description>Sorry, I still don&#039;t understand why we need this.
Isn&#039;t each entry in the log redone during recovery? I think that even if the page is &quot;partially written&quot;, it doesn&#039;t matter, because the redo log knows the after image. So regardless of the page itself, redo will apply every change again. If part of the page is not written due to a crash, won&#039;t redo do the write? In other words, the &quot;consistent page&quot; is saved in the redo log. Unless you mean a &quot;partial write&quot; will modify something not saved in the redo log??
I mean, EVERY change is in the redo log entries. A partial write, in my mind, means it failed to make some changes to the disk. But when we apply the redo log, we will redo the planned actions.
For example, say each page has 4 blocks, we update 2 rows in two of the blocks (blocks 1 and 3), and then we have two entries in the redo log, followed by a commit record. As a result of a partial write, only block 1 was updated; blocks 2, 3 and 4 were not. But I don&#039;t see any issue in this case, because redo will reimage blocks 1 and 3 correctly, won&#039;t it? We don&#039;t care whether the disk is consistent or not; that is what redo is supposed to handle in theory, isn&#039;t it?
So I think I still don&#039;t understand why InnoDB needs &quot;double write&quot;. I am new to InnoDB and don&#039;t know any internal details; I am just thinking about this from a textbook point of view ^_^

Thank you. It has been a long time since you posted this great article, so I don&#039;t know if you can still answer me. I have been thinking about this for days, but I can&#039;t understand it yet.</description>
		<content:encoded><![CDATA[<p>Sorry, I still don&#8217;t understand why we need this.<br />
Isn&#8217;t each entry in the log redone during recovery? I think that even if the page is &#8220;partially written&#8221;, it doesn&#8217;t matter, because the redo log knows the after image. So regardless of the page itself, redo will apply every change again. If part of the page is not written due to a crash, won&#8217;t redo do the write? In other words, the &#8220;consistent page&#8221; is saved in the redo log. Unless you mean a &#8220;partial write&#8221; will modify something not saved in the redo log??<br />
I mean, EVERY change is in the redo log entries. A partial write, in my mind, means it failed to make some changes to the disk. But when we apply the redo log, we will redo the planned actions.<br />
For example, say each page has 4 blocks, we update 2 rows in two of the blocks (blocks 1 and 3), and then we have two entries in the redo log, followed by a commit record. As a result of a partial write, only block 1 was updated; blocks 2, 3 and 4 were not. But I don&#8217;t see any issue in this case, because redo will reimage blocks 1 and 3 correctly, won&#8217;t it? We don&#8217;t care whether the disk is consistent or not; that is what redo is supposed to handle in theory, isn&#8217;t it?<br />
So I think I still don&#8217;t understand why InnoDB needs &#8220;double write&#8221;. I am new to InnoDB and don&#8217;t know any internal details; I am just thinking about this from a textbook point of view ^_^</p>
<p>Thank you. It has been a long time since you posted this great article, so I don&#8217;t know if you can still answer me. I have been thinking about this for days, but I can&#8217;t understand it yet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-766444</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Thu, 10 Jun 2010 17:38:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-766444</guid>
		<description>There are different techniques which can be used instead of the double write buffer.</description>
		<content:encoded><![CDATA[<p>There are different techniques which can be used instead of the double write buffer.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: qihua</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-758511</link>
		<dc:creator>qihua</dc:creator>
		<pubDate>Tue, 04 May 2010 06:58:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-758511</guid>
		<description>Nice article.  But why doesn&#039;t Oracle need it?</description>
		<content:encoded><![CDATA[<p>Nice article.  But why doesn&#8217;t Oracle need it?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robert Milkowski</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-705681</link>
		<dc:creator>Robert Milkowski</dc:creator>
		<pubDate>Mon, 04 Jan 2010 15:50:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-705681</guid>
		<description>BTW: if MySQL is running on ZFS, you can safely disable InnoDB doublewrites, as ZFS always guarantees that either the entire write completes or nothing is updated.</description>
		<content:encoded><![CDATA[<p>BTW: if MySQL is running on ZFS, you can safely disable InnoDB doublewrites, as ZFS always guarantees that either the entire write completes or nothing is updated.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: TODO: Double Write Buffers - Bit Mojo &#8211; Hiram Chirino</title>
		<link>http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/comment-page-1/#comment-666715</link>
		<dc:creator>TODO: Double Write Buffers - Bit Mojo &#8211; Hiram Chirino</dc:creator>
		<pubDate>Tue, 20 Oct 2009 02:31:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/#comment-666715</guid>
		<description>[...] to self: investigate implementing the Double Write Buffers idea in ActiveMQ. ActiveMQ keeps several indexes into the persistent messages that it&#8217;s [...]</description>
		<content:encoded><![CDATA[<p>[...] to self: investigate implementing the Double Write Buffers idea in ActiveMQ. ActiveMQ keeps several indexes into the persistent messages that it&#8217;s [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

