<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Countless storage engines</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/</link>
	<description>Everything about MySQL Performance</description>
	<pubDate>Wed, 01 Oct 2008 00:00:06 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Ryan</title>
		<link>http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-119148</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Sat, 05 May 2007 16:41:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-119148</guid>
		<description>About clustered indexes. It seems to me that you could use InnoDB on tables where you rely on the clustered index and give InnoDB a small buffer pool. If you're mainly doing lookups on these tables by clustered index, they don't necessarily need to be in the buffer pool. If most of the accesses were going to be cache hits, then you probably wouldn't need the table clustered to begin with. Then for all other tables where you don't need your data clustered you could use Falcon with a large page and row cache and benefit from the better use of memory (especially in the case of the row cache). If you could fit twice as many records in memory with Falcon as with InnoDB, then you could see a large improvement in performance in the most common cases where you're disk bound on reads. At this point, though, Falcon is really a good year or two out before I would consider using it in a production environment. Also, I need to find out how well things would work in a mixed-mode environment with some tables in InnoDB and some in Falcon.</description>
		<content:encoded><![CDATA[<p>About clustered indexes. It seems to me that you could use InnoDB on tables where you rely on the clustered index and give InnoDB a small buffer pool. If you&#8217;re mainly doing lookups on these tables by clustered index, they don&#8217;t necessarily need to be in the buffer pool. If most of the accesses were going to be cache hits, then you probably wouldn&#8217;t need the table clustered to begin with. Then for all other tables where you don&#8217;t need your data clustered you could use Falcon with a large page and row cache and benefit from the better use of memory (especially in the case of the row cache). If you could fit twice as many records in memory with Falcon as with InnoDB, then you could see a large improvement in performance in the most common cases where you&#8217;re disk bound on reads. At this point, though, Falcon is really a good year or two out before I would consider using it in a production environment. Also, I need to find out how well things would work in a mixed-mode environment with some tables in InnoDB and some in Falcon.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-118782</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 04 May 2007 21:54:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-118782</guid>
		<description>Vadim, 

My comments on this post.

1) PBXT: "Never update the row", as far as I understand, really applies to the "variable length" row portion - the fixed-length portion still needs to be updated to link to the previous row version. I think the assumption in PBXT is that the fixed row portion is small, so this file fits in memory. In general PBXT seems to be very optimized for BLOB handling and for handling long rows in general. It is also worth noting that PBXT currently treats different databases as different instances, which can cause many gotchas if you have transactions spanning multiple databases, both for commit and for the sake of Repeatable Read isolation mode.

2) Falcon. It is worth noting that MySQL's official marketing never positions Falcon as such per se (for obvious reasons), but it is indeed seen as a replacement for InnoDB in many users' minds. Visiting customers, I often hear the question of when they should migrate to Falcon and which benefits they will get. I tell them it is too early to migrate and too early to say, because Falcon is under aggressive development. Due to very different concepts, both in terms of performance properties and transaction implementation details, Falcon is unlikely to be an easy drop-in replacement for InnoDB, but for certain applications, especially newly designed ones, it offers some attractive features. I do agree, however, that the lack of clustering by primary key and not being able to use covering indexes means there is no way to get predictable data locality, which is extremely important for large databases.

3) Solid - this one is actually better positioned as an "InnoDB replacement", as it has a much closer architecture and applications may be easy to port if performance is adequate. Solid also has clustering by primary key and, unlike InnoDB, has key compression (the lack of which is one of InnoDB's major problems, in my opinion). The problems with Solid, based on listening to their talk, may be the handling of long transactions (spilling the bonsai tree to disk may not be that efficient) and writing new pages to new locations, which is likely to cause significant problems.

Comparing transactional storage engines, though, one needs to take into account a lot of variables - how stable the engine and its integration are (the last thing you want is losing your data - it is worse than crashes or wrong query results), and crash recovery reliability and speed, including rare cases such as partial page writes. This is where knowledge will just need to be accumulated.

It is worth noting that different storage engines and features have various levels of readiness; as I understand it, ScaleDB is not publicly available yet. It is also worth noting that the MySQL world is not free (as in beer) any more. Solid's synchronous replication offering may be commercial only (the web site does not say exactly yet), and NitroDB and Infobright are very specialized storage engines and, not surprisingly, commercial. Plus, Infobright is currently Windows-only and read-only (a special loader-compressor must be used), as I understand.


P.S. For Planet MySQL users who may wonder why Peter is commenting on Peter - the original post was made by Vadim. A number of people write for MySQLPerformanceBlog these days, but it all shows up under Peter Zaitsev on Planet MySQL.</description>
		<content:encoded><![CDATA[<p>Vadim, </p>
<p>My comments on this post.</p>
<p>1) PBXT: &#8220;Never update the row&#8221;, as far as I understand, really applies to the &#8220;variable length&#8221; row portion - the fixed-length portion still needs to be updated to link to the previous row version. I think the assumption in PBXT is that the fixed row portion is small, so this file fits in memory. In general PBXT seems to be very optimized for BLOB handling and for handling long rows in general. It is also worth noting that PBXT currently treats different databases as different instances, which can cause many gotchas if you have transactions spanning multiple databases, both for commit and for the sake of Repeatable Read isolation mode.</p>
<p>2) Falcon. It is worth noting that MySQL&#8217;s official marketing never positions Falcon as such per se (for obvious reasons), but it is indeed seen as a replacement for InnoDB in many users&#8217; minds. Visiting customers, I often hear the question of when they should migrate to Falcon and which benefits they will get. I tell them it is too early to migrate and too early to say, because Falcon is under aggressive development. Due to very different concepts, both in terms of performance properties and transaction implementation details, Falcon is unlikely to be an easy drop-in replacement for InnoDB, but for certain applications, especially newly designed ones, it offers some attractive features. I do agree, however, that the lack of clustering by primary key and not being able to use covering indexes means there is no way to get predictable data locality, which is extremely important for large databases.</p>
<p>3) Solid - this one is actually better positioned as an &#8220;InnoDB replacement&#8221;, as it has a much closer architecture and applications may be easy to port if performance is adequate. Solid also has clustering by primary key and, unlike InnoDB, has key compression (the lack of which is one of InnoDB&#8217;s major problems, in my opinion). The problems with Solid, based on listening to their talk, may be the handling of long transactions (spilling the bonsai tree to disk may not be that efficient) and writing new pages to new locations, which is likely to cause significant problems.</p>
<p>Comparing transactional storage engines, though, one needs to take into account a lot of variables - how stable the engine and its integration are (the last thing you want is losing your data - it is worse than crashes or wrong query results), and crash recovery reliability and speed, including rare cases such as partial page writes. This is where knowledge will just need to be accumulated.</p>
<p>It is worth noting that different storage engines and features have various levels of readiness; as I understand it, ScaleDB is not publicly available yet. It is also worth noting that the MySQL world is not free (as in beer) any more. Solid&#8217;s synchronous replication offering may be commercial only (the web site does not say exactly yet), and NitroDB and Infobright are very specialized storage engines and, not surprisingly, commercial. Plus, Infobright is currently Windows-only and read-only (a special loader-compressor must be used), as I understand.</p>
<p>P.S. For Planet MySQL users who may wonder why Peter is commenting on Peter - the original post was made by Vadim. A number of people write for MySQLPerformanceBlog these days, but it all shows up under Peter Zaitsev on Planet MySQL.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-118779</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 04 May 2007 21:28:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-118779</guid>
		<description>Ann, 

Yes, thank you for your comment. I well understand the implications and the design choices you made.

This, however, leaves Falcon in a very interesting position. For small data sets, when data mostly fits in memory, the fact that Falcon does not have clustering by primary key or covering indexes is not critical, because it does not require a lot of random IO. But in the same case, the fact that indexes are a bit longer does not cause many problems, and the fact that InnoDB has to cache full pages rather than individual rows does not bother you too much either. Optimized index retrieval also does not really matter if data is in memory.


So this leaves a rather narrow range of working-set-to-memory ratios in which Falcon's optimizations will allow it to excel. For example, assume we have 16GB of memory and a 10GB InnoDB database which will be 7GB in Falcon, and assume the working set matches the database size for the sake of argument.

If you get to, say, 20GB in InnoDB and 14GB in Falcon, or 30GB in InnoDB and 21GB in Falcon, we can see the difference, but when we get to 100GB in InnoDB vs 70GB in Falcon, the lack of a clustered key and covering indexes may cause serious problems.

Again, these are my assumptions so far - life will show us how it really is.

I think you greatly underestimate the number of cases in which a covering index is a life saver.</description>
		<content:encoded><![CDATA[<p>Ann, </p>
<p>Yes, thank you for your comment. I well understand the implications and the design choices you made.</p>
<p>This, however, leaves Falcon in a very interesting position. For small data sets, when data mostly fits in memory, the fact that Falcon does not have clustering by primary key or covering indexes is not critical, because it does not require a lot of random IO. But in the same case, the fact that indexes are a bit longer does not cause many problems, and the fact that InnoDB has to cache full pages rather than individual rows does not bother you too much either. Optimized index retrieval also does not really matter if data is in memory.</p>
<p>So this leaves a rather narrow range of working-set-to-memory ratios in which Falcon&#8217;s optimizations will allow it to excel. For example, assume we have 16GB of memory and a 10GB InnoDB database which will be 7GB in Falcon, and assume the working set matches the database size for the sake of argument.</p>
<p>If you get to, say, 20GB in InnoDB and 14GB in Falcon, or 30GB in InnoDB and 21GB in Falcon, we can see the difference, but when we get to 100GB in InnoDB vs 70GB in Falcon, the lack of a clustered key and covering indexes may cause serious problems.</p>
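<p>A rough sketch of the arithmetic above, written in Python. The 70% size ratio is taken from the 10GB vs 7GB example; the helper name and the crude rule that an engine is only comfortable when its whole working set fits in memory are assumptions made for illustration, not measurements.</p>
<pre>
```python
MEMORY_GB = 16            # memory size from the example above
FALCON_SIZE_PERCENT = 70  # assumed Falcon size relative to InnoDB (10GB -> 7GB)


def compare_fit(innodb_gb, memory_gb=MEMORY_GB):
    """Report whether each engine's data set fits in memory (hypothetical helper)."""
    # Integer percent keeps the arithmetic exact for these sizes.
    falcon_gb = innodb_gb * FALCON_SIZE_PERCENT / 100
    return {
        "innodb_gb": innodb_gb,
        "falcon_gb": falcon_gb,
        "innodb_fits": memory_gb >= innodb_gb,
        "falcon_fits": memory_gb >= falcon_gb,
    }


# The four database sizes mentioned in the comment.
for size_gb in (10, 20, 30, 100):
    print(compare_fit(size_gb))
```
</pre>
<p>On this crude fits-or-not measure only the 20GB case separates the two engines: 14GB fits in 16GB of memory while 20GB does not. At 10GB both fit, and at 30GB and beyond neither does, so any remaining advantage would depend on how much of the working set stays cached.</p>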
<p>Again, these are my assumptions so far - life will show us how it really is.</p>
<p>I think you greatly underestimate the number of cases in which a covering index is a life saver.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pythian Group Blog &#187; Log Buffer #43: a Carnival of the Vanities for DBAs</title>
		<link>http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-118668</link>
		<dc:creator>Pythian Group Blog &#187; Log Buffer #43: a Carnival of the Vanities for DBAs</dc:creator>
		<pubDate>Fri, 04 May 2007 17:12:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-118668</guid>
		<description>[...] Zaitsev also writes about MySQL&#8217;s &#8220;countless storage engines&#8221;, summarizing his impressions of InnoDB, PBXT, Falcon, Solid, NitroEDB, Infobright, and ScaleDB. [...]</description>
		<content:encoded><![CDATA[<p>[...] Zaitsev also writes about MySQL&#8217;s &#8220;countless storage engines&#8221;, summarizing his impressions of InnoDB, PBXT, Falcon, Solid, NitroEDB, Infobright, and ScaleDB. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ann Harrison</title>
		<link>http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-118605</link>
		<dc:creator>Ann Harrison</dc:creator>
		<pubDate>Fri, 04 May 2007 15:41:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/05/03/countless-storage-engines/#comment-118605</guid>
		<description>"Index coverage optimization" means that queries can
be resolved from the index without reference to data.
Falcon doesn't carry transaction information in indexes
for two reasons:  the information is big relative to
an index entry (up to two transaction ids, at about 6 
bytes each) and the transaction ids are volatile - 
sometimes you need one, sometimes you need two, and
sometimes you don't need any.  Those factors equate
to bloating the indexes and making them more expensive 
to maintain.  

So, queries like:

   select name from people where name = 'Ann' 

require actually reading the records in people that are indexed
under 'Ann' to see if I might have changed my name to 'Her Most
Exalted Majesty'. There are real cases where resolving from the 
index is a good thing - joins with junction tables for one - e.g.
students / registrations / courses  where the registration record
consists only of the student_id and the course_id.  Not reading
those records is a good thing.

However, the choice that Falcon made was to have smaller, faster,
lower-maintenance indexes.  That's also the choice that InterBase,
Firebird, and Postgres made.  I've seen benchmarks where they do
well, and others where they don't.  Personally, I prefer that 
indexes work better in the general case, even if it costs in this
case, but YMMV.</description>
		<content:encoded><![CDATA[<p>&#8220;Index coverage optimization&#8221; means that queries can<br />
be resolved from the index without reference to data.<br />
Falcon doesn&#8217;t carry transaction information in indexes<br />
for two reasons:  the information is big relative to<br />
an index entry (up to two transaction ids, at about 6<br />
bytes each) and the transaction ids are volatile -<br />
sometimes you need one, sometimes you need two, and<br />
sometimes you don&#8217;t need any.  Those factors equate<br />
to bloating the indexes and making them more expensive<br />
to maintain.  </p>
<p>So, queries like:</p>
<p>   select name from people where name = &#8216;Ann&#8217; </p>
<p>require actually reading the records in people that are indexed<br />
under &#8216;Ann&#8217; to see if I might have changed my name to &#8216;Her Most<br />
Exalted Majesty&#8217;. There are real cases where resolving from the<br />
index is a good thing - joins with junction tables for one - e.g.<br />
students / registrations / courses  where the registration record<br />
consists only of the student_id and the course_id.  Not reading<br />
those records is a good thing.</p>
<p>However, the choice that Falcon made was to have smaller, faster,<br />
lower-maintenance indexes.  That&#8217;s also the choice that InterBase,<br />
Firebird, and Postgres made.  I&#8217;ve seen benchmarks where they do<br />
well, and others where they don&#8217;t.  Personally, I prefer that<br />
indexes work better in the general case, even if it costs in this<br />
case, but YMMV.</p>
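<p>To make the idea of resolving a query from the index concrete, here is a small sketch using Python&#8217;s built-in sqlite3 module. SQLite is used only because it ships with Python; it is not MySQL, Falcon or InnoDB, and its planner and storage behave differently. The table, column and index names, and the query_plan helper, are made up for illustration.</p>
<pre>
```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail text SQLite reports for a query (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Junction table: every column is part of the index.
    CREATE TABLE registrations (student_id INTEGER, course_id INTEGER);
    CREATE INDEX idx_reg ON registrations (student_id, course_id);

    -- Ordinary table: the index holds name only, not city.
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE INDEX idx_people_name ON people (name);
""")

# Can be answered from the index alone.
covered = query_plan(
    conn, "SELECT course_id FROM registrations WHERE student_id = ?", (1,))

# Needs the table row as well, because city is not in the index.
not_covered = query_plan(
    conn, "SELECT city FROM people WHERE name = ?", ("Ann",))

print(covered)
print(not_covered)
```
</pre>
<p>In SQLite the first plan is reported as using a covering index and the second is not, since city is not stored in the index. The exact wording of the plan text varies between SQLite versions.</p>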
]]></content:encoded>
	</item>
</channel>
</rss>
