<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Data mart or data warehouse?</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/</link>
	<description>Percona&#039;s MySQL &#38; InnoDB performance and scalability blog</description>
	<lastBuildDate>Sat, 11 Feb 2012 16:45:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Roland Bouman</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-828875</link>
		<dc:creator>Roland Bouman</dc:creator>
		<pubDate>Wed, 05 Oct 2011 07:33:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-828875</guid>
		<description>Hi Juan,

Not trying to hijack the thread, but I co-authored a book on BI and data warehousing which is, even if I do say so myself, a pretty good mix of theory and hands-on practice. It&#039;s mainly about Pentaho, but it contains an extensive example case that builds a (Kimball-style) data warehouse using MySQL. You can find the book here on Amazon:

http://www.amazon.com/Pentaho-Solutions-Business-Intelligence-Warehousing/dp/0470484322

You can get a sample chapter, TOC and index here:

http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470484322.html</description>
		<content:encoded><![CDATA[<p>Hi Juan,</p>
<p>Not trying to hijack the thread, but I co-authored a book on BI and data warehousing which is, even if I do say so myself, a pretty good mix of theory and hands-on practice. It&#8217;s mainly about Pentaho, but it contains an extensive example case that builds a (Kimball-style) data warehouse using MySQL. You can find the book here on Amazon:</p>
<p><a href="http://www.amazon.com/Pentaho-Solutions-Business-Intelligence-Warehousing/dp/0470484322" rel="nofollow">http://www.amazon.com/Pentaho-Solutions-Business-Intelligence-Warehousing/dp/0470484322</a></p>
<p>You can get a sample chapter, TOC and index here:</p>
<p><a href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470484322.html" rel="nofollow">http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470484322.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Juan</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-828733</link>
		<dc:creator>Juan</dc:creator>
		<pubDate>Tue, 04 Oct 2011 19:15:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-828733</guid>
		<description>Hi, congrats on the article. It&#039;s very impressive. I&#039;d like to know when you will be finishing the other topics from the list.

I&#039;m learning the OLAP/OLTP/cube concepts and I need some guidance.

thanks</description>
		<content:encoded><![CDATA[<p>Hi, congrats on the article. It&#8217;s very impressive. I&#8217;d like to know when you will be finishing the other topics from the list.</p>
<p>I&#8217;m learning the OLAP/OLTP/cube concepts and I need some guidance.</p>
<p>thanks</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: faruk</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-795471</link>
		<dc:creator>faruk</dc:creator>
		<pubDate>Mon, 24 Jan 2011 23:14:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-795471</guid>
		<description>I have written a short paper on this subject; anyone is welcome to read it!
Check the link: http://faruk.ba/?p=87</description>
		<content:encoded><![CDATA[<p>I have written a short paper on this subject; anyone is welcome to read it!<br />
Check the link: <a href="http://faruk.ba/?p=87" rel="nofollow">http://faruk.ba/?p=87</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Roland Bouman</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-787373</link>
		<dc:creator>Roland Bouman</dc:creator>
		<pubDate>Thu, 16 Dec 2010 12:06:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-787373</guid>
		<description>Per,

indeed, solving many-to-many relationships in a star schema is a challenge. If you want to google for it, look for &quot;multi-valued dimensions&quot;.
There are solutions though, and there isn&#039;t one &#039;right&#039; answer - it depends on the requirements.

In some cases, it&#039;s acceptable to create a multi-valued member in the dimension table: say, a list of categories.
A slightly more structured solution is to create a separate flag column for each category (and yes, the dimension table will need to be altered whenever a new category is added).
Another solution is to use a bridge table with an allocation factor. In this case the categories would be a separate dimension and have an intersection table with the fact table - the &quot;bridge table&quot;. In this table, you&#039;d store a factor that expresses the partial contribution of the dimension entry to the fact entry. I can&#039;t think of a good example for this approach in the product/category example, but I have set up an example in the &quot;Pentaho Solutions&quot; book that uses this approach to have an &quot;actor&quot; dimension table for film customer orders. 

The actor/film customer order example works like this: for each actor that stars in a film, the bridge table contains an actor_id, a film_id, and a factor equal to 1 divided by the number of actors in the film. This factor is sometimes called a weight, and it models the relative contribution of each actor. This way, if you run a query like &quot;What is the order value for films starring a particular actor&quot;, you can multiply the metric value from the fact table by the weight and still get a result that makes sense - kinda. You really need the weight for this type of query if you want to calculate the value for multiple actors: for example, if you&#039;re asking about the value of all customer orders for films starring Robert De Niro or Al Pacino, you want to avoid counting the films starring both Robert De Niro and Al Pacino twice.

All these approaches are explained here too: http://www.pythian.com/news/364/implementing-many-to-many-relationships-in-data-warehousing/
Or if you&#039;re interested, pick up a copy of &quot;Pentaho Solutions&quot; - apart from being an all-round Pentaho starter&#039;s guide, it also explains these basic data warehousing techniques and illustrates them with examples.</description>
		<content:encoded><![CDATA[<p>Per,</p>
<p>indeed, solving many-to-many relationships in a star schema is a challenge. If you want to google for it, look for &#8220;multi-valued dimensions&#8221;.<br />
There are solutions though, and there isn&#8217;t one &#8216;right&#8217; answer &#8211; it depends on the requirements.</p>
<p>In some cases, it&#8217;s acceptable to create a multi-valued member in the dimension table: say, a list of categories.<br />
A slightly more structured solution is to create a separate flag column for each category (and yes, the dimension table will need to be altered whenever a new category is added).<br />
Another solution is to use a bridge table with an allocation factor. In this case the categories would be a separate dimension and have an intersection table with the fact table &#8211; the &#8220;bridge table&#8221;. In this table, you&#8217;d store a factor that expresses the partial contribution of the dimension entry to the fact entry. I can&#8217;t think of a good example for this approach in the product/category example, but I have set up an example in the &#8220;Pentaho Solutions&#8221; book that uses this approach to have an &#8220;actor&#8221; dimension table for film customer orders. </p>
<p>The actor/film customer order example works like this: for each actor that stars in a film, the bridge table contains an actor_id, a film_id, and a factor equal to 1 divided by the number of actors in the film. This factor is sometimes called a weight, and it models the relative contribution of each actor. This way, if you run a query like &#8220;What is the order value for films starring a particular actor&#8221;, you can multiply the metric value from the fact table by the weight and still get a result that makes sense &#8211; kinda. You really need the weight for this type of query if you want to calculate the value for multiple actors: for example, if you&#8217;re asking about the value of all customer orders for films starring Robert De Niro or Al Pacino, you want to avoid counting the films starring both Robert De Niro and Al Pacino twice.</p>
<p>All these approaches are explained here too: <a href="http://www.pythian.com/news/364/implementing-many-to-many-relationships-in-data-warehousing/" rel="nofollow">http://www.pythian.com/news/364/implementing-many-to-many-relationships-in-data-warehousing/</a><br />
Or if you&#8217;re interested, pick up a copy of &#8220;Pentaho Solutions&#8221; &#8211; apart from being an all-round Pentaho starter&#8217;s guide, it also explains these basic data warehousing techniques and illustrates them with examples.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Per</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-787352</link>
		<dc:creator>Per</dc:creator>
		<pubDate>Thu, 16 Dec 2010 10:02:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-787352</guid>
		<description>&quot;A normalized data warehouse schema might contain tables called items, categories and item_category. These three tables allow a user to determine which items belong to which categories, but this structure creates a large number of joins when many dimensions are involved. A data mart would collapse all of this information into an item dimension which would include the category information in the same row as the item information.&quot;

I don&#039;t get it. This schema suggests that there is a many-to-many relationship between items and categories. (Those are considered dimensions, not facts, right?) But how can the items table row have all its categories in a single column?

(Yes, I have exactly this problem in my data mart with SCD, and it requires some brutal joins).

Your web page is otherwise excellent. First-class style: clear, concise and complete. You should publish a book on the subject!</description>
		<content:encoded><![CDATA[<p>&#8220;A normalized data warehouse schema might contain tables called items, categories and item_category. These three tables allow a user to determine which items belong to which categories, but this structure creates a large number of joins when many dimensions are involved. A data mart would collapse all of this information into an item dimension which would include the category information in the same row as the item information.&#8221;</p>
<p>I don&#8217;t get it. This schema suggests that there is a many-to-many relationship between items and categories. (Those are considered dimensions, not facts, right?) But how can the items table row have all its categories in a single column?</p>
<p>(Yes, I have exactly this problem in my data mart with SCD, and it requires some brutal joins).</p>
<p>Your web page is otherwise excellent. First-class style: clear, concise and complete. You should publish a book on the subject!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: faruk</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-784972</link>
		<dc:creator>faruk</dc:creator>
		<pubDate>Wed, 01 Dec 2010 23:18:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-784972</guid>
		<description>I have written a short paper on this subject; anyone is welcome to read it!
Check the link: http://faruk.ba/site/?p=87</description>
		<content:encoded><![CDATA[<p>I have written a short paper on this subject; anyone is welcome to read it!<br />
Check the link: <a href="http://faruk.ba/site/?p=87" rel="nofollow">http://faruk.ba/site/?p=87</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-776286</link>
		<dc:creator>James</dc:creator>
		<pubDate>Thu, 30 Sep 2010 13:51:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-776286</guid>
		<description>hurray! Thanks for the reply</description>
		<content:encoded><![CDATA[<p>hurray! Thanks for the reply</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-776285</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Thu, 30 Sep 2010 13:46:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-776285</guid>
		<description>Hi James,

I simply haven&#039;t had the free time lately to work on the next post.  

I am, however, going to be flying next week.   That will likely give me some time to work on this.

Stay tuned.  As long as all the stars line up properly the next post will be out sometime in the next two weeks.  :)</description>
		<content:encoded><![CDATA[<p>Hi James,</p>
<p>I simply haven&#8217;t had the free time lately to work on the next post.  </p>
<p>I am, however, going to be flying next week.   That will likely give me some time to work on this.</p>
<p>Stay tuned.  As long as all the stars line up properly the next post will be out sometime in the next two weeks.  <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-776272</link>
		<dc:creator>James</dc:creator>
		<pubDate>Thu, 30 Sep 2010 11:16:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-776272</guid>
		<description>These articles have been really interesting/useful. Will the rest of the six post series appear on the blog at some stage?</description>
		<content:encoded><![CDATA[<p>These articles have been really interesting/useful. Will the rest of the six post series appear on the blog at some stage?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.mysqlperformanceblog.com/2010/07/15/data-mart-or-data-warehouse/comment-page-1/#comment-769469</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Fri, 16 Jul 2010 19:57:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=3057#comment-769469</guid>
		<description>Thanks, I understand what you mean now.</description>
		<content:encoded><![CDATA[<p>Thanks, I understand what you mean now.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

