<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Why you don&#8217;t want to shard.</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/</link>
	<description>Percona&#039;s MySQL &#38; InnoDB performance and scalability blog</description>
	<lastBuildDate>Sat, 11 Feb 2012 16:45:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Glenn</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-883124</link>
		<dc:creator>Glenn</dc:creator>
		<pubDate>Mon, 30 Jan 2012 17:25:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-883124</guid>
		<description>You forgot a major use case: locality.  Having the backend server in the USA is bad for users in Italy, yet you often don&#039;t want completely distinct backends--users in one place should be able to talk to users elsewhere, and users should be able to be transparently migrated if their locale changes.</description>
		<content:encoded><![CDATA[<p>You forgot a major use case: locality.  Having the backend server in the USA is bad for users in Italy, yet you often don&#8217;t want completely distinct backends&#8211;users in one place should be able to talk to users elsewhere, and users should be able to be transparently migrated if their locale changes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Dugan</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-774199</link>
		<dc:creator>Joe Dugan</dc:creator>
		<pubDate>Tue, 14 Sep 2010 02:01:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-774199</guid>
		<description>This is a good article, and it makes some very good points on why not to shard. But there are many reasons to shard, as said in the earlier comments. I have seen some dramatic performance increases on MySQL and PostgreSQL. There is a very good company called dbshards.com that has some very impressive benchmarks. I would recommend people look at all the options before they decide not to shard.

I also read their article on reliable replication. It seems much better than the standard MySQL replication.

Sharding often solves scalability issues without much headache.</description>
		<content:encoded><![CDATA[<p>This is a good article, and it makes some very good points on why not to shard. But there are many reasons to shard, as said in the earlier comments. I have seen some dramatic performance increases on MySQL and PostgreSQL. There is a very good company called dbshards.com that has some very impressive benchmarks. I would recommend people look at all the options before they decide not to shard.</p>
<p>I also read their article on reliable replication. It seems much better than the standard MySQL replication.</p>
<p>Sharding often solves scalability issues without much headache.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Clement Huge</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-771733</link>
		<dc:creator>Clement Huge</dc:creator>
		<pubDate>Wed, 18 Aug 2010 12:13:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-771733</guid>
		<description>Hello,

I also work with massive amounts of data, and I love sharding for a lot of reasons.
The main one is that even though it complicates operations, it gives you a lot more flexibility and scalability.

I have worked with both the traditional approach and the sharding approach. Right now, I work with a traditional company that has so much money that they would rather beef up the hardware as much as possible to fit everything in a few boxes.
I also work with a less traditional company, and they preferred having a lot of boxes to serve shards.

When I was called to the rescue at the non-traditional company, it was primarily because of the complex operations: how to publish new objects to all the shards (keeping versioning as well) and how to publish or replicate scripts as well. True, it was a challenge, but it was fun to find the solution.
The second challenge was data warehousing the transactional data so that data could be purged from the servers.
Bottom line, we had very few indexes, and each server held about 16GB of data, representing one shard, while we were serving billions of transactions per month!
The more traditional management (the data warehouse) was then the most difficult part to administer (with defragmentation, archiving, partitioning, and the data mart and data mining parts).
Based on my experience, I definitely prefer sharding, which gives you a more intuitive solution for high availability and high performance, with active/active nodes... oops, shards (;-)).
Well, you get my point ;-) It competes seriously with peer-to-peer replication, mirroring, log shipping and clustering!</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>I also work with massive amounts of data, and I love sharding for a lot of reasons.<br />
The main one is that even though it complicates operations, it gives you a lot more flexibility and scalability.</p>
<p>I have worked with both the traditional approach and the sharding approach. Right now, I work with a traditional company that has so much money that they would rather beef up the hardware as much as possible to fit everything in a few boxes.<br />
I also work with a less traditional company, and they preferred having a lot of boxes to serve shards.</p>
<p>When I was called to the rescue at the non-traditional company, it was primarily because of the complex operations: how to publish new objects to all the shards (keeping versioning as well) and how to publish or replicate scripts as well. True, it was a challenge, but it was fun to find the solution.<br />
The second challenge was data warehousing the transactional data so that data could be purged from the servers.<br />
Bottom line, we had very few indexes, and each server held about 16GB of data, representing one shard, while we were serving billions of transactions per month!<br />
The more traditional management (the data warehouse) was then the most difficult part to administer (with defragmentation, archiving, partitioning, and the data mart and data mining parts).<br />
Based on my experience, I definitely prefer sharding, which gives you a more intuitive solution for high availability and high performance, with active/active nodes&#8230; oops, shards (;-)).<br />
Well, you get my point <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  It competes seriously with peer-to-peer replication, mirroring, log shipping and clustering!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Morgan Tocker</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-682806</link>
		<dc:creator>Morgan Tocker</dc:creator>
		<pubDate>Mon, 23 Nov 2009 17:28:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-682806</guid>
		<description>@ Anthony - I wrote under bullet point 2, that sharding was often a response to being write heavy (&quot;too many writes&quot;).  I didn&#039;t forget about replication, this article just has a specific purpose ;)

Most applications can be broken down into shards (see my comment #6 for examples), but I don&#039;t dispute this can be difficult in others.  The example I often give for an application that won&#039;t shard is IMDB&#039;s database.  I don&#039;t think there are many good ways to divide actors up, and the movies they star in.

A small correction to your point about locking: Readers don&#039;t block writers in InnoDB because of MVCC, but MySQL does have locking.  Related to your point though is cross-box consistency, and it is an issue.  Peter wrote about this in comment #23.</description>
		<content:encoded><![CDATA[<p>@ Anthony &#8211; I wrote under bullet point 2, that sharding was often a response to being write heavy (&#8220;too many writes&#8221;).  I didn&#8217;t forget about replication, this article just has a specific purpose <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>Most applications can be broken down into shards (see my comment #6 for examples), but I don&#8217;t dispute this can be difficult in others.  The example I often give for an application that won&#8217;t shard is IMDB&#8217;s database.  I don&#8217;t think there are many good ways to divide actors up, and the movies they star in.</p>
<p>A small correction to your point about locking: Readers don&#8217;t block writers in InnoDB because of MVCC, but MySQL does have locking.  Related to your point though is cross-box consistency, and it is an issue.  Peter wrote about this in comment #23.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter van Dijk</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-682446</link>
		<dc:creator>Peter van Dijk</dc:creator>
		<pubDate>Mon, 23 Nov 2009 04:10:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-682446</guid>
		<description>@Anthony,
I think it might be helpful to consider that sharding can be used as another level of abstraction in a complex system. Specifically (and obviously this is a fairly gross oversimplification, but probably still valid):

Where a raw disk has a filesystem placed on top of it to aid in organisation of the underlying data,
a database server typically will use table structures on top of a filesystem to further abstract the low level operations of storing information in files into something that can be searched, modified and more easily maintained in a structured form.

Similarly, shards, when implemented in a useful way, are able to abstract a given system in such a way that you&#039;re able to distribute storage across an arbitrary number of machines. In our case, we have shards in different physical locations, where things like replication are completely impractical.

By extension, the reason that sharding isn&#039;t really a good idea for most people is the same reason that, for example, if you want to copy your holiday photos onto a USB thumb drive, you don&#039;t use a database to do it. In many cases, that extra level of abstraction is completely useless and simply adds complexity.

There are a lot of people who have spent a lot of time researching this area, and, particularly in the web world, it is an invaluable tool for dealing with enormous data sets. I think the notion that it&#039;s a &#039;hack that ignores 50 years of database theory&#039; probably just indicates the need for better education and understanding of how it can be used as a tool.</description>
		<content:encoded><![CDATA[<p>@Anthony,<br />
I think it might be helpful to consider that sharding can be used as another level of abstraction in a complex system. Specifically (and obviously this is a fairly gross oversimplification, but probably still valid):</p>
<p>Where a raw disk has a filesystem placed on top of it to aid in organisation of the underlying data,<br />
a database server typically will use table structures on top of a filesystem to further abstract the low level operations of storing information in files into something that can be searched, modified and more easily maintained in a structured form.</p>
<p>Similarly, shards, when implemented in a useful way, are able to abstract a given system in such a way that you&#8217;re able to distribute storage across an arbitrary number of machines. In our case, we have shards in different physical locations, where things like replication are completely impractical.</p>
<p>By extension, the reason that sharding isn&#8217;t really a good idea for most people is the same reason that, for example, if you want to copy your holiday photos onto a USB thumb drive, you don&#8217;t use a database to do it. In many cases, that extra level of abstraction is completely useless and simply adds complexity.</p>
<p>There are a lot of people who have spent a lot of time researching this area, and, particularly in the web world, it is an invaluable tool for dealing with enormous data sets. I think the notion that it&#8217;s a &#8216;hack that ignores 50 years of database theory&#8217; probably just indicates the need for better education and understanding of how it can be used as a tool.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anthony Berglas</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-682354</link>
		<dc:creator>Anthony Berglas</dc:creator>
		<pubDate>Mon, 23 Nov 2009 00:06:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-682354</guid>
		<description>MASTER/SLAVE REPLICATION

You forgot to mention that if there are many more reads than writes (the common case), then running read-only slave databases off a master provides scalability without having to resort to sharding.

Also, sharding only works if the shards are largely independent, e.g. GMail user accounts.  But sharding an integrated system such as an ERP is likely to slow it down, as the shards need to communicate.

Some databases (Oracle) can horizontally partition a table (and I hope thus a database) automatically based on key values.  That is the right approach.  Keep the logical/physical separation.  Google-style sharding and Bigtable are a hack that ignores 50 years of database theory.

You also forgot to mention that if you take a couple of big tables out of a database, you lose locking and transactions.  Not a good option. (Oh, I forgot, MySQL does not have locking anyway ;).)</description>
		<content:encoded><![CDATA[<p>MASTER/SLAVE REPLICATION</p>
<p>You forgot to mention that if there are many more reads than writes (the common case), then running read-only slave databases off a master provides scalability without having to resort to sharding.</p>
<p>Also, sharding only works if the shards are largely independent, e.g. GMail user accounts.  But sharding an integrated system such as an ERP is likely to slow it down, as the shards need to communicate.</p>
<p>Some databases (Oracle) can horizontally partition a table (and I hope thus a database) automatically based on key values.  That is the right approach.  Keep the logical/physical separation.  Google-style sharding and Bigtable are a hack that ignores 50 years of database theory.</p>
<p>You also forgot to mention that if you take a couple of big tables out of a database, you lose locking and transactions.  Not a good option. (Oh, I forgot, MySQL does not have locking anyway <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> .)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dathan Vance Pattishall</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-650159</link>
		<dc:creator>Dathan Vance Pattishall</dc:creator>
		<pubDate>Wed, 09 Sep 2009 00:03:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-650159</guid>
		<description>I do agree you should only use it if you need to do realtime, user-facing queries across a very large dataset (tens of TBs).

Sharding is super easy if you know what you&#039;re doing. Points 1 and 2 are not an issue at all for me. I can isolate all traffic for super power users to an in-memory DB, which will not be overrun if done correctly.</description>
		<content:encoded><![CDATA[<p>I do agree you should only use it if you need to do realtime, user-facing queries across a very large dataset (tens of TBs).</p>
<p>Sharding is super easy if you know what you&#8217;re doing. Points 1 and 2 are not an issue at all for me. I can isolate all traffic for super power users to an in-memory DB, which will not be overrun if done correctly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Log Buffer</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-631747</link>
		<dc:creator>Log Buffer</dc:creator>
		<pubDate>Mon, 17 Aug 2009 16:12:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-631747</guid>
		<description>&quot;On the MySQL Performance Blog, Morgan Tocker explains why you don&#039;t want to shard. (It has nothing to do with The Dark Crystal, I already checked.) [...]&quot;

&lt;a href=&quot;http://www.pythian.com/news/3561/log-buffer-158-a-carnival-of-the-vanities-for-dbas&quot; rel=&quot;nofollow&quot;&gt;Log Buffer #158&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>&#8220;On the MySQL Performance Blog, Morgan Tocker explains why you don&#8217;t want to shard. (It has nothing to do with The Dark Crystal, I already checked.) [...]&#8221;</p>
<p><a href="http://www.pythian.com/news/3561/log-buffer-158-a-carnival-of-the-vanities-for-dbas" rel="nofollow">Log Buffer #158</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-630023</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 14 Aug 2009 18:47:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-630023</guid>
		<description>Peter,

We&#039;re not against sharding. In fact, we help a lot of people shard properly. The problem is that it is now such a buzzword that people with a 1GB data set start sharding even if it is never going to grow over 10GB.

Bad design is one issue; the other, however, is simply working with sharded data.  It really depends a lot on how tightly coupled your data is - for example, hosting millions of separate blogs is very easy to shard because there are no interdependencies.

Large data does indeed also cause operational concerns - databases in the TB range are often not fun in MySQL due to challenges with backups and especially things such as ALTER TABLE.  http://www.mysqlperformanceblog.com/2006/10/08/small-things-are-better/

With backups - the concern is cross-box consistency.  With a single box you can restore a backup from yesterday and it will be consistent (even though not up to date) - in a sharded environment the backups will correspond to different points in time and so would not be consistent.</description>
		<content:encoded><![CDATA[<p>Peter,</p>
<p>We&#8217;re not against sharding. In fact, we help a lot of people shard properly. The problem is that it is now such a buzzword that people with a 1GB data set start sharding even if it is never going to grow over 10GB.</p>
<p>Bad design is one issue; the other, however, is simply working with sharded data.  It really depends a lot on how tightly coupled your data is &#8211; for example, hosting millions of separate blogs is very easy to shard because there are no interdependencies.</p>
<p>Large data does indeed also cause operational concerns &#8211; databases in the TB range are often not fun in MySQL due to challenges with backups and especially things such as ALTER TABLE.  <a href="http://www.mysqlperformanceblog.com/2006/10/08/small-things-are-better/" rel="nofollow">http://www.mysqlperformanceblog.com/2006/10/08/small-things-are-better/</a></p>
<p>With backups &#8211; the concern is cross-box consistency.  With a single box you can restore a backup from yesterday and it will be consistent (even though not up to date) &#8211; in a sharded environment the backups will correspond to different points in time and so would not be consistent.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/comment-page-1/#comment-630016</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 14 Aug 2009 18:33:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=911#comment-630016</guid>
		<description>Brooks Johnson,

Functional partitioning makes sense under two conditions:

1) The functional partitions are independent enough that you rarely, if ever, need to join data between them.  Putting different tables on different hosts is not the idea; putting &quot;Forum&quot; on one database host, &quot;Wiki&quot; on another and &quot;Bug System&quot; on the third is.

2) The gain you&#039;re looking for is relatively small.  It is often easy to find 3 independent functions with one of them responsible for 50% of the load (and hence a split giving you double capacity), but getting 10x this way is rarely possible.</description>
		<content:encoded><![CDATA[<p>Brooks Johnson,</p>
<p>Functional partitioning makes sense under two conditions:</p>
<p>1) The functional partitions are independent enough that you rarely, if ever, need to join data between them.  Putting different tables on different hosts is not the idea; putting &#8220;Forum&#8221; on one database host, &#8220;Wiki&#8221; on another and &#8220;Bug System&#8221; on the third is.</p>
<p>2) The gain you&#8217;re looking for is relatively small.  It is often easy to find 3 independent functions with one of them responsible for 50% of the load (and hence a split giving you double capacity), but getting 10x this way is rarely possible.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

