<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Working with large data sets in MySQL</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/</link>
	<description>Percona&#039;s MySQL &#38; InnoDB performance and scalability blog</description>
	<lastBuildDate>Sat, 11 Feb 2012 16:45:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Macoway Advertising</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-725707</link>
		<dc:creator>Macoway Advertising</dc:creator>
		<pubDate>Tue, 16 Feb 2010 08:53:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-725707</guid>
		<description>A large internet search archive is run and maintained on the first fully sponsored, collective-knowledge search engine in the world. The system is built from scratch, is 100% original, and is written in PHP/MySQL, but it will grow and things may get complicated. Please visit www.Macoway.com; they are looking for a good MySQL database design to cover future development and handle large data sets, as they will probably be dealing with big databases.</description>
		<content:encoded><![CDATA[<p>A large internet search archive is run and maintained on the first fully sponsored, collective-knowledge search engine in the world. The system is built from scratch, is 100% original, and is written in PHP/MySQL, but it will grow and things may get complicated. Please visit <a href="http://www.Macoway.com" rel="nofollow">http://www.Macoway.com</a>; they are looking for a good MySQL database design to cover future development and handle large data sets, as they will probably be dealing with big databases.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: GAC</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-715277</link>
		<dc:creator>GAC</dc:creator>
		<pubDate>Thu, 28 Jan 2010 09:25:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-715277</guid>
		<description>I have a problem. I am using the MyISAM engine and, for now, I am just trying to fill the database with data. My data will fit in around 10 tables of 90 GB each. After it is loaded I only need to extract data, and maybe compute some statistics, but there will be no transactions, no updates, and no more inserts. So my problem is simply: how can I load this data fast? I used one primary key with three indexes and one separate index, so that once the data is loaded I can run fast queries. So the question is: should I remove the indexes while loading the database, and would it be faster with a different engine? I have already played around with my.ini, but there is still no improvement in load performance. It takes 1 hour for each 200 MB, so you can imagine how long it will take to load everything. I am uploading chunks of data using LOAD DATA INFILE, and it is very slow. I am also thinking of partitioning the table, but I read on other blogs that it might not be the best option. Although I know there is no magic solution, I hope someone can give me some advice so that I don&#039;t lose too much time testing every possible combination of solutions.

Thanks in advance</description>
		<content:encoded><![CDATA[<p>I have a problem. I am using the MyISAM engine and, for now, I am just trying to fill the database with data. My data will fit in around 10 tables of 90 GB each. After it is loaded I only need to extract data, and maybe compute some statistics, but there will be no transactions, no updates, and no more inserts. So my problem is simply: how can I load this data fast? I used one primary key with three indexes and one separate index, so that once the data is loaded I can run fast queries. So the question is: should I remove the indexes while loading the database, and would it be faster with a different engine? I have already played around with my.ini, but there is still no improvement in load performance. It takes 1 hour for each 200 MB, so you can imagine how long it will take to load everything. I am uploading chunks of data using LOAD DATA INFILE, and it is very slow. I am also thinking of partitioning the table, but I read on other blogs that it might not be the best option. Although I know there is no magic solution, I hope someone can give me some advice so that I don&#8217;t lose too much time testing every possible combination of solutions.</p>
<p>Thanks in advance</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-146665</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Wed, 18 Jul 2007 11:22:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-146665</guid>
		<description>Right. For InnoDB it is strange that ORDER BY with DESC is much slower.
But anyway, you should not be using large LIMIT offsets. I think I&#039;ve blogged about a couple of workarounds, but they mostly work best for static data.</description>
		<content:encoded><![CDATA[<p>Right. For InnoDB it is strange that ORDER BY with DESC is much slower.<br />
But anyway, you should not be using large LIMIT offsets. I think I&#8217;ve blogged about a couple of workarounds, but they mostly work best for static data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-146654</link>
		<dc:creator>Ken</dc:creator>
		<pubDate>Wed, 18 Jul 2007 10:45:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-146654</guid>
		<description>I&#039;m using InnoDB tables, so I guess I can&#039;t use PACK_KEYS?

Is there any other way to go about overcoming this problem?

Could you perhaps do a blog on this very topic? I think many sites suffer this kind of dilemma.</description>
		<content:encoded><![CDATA[<p>I&#8217;m using InnoDB tables, so I guess I can&#8217;t use PACK_KEYS?</p>
<p>Is there any other way to go about overcoming this problem?</p>
<p>Could you perhaps do a blog on this very topic? I think many sites suffer this kind of dilemma.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-146595</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Wed, 18 Jul 2007 07:42:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-146595</guid>
		<description>Right. You can&#039;t use large LIMIT offsets because skipping rows becomes expensive.
Also, DESC can be slower for MyISAM tables with key compression; you can set PACK_KEYS=0 for this table and check whether it helps.


Anyway, this is getting rather off-topic for this post :)</description>
		<content:encoded><![CDATA[<p>Right. You can&#8217;t use large LIMIT offsets because skipping rows becomes expensive.<br />
Also, DESC can be slower for MyISAM tables with key compression; you can set PACK_KEYS=0 for this table and check whether it helps.</p>
<p>Anyway, this is getting rather off-topic for this post <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-146510</link>
		<dc:creator>Ken</dc:creator>
		<pubDate>Wed, 18 Jul 2007 03:19:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-146510</guid>
		<description>Yes, I do have an index there. The problem appears when you start asking for the last few pages, like this:

SELECT u.id from sys_user u WHERE u.account_status=&#039;Active&#039; ORDER BY u.last_updated desc limit 188290,10;

The problem seems to be the &quot;desc limit 188290,10&quot; part, because if I remove the &quot;desc&quot; or change 188290 to 1, the query time becomes much shorter.</description>
		<content:encoded><![CDATA[<p>Yes, I do have an index there. The problem appears when you start asking for the last few pages, like this:</p>
<p>SELECT u.id from sys_user u WHERE u.account_status='Active' ORDER BY u.last_updated desc limit 188290,10;</p>
<p>The problem seems to be the &#8220;desc limit 188290,10&#8221; part, because if I remove the &#8220;desc&#8221; or change 188290 to 1, the query time becomes much shorter.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-146249</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Tue, 17 Jul 2007 16:11:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-146249</guid>
		<description>Ken,

Do you have an index on last_updated?

Not to mention you can cheat and cache pages for a few seconds.</description>
		<content:encoded><![CDATA[<p>Ken,</p>
<p>Do you have an index on last_updated?</p>
<p>Not to mention you can cheat and cache pages for a few seconds.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-146237</link>
		<dc:creator>Ken</dc:creator>
		<pubDate>Tue, 17 Jul 2007 15:16:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-146237</guid>
		<description>What about cases where I&#039;m returning a long list of users, sorted by their last login times? I&#039;m using paging (LIMIT) in my MySQL queries, but I&#039;ve noticed that when the result set is huge, this becomes a nightmare. Without sorting things are fast, but with sorting it slows to a crawl, especially on the last pages.

What do you suggest here? I&#039;ve seen some sites, like Myspace, that just return the first 3,000 results (not sure what exactly they&#039;ve done).</description>
		<content:encoded><![CDATA[<p>What about cases where I&#8217;m returning a long list of users, sorted by their last login times? I&#8217;m using paging (LIMIT) in my MySQL queries, but I&#8217;ve noticed that when the result set is huge, this becomes a nightmare. Without sorting things are fast, but with sorting it slows to a crawl, especially on the last pages.</p>
<p>What do you suggest here? I&#8217;ve seen some sites, like Myspace, that just return the first 3,000 results (not sure what exactly they&#8217;ve done).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-145046</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 13 Jul 2007 18:31:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-145046</guid>
		<description>Yes, this should work OK.

Joins are the main showstopper when using a lot of tables.</description>
		<content:encoded><![CDATA[<p>Yes, this should work OK.</p>
<p>Joins are the main showstopper when using a lot of tables.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: howie</title>
		<link>http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/comment-page-1/#comment-145043</link>
		<dc:creator>howie</dc:creator>
		<pubDate>Fri, 13 Jul 2007 18:26:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/07/05/working-with-large-data-sets-in-mysql/#comment-145043</guid>
		<description>I&#039;ve been struggling with this problem of large data sets (well, large for me). I&#039;m afraid of switching to 5.1 and using partitioning. 

Do you basically roll your own partitioning? Do you just choose something to partition on and then create a separate table for it? For example, a big table called invoices might be split into invoices_2007, invoices_2006, etc.

I was thinking of doing this for one large table. I basically do two types of queries on the table, so I think I might need to mirror the data and partition on two separate fields. Using the invoices example, I would partition by year, and have another set of tables partitioned by vendor name (invoices_vendor_a, invoices_vendor_b, etc). 

Does this sound like a reasonable approach? In my case, I won&#039;t have to join across partitions, so it seems like it should work.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been struggling with this problem of large data sets (well, large for me). I&#8217;m afraid of switching to 5.1 and using partitioning. </p>
<p>Do you basically roll your own partitioning? Do you just choose something to partition on and then create a separate table for it? For example, a big table called invoices might be split into invoices_2007, invoices_2006, etc.</p>
<p>I was thinking of doing this for one large table. I basically do two types of queries on the table, so I think I might need to mirror the data and partition on two separate fields. Using the invoices example, I would partition by year, and have another set of tables partitioned by vendor name (invoices_vendor_a, invoices_vendor_b, etc). </p>
<p>Does this sound like a reasonable approach? In my case, I won&#8217;t have to join across partitions, so it seems like it should work.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

