<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Why MySQL could be slow with large tables ?</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/</link>
	<description>Everything about MySQL Performance</description>
	<lastBuildDate>Sat, 21 Nov 2009 05:23:57 -0800</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Marc</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-3/#comment-665886</link>
		<dc:creator>Marc</dc:creator>
		<pubDate>Sat, 17 Oct 2009 07:22:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-665886</guid>
		<description>Sorry, I should say the current BTREE index is the same data/order as the columns (Val #1, Val #2, Val #3, Val #4)</description>
		<content:encoded><![CDATA[<p>Sorry, I should say the current BTREE index is the same data/order as the columns (Val #1, Val #2, Val #3, Val #4)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marc</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-3/#comment-665884</link>
		<dc:creator>Marc</dc:creator>
		<pubDate>Sat, 17 Oct 2009 07:13:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-665884</guid>
		<description>Is there a point at which adding CSV values to an IN(val1, val2,...) clause starts to make an index lose its efficiency?  My situation:
Large MyISAM table: 30 million recs, data: 1.2 GB
Data columns are like so: (INT val #1, INT val #2, INT val #3, VARCHAR val #4)

WHERE clause searches are being done in this manner
Val #1: always = one single integer value
Val #2: IN(1 to 4 CSV integer values)
Val #3: IN(unknown number of CSV integer values, probably max of 50)
Val #4: LIKE &#039;string%&#039;

I have not actually added the column/data for Val #3 yet.  Searches as above on (Val #1, #2, #4) are very fast. I&#039;m worried that when I add Val #3, things will get quite slow.  So I&#039;m wondering, is there a certain number of CSV values that will make the IN() search actually slow down?  I&#039;m not worried if I only have a few in there.  But if I need to have &quot;Val #3 IN (1,2,3,4,5,6...50)&quot; will that make index access super slow?

Thanks.</description>
		<content:encoded><![CDATA[<p>Is there a point at which adding CSV values to an IN(val1, val2,&#8230;) clause starts to make an index lose its efficiency?  My situation:<br />
Large MyISAM table: 30 million recs, data: 1.2 GB<br />
Data columns are like so: (INT val #1, INT val #2, INT val #3, VARCHAR val #4)</p>
<p>WHERE clause searches are being done in this manner<br />
Val #1: always = one single integer value<br />
Val #2: IN(1 to 4 CSV integer values)<br />
Val #3: IN(unknown number of CSV integer values, probably max of 50)<br />
Val #4: LIKE &#039;string%&#039;</p>
<p>I have not actually added the column/data for Val #3 yet.  Searches as above on (Val #1, #2, #4) are very fast. I&#8217;m worried that when I add Val #3, things will get quite slow.  So I&#8217;m wondering, is there a certain number of CSV values that will make the IN() search actually slow down?  I&#8217;m not worried if I only have a few in there.  But if I need to have &#8220;Val #3 IN (1,2,3,4,5,6&#8230;50)&#8221; will that make index access super slow?</p>
<p>Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-3/#comment-664661</link>
		<dc:creator>Robin</dc:creator>
		<pubDate>Tue, 13 Oct 2009 23:32:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-664661</guid>
		<description>We encountered a performance problem when joining two large tables. However, the problem is quite tricky:

We select only a range of records from one large table to join against another large table. The join fields are indexed, and the range of records to join is selected by primary key.
The time for retrieving records in the ranges 1-20000, 20000-40000, ... is quite stable (about 5 seconds for each range).

However, when it came to about 560000-580000 and above, the time became significantly longer (more than 50 seconds).  I don&#039;t know why this happens or whether anyone else has had this problem.</description>
		<content:encoded><![CDATA[<p>We encountered a performance problem when joining two large tables. However, the problem is quite tricky:</p>
<p>We select only a range of records from one large table to join against another large table. The join fields are indexed, and the range of records to join is selected by primary key.<br />
The time for retrieving records in the ranges 1-20000, 20000-40000, &#8230; is quite stable (about 5 seconds for each range).</p>
<p>However, when it came to about 560000-580000 and above, the time became significantly longer (more than 50 seconds).  I don&#8217;t know why this happens or whether anyone else has had this problem.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Roy</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-3/#comment-632538</link>
		<dc:creator>Roy</dc:creator>
		<pubDate>Wed, 19 Aug 2009 00:06:31 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-632538</guid>
		<description>Hypothetical:

I have 500,000 user records.
Each user is going to have a score based on values from another table. (the average of 30 scores for each user)

How should I structure the tables so I can rank each user based on their
average score?

So rank 1 through to rank 500,000.  I want to be able to view -quickly- someone&#039;s rank via SQL at any time.
Is this possible?</description>
		<content:encoded><![CDATA[<p>Hypothetical:</p>
<p>I have 500,000 user records.<br />
Each user is going to have a score based on values from another table. (the average of 30 scores for each user)</p>
<p>How should I structure the tables so I can rank each user based on their<br />
average score?</p>
<p>So rank 1 through to rank 500,000.  I want to be able to view -quickly- someone&#8217;s rank via SQL at any time.<br />
Is this possible?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: adnan</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-3/#comment-567926</link>
		<dc:creator>adnan</dc:creator>
		<pubDate>Wed, 27 May 2009 19:20:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-567926</guid>
		<description>I&#039;m running my own custom ad server and saving all the impression and click data in a single table. When the entries go beyond 1 million, the whole system gets too slow. The site and the ads load very slowly.

Any suggestions?

Thanks
Adnan</description>
		<content:encoded><![CDATA[<p>I&#8217;m running my own custom ad server and saving all the impression and click data in a single table. When the entries go beyond 1 million, the whole system gets too slow. The site and the ads load very slowly.</p>
<p>Any suggestions?</p>
<p>Thanks<br />
Adnan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Harutyun</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-3/#comment-567891</link>
		<dc:creator>Harutyun</dc:creator>
		<pubDate>Wed, 27 May 2009 15:57:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-567891</guid>
		<description>Hello Peter

I&#039;m currently working on banner software with statistics of clicks/views etc. I&#039;m testing with a table of ~10,000,000 randomly generated rows. In the first table I store all events with all information IDs (browser id, platform id, country/ip interval id etc.) along with the time when the event happened. The most common query in such cases is to get the top N results for browsers/platforms/countries etc. in any time period.

The main event table definition is
CREATE TABLE IF NOT EXISTS `stats` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `banner_id` int(11) unsigned NOT NULL,
  `location_id` tinyint(3) unsigned NOT NULL,
  `url_id` int(11) unsigned NOT NULL,
  `page_id` int(11) unsigned NOT NULL,
  `dateline` int(11) unsigned NOT NULL,
  `ip_interval` int(11) unsigned NOT NULL,
  `browser_id` tinyint(3) unsigned NOT NULL,
  `platform_id` tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `bannerid` (`banner_id`),
  KEY `dateline` (`dateline`),
  KEY `ip_interval` (`ip_interval`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=FIXED AUTO_INCREMENT=10100001 ;

The country codes are stored in a different table named iplist
CREATE TABLE IF NOT EXISTS `iplist` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `code` varchar(2) NOT NULL,
  `code_3` varchar(3) NOT NULL,
  `name` varchar(255) NOT NULL,
  `start` int(11) unsigned NOT NULL,
  `end` int(11) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `code` (`code`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=91748 ;

So the query to get top 10 countries will be

SELECT iplist.code, COUNT(stat.ip_interval) AS count
FROM stats AS stat
LEFT JOIN iplist AS iplist ON (iplist.id=stat.ip_interval)
WHERE stat.dateline&gt;=1243382400 AND dateline&lt;1243466944
GROUP BY code
ORDER BY count DESC
LIMIT 0, 10

This query takes ~13 seconds to run (2 GHz dual-core CPU, 2 GB RAM).

EXPLAIN for this query shows that the JOIN is handled efficiently.

id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
1 	SIMPLE 	stat 	range 	dateline 	dateline 	4 	NULL 	277483 	Using where; Using temporary; Using filesort
1 	SIMPLE 	iplist 	eq_ref 	PRIMARY 	PRIMARY 	4 	vb38.stat.ip_interval 	1 	 

So do you have any idea how this query can be optimized further, or is this a normal time for such a query?</description>
		<content:encoded><![CDATA[<p>Hello Peter</p>
<p>I&#8217;m currently working on banner software with statistics of clicks/views etc. I&#8217;m testing with a table of ~10,000,000 randomly generated rows. In the first table I store all events with all information IDs (browser id, platform id, country/ip interval id etc.) along with the time when the event happened. The most common query in such cases is to get the top N results for browsers/platforms/countries etc. in any time period.</p>
<p>The main event table definition is<br />
CREATE TABLE IF NOT EXISTS `stats` (<br />
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,<br />
  `banner_id` int(11) unsigned NOT NULL,<br />
  `location_id` tinyint(3) unsigned NOT NULL,<br />
  `url_id` int(11) unsigned NOT NULL,<br />
  `page_id` int(11) unsigned NOT NULL,<br />
  `dateline` int(11) unsigned NOT NULL,<br />
  `ip_interval` int(11) unsigned NOT NULL,<br />
  `browser_id` tinyint(3) unsigned NOT NULL,<br />
  `platform_id` tinyint(3) unsigned NOT NULL,<br />
  PRIMARY KEY (`id`),<br />
  KEY `bannerid` (`banner_id`),<br />
  KEY `dateline` (`dateline`),<br />
  KEY `ip_interval` (`ip_interval`)<br />
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 PACK_KEYS=1 ROW_FORMAT=FIXED AUTO_INCREMENT=10100001 ;</p>
<p>The country codes are stored in a different table named iplist<br />
CREATE TABLE IF NOT EXISTS `iplist` (<br />
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,<br />
  `code` varchar(2) NOT NULL,<br />
  `code_3` varchar(3) NOT NULL,<br />
  `name` varchar(255) NOT NULL,<br />
  `start` int(11) unsigned NOT NULL,<br />
  `end` int(11) unsigned NOT NULL,<br />
  PRIMARY KEY (`id`),<br />
  KEY `code` (`code`)<br />
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=91748 ;</p>
<p>So the query to get top 10 countries will be</p>
<p>SELECT iplist.code, COUNT(stat.ip_interval) AS count<br />
FROM stats AS stat<br />
LEFT JOIN iplist AS iplist ON (iplist.id=stat.ip_interval)<br />
WHERE stat.dateline&gt;=1243382400 AND dateline&lt;1243466944<br />
GROUP BY code<br />
ORDER BY count DESC<br />
LIMIT 0, 10</p>
<p>This query takes ~13 seconds to run (2 GHz dual-core CPU, 2 GB RAM).</p>
<p>EXPLAIN for this query shows that the JOIN is handled efficiently.</p>
<p>id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra<br />
1 	SIMPLE 	stat 	range 	dateline 	dateline 	4 	NULL 	277483 	Using where; Using temporary; Using filesort<br />
1 	SIMPLE 	iplist 	eq_ref 	PRIMARY 	PRIMARY 	4 	vb38.stat.ip_interval 	1 	 </p>
<p>So do you have any idea how this query can be optimized further, or is this a normal time for such a query?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: rich</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-3/#comment-513599</link>
		<dc:creator>rich</dc:creator>
		<pubDate>Fri, 20 Mar 2009 19:29:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-513599</guid>
		<description>Peter,

I am relatively new to working with databases and have a table with about 7 million rows.  It seems to take forever (2+ hrs) whenever I need to make any changes to the table (e.g. adding columns, changing column names, etc.).  Is there an easy way to make these operations go faster?  Thanks!</description>
		<content:encoded><![CDATA[<p>Peter,</p>
<p>I am relatively new to working with databases and have a table with about 7 million rows.  It seems to take forever (2+ hrs) whenever I need to make any changes to the table (e.g. adding columns, changing column names, etc.).  Is there an easy way to make these operations go faster?  Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jadfreak</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-3/#comment-502256</link>
		<dc:creator>jadfreak</dc:creator>
		<pubDate>Wed, 11 Mar 2009 09:43:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-502256</guid>
		<description>Hello Peter,

I would like your insight into my problem.

QUERY USED:
SELECT DISTINCT MachineName FROM LogDetails WHERE MachineName IS NOT NULL AND MachineName != &#039;&#039; ORDER BY MachineName

SETUP A:
We have a web application that uses an MS SQL database. When running the SELECT statement above against the LogDetails table (approx. 4 million rows), the execution time is about 30 seconds.

SPECS of SETUP A:
OS: Windows XP Prof
Memory: 512MB


SETUP B:
It was decided to use MySQL instead of MS SQL. The same query used in SETUP A took approx. 2.5-3 minutes to run.

SPECS of SETUP B:
OS: Red Hat Linux 4
Memory: 512MB

QUESTION:
1) Why does MS SQL perform faster when both setups have the same specs, apart from the OS?
2) I know memory can affect performance, but why has it affected MS SQL much less than MySQL?
3) Any suggestions on how to improve SETUP B?</description>
		<content:encoded><![CDATA[<p>Hello Peter,</p>
<p>I would like your insight into my problem.</p>
<p>QUERY USED:<br />
SELECT DISTINCT MachineName FROM LogDetails WHERE MachineName IS NOT NULL AND MachineName != &#039;&#039; ORDER BY MachineName</p>
<p>SETUP A:<br />
We have a web application that uses an MS SQL database. When running the SELECT statement above against the LogDetails table (approx. 4 million rows), the execution time is about 30 seconds.</p>
<p>SPECS of SETUP A:<br />
OS: Windows XP Prof<br />
Memory: 512MB</p>
<p>SETUP B:<br />
It was decided to use MySQL instead of MS SQL. The same query used in SETUP A took approx. 2.5-3 minutes to run.</p>
<p>SPECS of SETUP B:<br />
OS: Red Hat Linux 4<br />
Memory: 512MB</p>
<p>QUESTION:<br />
1) Why does MS SQL perform faster when both setups have the same specs, apart from the OS?<br />
2) I know memory can affect performance, but why has it affected MS SQL much less than MySQL?<br />
3) Any suggestions on how to improve SETUP B?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dealing Large in MySQL &#171; Myles Kadusale&#8217;s Blog</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-2/#comment-489294</link>
		<dc:creator>Dealing Large in MySQL &#171; Myles Kadusale&#8217;s Blog</dc:creator>
		<pubDate>Wed, 25 Feb 2009 19:09:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-489294</guid>
		<description>[...]  Why MySQL could be slow with large tables [...]</description>
		<content:encoded><![CDATA[<p>[...]  Why MySQL could be slow with large tables [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anon</title>
		<link>http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/comment-page-2/#comment-487561</link>
		<dc:creator>anon</dc:creator>
		<pubDate>Tue, 24 Feb 2009 04:03:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/#comment-487561</guid>
		<description>Question 1
I&#039;m just wondering what you mean by &quot;keeping data in memory&quot;?

&quot;the good solution is to make sure your data fits in memory as good as possible&quot;

Do you mean ensuring SELECTs return less data than the system&#039;s RAM?

Question 2
Big joins are bad. But, do you have any suggestions on how to circumvent them?
I&#039;m thinking of doing a number of queries to SELECT subsets of data into smaller TEMPORARY TABLES then doing a JOIN on them.</description>
		<content:encoded><![CDATA[<p>Question 1<br />
I&#8217;m just wondering what you mean by &#8220;keeping data in memory&#8221;?</p>
<p>&#8220;the good solution is to make sure your data fits in memory as good as possible&#8221;</p>
<p>Do you mean ensuring SELECTs return less data than the system&#8217;s RAM?</p>
<p>Question 2<br />
Big joins are bad. But, do you have any suggestions on how to circumvent them?<br />
I&#8217;m thinking of doing a number of queries to SELECT subsets of data into smaller TEMPORARY TABLES then doing a JOIN on them.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
