<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Speeding up GROUP BY if you want approximate results</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/</link>
	<description>Everything about MySQL Performance</description>
	<lastBuildDate>Sat, 21 Nov 2009 05:23:57 -0800</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: aboyon blog &#187; Cosas no tan ciertas de CRC32() en MySQL</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-495165</link>
		<dc:creator>aboyon blog &#187; Cosas no tan ciertas de CRC32() en MySQL</dc:creator>
		<pubDate>Tue, 03 Mar 2009 13:26:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-495165</guid>
		<description>[...] the other day, while reading the MySQL Performance blog (MySQL Performance Blog&#8217;s), I read that they were discussing the use of CRC32() + BINARY in queries, and [...]</description>
		<content:encoded><![CDATA[<p>[...] the other day, while reading the MySQL Performance blog (MySQL Performance Blog&#8217;s), I read that they were discussing the use of CRC32() + BINARY in queries, and [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Benni</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-351785</link>
		<dc:creator>Benni</dc:creator>
		<pubDate>Thu, 04 Sep 2008 12:15:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-351785</guid>
		<description>Hi all,

I wonder if this procedure would speed up a very slow query in my application: Each entry in my database has a float value &#039;x&#039; between 0.0 and 100.0. I want to divide this value into N &quot;bins&quot; and make MySQL count how many entries are in each bin. My approach for e.g. 2 bins would be an INTERVAL query:

SELECT INTERVAL(x,50,101), count(*) FROM table GROUP BY 1;

Problem is: mysql can&#039;t use an index on x because of the INTERVAL function. Is there a way I can use BINARY CRC32() to speed things up? Are there better ways to solve this issue than using an INTERVAL?

Thank you!

Benni</description>
		<content:encoded><![CDATA[<p>Hi all,</p>
<p>I wonder if this procedure would speed up a very slow query in my application: Each entry in my database has a float value &#8216;x&#8217; between 0.0 and 100.0. I want to divide this value into N &#8220;bins&#8221; and make MySQL count how many entries are in each bin. My approach for e.g. 2 bins would be an INTERVAL query:</p>
<p>SELECT INTERVAL(x,50,101), count(*) FROM table GROUP BY 1;</p>
<p>Problem is: mysql can&#8217;t use an index on x because of the INTERVAL function. Is there a way I can use BINARY CRC32() to speed things up? Are there better ways to solve this issue than using an INTERVAL?</p>
<p>Thank you!</p>
<p>Benni</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-272116</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Wed, 09 Apr 2008 21:07:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-272116</guid>
		<description>Upgrade to latest 5.0

simply CRC32(tag) should work.</description>
		<content:encoded><![CDATA[<p>Upgrade to latest 5.0</p>
<p>simply CRC32(tag) should work.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Sutherland</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-272107</link>
		<dc:creator>Andrew Sutherland</dc:creator>
		<pubDate>Wed, 09 Apr 2008 20:42:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-272107</guid>
		<description>MySQL 5.0.41

&lt;code&gt;BINARY crc32(tag)&lt;/code&gt; is still much faster than &lt;code&gt;tag&lt;/code&gt;. Is there a different way I can cast it to work with integers possibly?</description>
		<content:encoded><![CDATA[<p>MySQL 5.0.41</p>
<p><code>BINARY crc32(tag)</code> is still much faster than <code>tag</code>. Is there a different way I can cast it to work with integers possibly?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-272102</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Wed, 09 Apr 2008 20:37:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-272102</guid>
		<description>CRC32 in PHP is crazy in the sense that it gives different results on 32-bit and 64-bit platforms...

In your case, casting to BINARY should give you a string, which is slower. Typically plain CRC32 should work. Your high collision rate was because it was running into the wall at the max signed integer. What MySQL version is this?</description>
		<content:encoded><![CDATA[<p>CRC32 in PHP is crazy in the sense that it gives different results on 32-bit and 64-bit platforms&#8230;</p>
<p>In your case, casting to BINARY should give you a string, which is slower. Typically plain CRC32 should work. Your high collision rate was because it was running into the wall at the max signed integer. What MySQL version is this?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Sutherland</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-272078</link>
		<dc:creator>Andrew Sutherland</dc:creator>
		<pubDate>Wed, 09 Apr 2008 20:04:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-272078</guid>
		<description>Hmm - I initially thought it was signed, because PHP&#039;s crc32 is signed. That is interesting though, the mysql docs definitely say unsigned. Do you think there&#039;s any danger in using my method? I haven&#039;t found any collisions yet.</description>
		<content:encoded><![CDATA[<p>Hmm &#8211; I initially thought it was signed, because PHP&#8217;s crc32 is signed. That is interesting though, the mysql docs definitely say unsigned. Do you think there&#8217;s any danger in using my method? I haven&#8217;t found any collisions yet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-272074</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Wed, 09 Apr 2008 20:00:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-272074</guid>
		<description>Andrew,

Something is wrong with your query result: note the 2147483647 value for CRC32 in a few rows. This is the MAX value for a SIGNED int, while CRC32 should be unsigned. Also, BINARY CRC32 gives you values above this. You might have found a bug in MySQL :)</description>
		<content:encoded><![CDATA[<p>Andrew,</p>
<p>Something is wrong with your query result: note the 2147483647 value for CRC32 in a few rows. This is the MAX value for a SIGNED int, while CRC32 should be unsigned. Also, BINARY CRC32 gives you values above this. You might have found a bug in MySQL <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Andrew Sutherland</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-272067</link>
		<dc:creator>Andrew Sutherland</dc:creator>
		<pubDate>Wed, 09 Apr 2008 19:49:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-272067</guid>
		<description>I&#039;ve been playing with crc32 and saw some very high collision rates, so I investigated a bit further and discovered that GROUP BY was capping the crc32 values at 2147483647 (2^31 - 1, the max signed int). If I cast crc32 to BINARY, though, my results worked perfectly.


mysql&gt; SELECT tag, COUNT(*) AS count, crc32(tag), BINARY crc32(tag)
    -&gt; FROM tags
    -&gt; GROUP BY BINARY crc32(tag)
    -&gt; ORDER BY count DESC
    -&gt; LIMIT 10
    -&gt; ;
+------------+-------+------------+-------------------+
&#124; tag        &#124; count &#124; crc32(tag) &#124; BINARY crc32(tag) &#124;
+------------+-------+------------+-------------------+
&#124; spanish    &#124;  4576 &#124;  874050868 &#124; 874050868         &#124; 
&#124; vocab      &#124;  4103 &#124; 1178479308 &#124; 1178479308        &#124; 
&#124; vocabulary &#124;  2786 &#124; 2147483647 &#124; 2425997691        &#124; 
&#124; french     &#124;  2247 &#124; 2147483647 &#124; 2943733342        &#124; 
&#124; english    &#124;  2087 &#124;  746783232 &#124; 746783232         &#124; 
&#124; science    &#124;  1957 &#124; 1729573288 &#124; 1729573288        &#124; 
&#124; latin      &#124;  1411 &#124; 1421320458 &#124; 1421320458        &#124; 
&#124; chapter    &#124;  1274 &#124; 2147483647 &#124; 4186027310        &#124; 
&#124; history    &#124;  1171 &#124;  666529867 &#124; 666529867         &#124; 
&#124; words      &#124;   939 &#124; 1904025228 &#124; 1904025228        &#124; 
+------------+-------+------------+-------------------+
10 rows in set (0.32 sec)


Notice that &lt;em&gt;chapter&lt;/em&gt;, &lt;em&gt;vocabulary&lt;/em&gt;, and &lt;em&gt;french&lt;/em&gt; all come up with the same &lt;code&gt;crc32&lt;/code&gt; value, but different &lt;code&gt;BINARY crc32&lt;/code&gt; values. So if you GROUP BY regular &lt;code&gt;crc32&lt;/code&gt;, those will be grouped together.

Thanks for the initial tip though, it turned a 30-second query into a 0.32-second query.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been playing with crc32 and saw some very high collision rates, so I investigated a bit further and discovered that GROUP BY was capping the crc32 values at 2147483647 (2^31 - 1, the max signed int). If I cast crc32 to BINARY, though, my results worked perfectly.</p>
<p>mysql&gt; SELECT tag, COUNT(*) AS count, crc32(tag), BINARY crc32(tag)<br />
    -&gt; FROM tags<br />
    -&gt; GROUP BY BINARY crc32(tag)<br />
    -&gt; ORDER BY count DESC<br />
    -&gt; LIMIT 10<br />
    -&gt; ;<br />
+------------+-------+------------+-------------------+<br />
| tag        | count | crc32(tag) | BINARY crc32(tag) |<br />
+------------+-------+------------+-------------------+<br />
| spanish    |  4576 |  874050868 | 874050868         |<br />
| vocab      |  4103 | 1178479308 | 1178479308        |<br />
| vocabulary |  2786 | 2147483647 | 2425997691        |<br />
| french     |  2247 | 2147483647 | 2943733342        |<br />
| english    |  2087 |  746783232 | 746783232         |<br />
| science    |  1957 | 1729573288 | 1729573288        |<br />
| latin      |  1411 | 1421320458 | 1421320458        |<br />
| chapter    |  1274 | 2147483647 | 4186027310        |<br />
| history    |  1171 |  666529867 | 666529867         |<br />
| words      |   939 | 1904025228 | 1904025228        |<br />
+------------+-------+------------+-------------------+<br />
10 rows in set (0.32 sec)</p>
<p>Notice that <em>chapter</em>, <em>vocabulary</em>, and <em>french</em> all come up with the same <code>crc32</code> value, but different <code>BINARY crc32</code> values. So if you GROUP BY regular <code>crc32</code>, those will be grouped together.</p>
<p>Thanks for the initial tip though, it turned a 30-second query into a 0.32-second query.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Log Buffer #88: a Carnival of the Vanities for DBAs</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-252784</link>
		<dc:creator>Log Buffer #88: a Carnival of the Vanities for DBAs</dc:creator>
		<pubDate>Fri, 14 Mar 2008 16:55:54 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-252784</guid>
		<description>[...] the MySQL Performance Blog, Peter Zaitsev advises on speeding up GROUP BY if you want approximate results, which you might if accuracy is expensive and unnecessary when all you want is, &#8220;close enough [...]</description>
		<content:encoded><![CDATA[<p>[...] the MySQL Performance Blog, Peter Zaitsev advises on speeding up GROUP BY if you want approximate results, which you might if accuracy is expensive and unnecessary when all you want is, &#8220;close enough [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/comment-page-1/#comment-251377</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Tue, 11 Mar 2008 02:05:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/03/07/speeding-up-group-by-if-you-want-aproximate-results/#comment-251377</guid>
		<description>Yes, indeed, ORDER BY NULL can help GROUP BY queries. How much it helps depends on the GROUP BY execution mode and the number of groups in the result set.</description>
		<content:encoded><![CDATA[<p>Yes, indeed, ORDER BY NULL can help GROUP BY queries. How much it helps depends on the GROUP BY execution mode and the number of groups in the result set.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
