<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Multi-Column IN clause - Unexpected MySQL Issue</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/</link>
	<description>Everything about MySQL Performance</description>
	<pubDate>Mon, 08 Sep 2008 05:05:54 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-296540</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Thu, 08 May 2008 00:19:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-296540</guid>
		<description>Ken,

Because CRC32 is just 32 bits (4 bytes); a compressed URL would be much longer.

Of course, you can use some simple compression in special cases: for example, if you only need to store image URLs, you could store 123 instead of img.site.com/123.jpg. But for general URLs you will not compress them to anywhere close to this size.</description>
		<content:encoded><![CDATA[<p>Ken,</p>
<p>Because CRC32 is just 32 bits (4 bytes); a compressed URL would be much longer.</p>
<p>Of course, you can use some simple compression in special cases: for example, if you only need to store image URLs, you could store 123 instead of img.site.com/123.jpg. But for general URLs you will not compress them to anywhere close to this size.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-296390</link>
		<dc:creator>Ken</dc:creator>
		<pubDate>Wed, 07 May 2008 15:52:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-296390</guid>
		<description>Peter,
Why use CRC32 at all? Have your app apply a common compression algorithm to the URL and store the result. It will be unique, save space, etc.</description>
		<content:encoded><![CDATA[<p>Peter,<br />
Why use CRC32 at all? Have your app apply a common compression algorithm to the URL and store the result. It will be unique, save space, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-274424</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sun, 13 Apr 2008 07:32:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-274424</guid>
		<description>Ries, 

I do not understand you. You can check EXACTLY whether a URL exists using a plain SQL statement. Only the CRC part is checked via the index; the full URL comparison is done as a post-check after the full row is read.

When planning your database structure, you plan it for certain queries. If you need to serve arbitrary queries, you design the schema accordingly. If you only need a few queries but need them to be really fast, you design the schema differently.

Think about it in car terms. If you only want to entertain your girlfriend and get yourself to work, a two-seater sports car may do a good job for you. If you may need to carry 7 people or load a sofa, a minivan works better. So what? You can't say one or the other is better without knowing the purpose.

With the domain table you propose a certain kind of normalization, which again is good for some applications.

For some workloads, prefix indexes or other index structures may work.

As for size, 1-10 billion rows is the prototype size, so we're rather picky about optimizing queries.</description>
		<content:encoded><![CDATA[<p>Ries, </p>
<p>I do not understand you. You can check EXACTLY whether a URL exists using a plain SQL statement. Only the CRC part is checked via the index; the full URL comparison is done as a post-check after the full row is read.</p>
<p>When planning your database structure, you plan it for certain queries. If you need to serve arbitrary queries, you design the schema accordingly. If you only need a few queries but need them to be really fast, you design the schema differently.</p>
<p>Think about it in car terms. If you only want to entertain your girlfriend and get yourself to work, a two-seater sports car may do a good job for you. If you may need to carry 7 people or load a sofa, a minivan works better. So what? You can&#8217;t say one or the other is better without knowing the purpose.</p>
<p>With the domain table you propose a certain kind of normalization, which again is good for some applications.</p>
<p>For some workloads, prefix indexes or other index structures may work.</p>
<p>As for size, 1-10 billion rows is the prototype size, so we&#8217;re rather picky about optimizing queries.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ries</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-273996</link>
		<dc:creator>ries</dc:creator>
		<pubDate>Sat, 12 Apr 2008 13:25:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-273996</guid>
		<description>I am sorry, but what is the use of having a CRC in your table? You could only really figure out IF a URL exists, and only with some certainty, so you need to check your result set. USE a stored procedure for that; don't even think of doing this at the application level.

What if your boss comes to you and asks: hey dude, how many URLs do you have in your database that point to the website http://members.aye.net/

For sure you HAVE to do something like this: SELECT * FROM table WHERE url LIKE ('http://members.aye.net/%');

What I would do is maybe create a domain table that stores only domains, based on CRC32,
then create 26 tables for your URLs and partition your URIs based on the URI.

so table domain stores: www.dell.com, www.vantwisk.nl, www.mysqlperformanceblog.com
then table URI stores: 2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#more-361, blabla.html, myblog/entry/bla.html

On the domain table you can create an index by CRC, but I would just use the.
On table URI you simply create an index on the first XX characters to keep the index size low.

Data retrieval is done using a couple of stored procedures to get the right data and filter any duplicates.

Ries
PS: you could partition your tables if you need more. Since you didn't mention how many rows you want to store, it's hard to guess.
You did mention massive; how much is massive for you? On the order of 1000 million records?

Ries</description>
		<content:encoded><![CDATA[<p>I am sorry, but what is the use of having a CRC in your table? You could only really figure out IF a URL exists, and only with some certainty, so you need to check your result set. USE a stored procedure for that; don&#8217;t even think of doing this at the application level.</p>
<p>What if your boss comes to you and asks: hey dude, how many URLs do you have in your database that point to the website <a href="http://members.aye.net/" rel="nofollow">http://members.aye.net/</a></p>
<p>For sure you HAVE to do something like this: SELECT * FROM table WHERE url LIKE ('http://members.aye.net/%');</p>
<p>What I would do is maybe create a domain table that stores only domains, based on CRC32,<br />
then create 26 tables for your URLs and partition your URIs based on the URI.</p>
<p>so table domain stores: <a href="http://www.dell.com" rel="nofollow">http://www.dell.com</a>, <a href="http://www.vantwisk.nl" rel="nofollow">http://www.vantwisk.nl</a>, <a href="http://www.mysqlperformanceblog.com" rel="nofollow">http://www.mysqlperformanceblog.com</a><br />
then table URI stores: 2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#more-361, blabla.html, myblog/entry/bla.html</p>
<p>On the domain table you can create an index by CRC, but I would just use the.<br />
On table URI you simply create an index on the first XX characters to keep the index size low.</p>
<p>Data retrieval is done using a couple of stored procedures to get the right data and filter any duplicates.</p>
<p>Ries<br />
PS: you could partition your tables if you need more. Since you didn&#8217;t mention how many rows you want to store, it&#8217;s hard to guess.<br />
You did mention massive; how much is massive for you? On the order of 1000 million records?</p>
<p>Ries</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-267626</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sun, 06 Apr 2008 17:47:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-267626</guid>
		<description>Roland,

It has to do with IN; your query works fine.

mysql&gt; EXPLAIN SELECT url FROM 124pages.124pages WHERE (url_crc,url) = (484036220,'http://www.dell.com/');
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
&#124; id &#124; select_type &#124; table    &#124; type &#124; possible_keys &#124; key     &#124; key_len &#124; ref   &#124; rows &#124; Extra       &#124;
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
&#124;  1 &#124; SIMPLE      &#124; 124pages &#124; ref  &#124; url_crc       &#124; url_crc &#124; 4       &#124; const &#124;    1 &#124; Using where &#124;
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.03 sec)</description>
		<content:encoded><![CDATA[<p>Roland,</p>
<p>It has to do with IN; your query works fine.</p>
<p>mysql> EXPLAIN SELECT url FROM 124pages.124pages WHERE (url_crc,url) = (484036220,'http://www.dell.com/');<br />
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+<br />
| id | select_type | table    | type | possible_keys | key     | key_len | ref   | rows | Extra       |<br />
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+<br />
|  1 | SIMPLE      | 124pages | ref  | url_crc       | url_crc | 4       | const |    1 | Using where |<br />
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+<br />
1 row in set (0.03 sec)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Roland Bouman</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-267265</link>
		<dc:creator>Roland Bouman</dc:creator>
		<pubDate>Sun, 06 Apr 2008 09:02:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-267265</guid>
		<description>Hi Peter,

I know it shouldn't matter, but have you tried:

EXPLAIN SELECT url FROM 124pages.124pages WHERE (url_crc,url) = (484036220,'http://www.dell.com/');

to see if it is due to IN or rather due to multiple columns?

If there is a difference, it seems likely that it is a bug, which may turn out to be trivial to fix.</description>
		<content:encoded><![CDATA[<p>Hi Peter,</p>
<p>I know it shouldn&#8217;t matter, but have you tried:</p>
<p>EXPLAIN SELECT url FROM 124pages.124pages WHERE (url_crc,url) = (484036220,'http://www.dell.com/');</p>
<p>to see if it is due to IN or rather due to multiple columns?</p>
<p>If there is a difference, it seems likely that it is a bug, which may turn out to be trivial to fix.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266947</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 05 Apr 2008 20:53:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266947</guid>
		<description>Ruslan,

I do not think prepared statements will be faster if you're doing one-by-one checks: the savings on query parsing are much smaller than the loss from having many round trips.

If you need to check 1,000,000 URLs this way, you can use prepared statements together with batches and check them 1000 at a time or so. This would likely be the most efficient approach.</description>
		<content:encoded><![CDATA[<p>Ruslan,</p>
<p>I do not think prepared statements will be faster if you&#8217;re doing one-by-one checks: the savings on query parsing are much smaller than the loss from having many round trips.</p>
<p>If you need to check 1,000,000 URLs this way, you can use prepared statements together with batches and check them 1000 at a time or so. This would likely be the most efficient approach.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266945</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 05 Apr 2008 20:51:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266945</guid>
		<description>Tom,

In this case it is easy enough to get a query that works. There are many cases when you can't get MySQL to do what you want, for example with a bunch of subqueries. In these cases I would handle things in the application, i.e. run a SELECT, build an IN list, and generate a second query.</description>
		<content:encoded><![CDATA[<p>Tom,</p>
<p>In this case it is easy enough to get a query that works. There are many cases when you can&#8217;t get MySQL to do what you want, for example with a bunch of subqueries. In these cases I would handle things in the application, i.e. run a SELECT, build an IN list, and generate a second query.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ruslan Zakirov</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266885</link>
		<dc:creator>Ruslan Zakirov</dc:creator>
		<pubDate>Sat, 05 Apr 2008 19:17:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266885</guid>
		<description>I think that beyond some number of lookups, prepared statements will be faster than a bulk query.

Maybe UNION ALL is a more native representation.

Also, I wonder why MySQL doesn't even consider index merge.</description>
		<content:encoded><![CDATA[<p>I think that beyond some number of lookups, prepared statements will be faster than a bulk query.</p>
<p>Maybe UNION ALL is a more native representation.</p>
<p>Also, I wonder why MySQL doesn&#8217;t even consider index merge.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tom</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266795</link>
		<dc:creator>tom</dc:creator>
		<pubDate>Sat, 05 Apr 2008 17:32:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266795</guid>
		<description>Hello Peter, I agree in part with you; this is the same as the question of whether or not to use stored procedures :)
Is it better to have the logic in MySQL or in the application? I think the answer is "it depends", as perhaps it does in this case.
CPU time spent in scripting languages is usually worse than CPU time spent in MySQL; but if odd queries prevent correct use of indexes or incur more disk seeks, perhaps postprocessing in application logic is better.

As in many other cases, the best thing is to run tests against your own data.</description>
		<content:encoded><![CDATA[<p>Hello Peter, I agree in part with you; this is the same as the question of whether or not to use stored procedures <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /><br />
Is it better to have the logic in MySQL or in the application? I think the answer is &#8220;it depends&#8221;, as perhaps it does in this case.<br />
CPU time spent in scripting languages is usually worse than CPU time spent in MySQL; but if odd queries prevent correct use of indexes or incur more disk seeks, perhaps postprocessing in application logic is better.</p>
<p>As in many other cases, the best thing is to run tests against your own data.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
