<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Multi-Column IN clause &#8211; Unexpected MySQL Issue</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/</link>
	<description>Everything about MySQL Performance</description>
	<lastBuildDate>Sat, 21 Nov 2009 05:23:57 -0800</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Rick</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-662732</link>
		<dc:creator>Rick</dc:creator>
		<pubDate>Thu, 08 Oct 2009 02:57:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-662732</guid>
		<description>I have a similar issue with the IN clause (MySQL 5.0.77): if there are more than 2 items in the list, the optimizer seems to stop using an existing index.

Any ideas (besides moving to 5.1.xx)?

mysql&gt; explain select * from projects where project_phase_id in (2,3);
+----+-------------+----------+-------+---------------+---------------+---------+------+------+-------------+
&#124; id &#124; select_type &#124; table    &#124; type  &#124; possible_keys &#124; key           &#124; key_len &#124; ref  &#124; rows &#124; Extra       &#124;
+----+-------------+----------+-------+---------------+---------------+---------+------+------+-------------+
&#124;  1 &#124; SIMPLE      &#124; projects &#124; range &#124; proj_phase_id &#124; proj_phase_id &#124; 4       &#124; NULL &#124;  137 &#124; Using where &#124;
+----+-------------+----------+-------+---------------+---------------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql&gt; explain select * from projects where project_phase_id in (2,3,4);
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
&#124; id &#124; select_type &#124; table    &#124; type &#124; possible_keys &#124; key  &#124; key_len &#124; ref  &#124; rows &#124; Extra       &#124;
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
&#124;  1 &#124; SIMPLE      &#124; projects &#124; ALL  &#124; proj_phase_id &#124; NULL &#124; NULL    &#124; NULL &#124;  757 &#124; Using where &#124;
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)</description>
		<content:encoded><![CDATA[<p>I have a similar issue with the IN clause (MySQL 5.0.77): if there are more than 2 items in the list, the optimizer seems to stop using an existing index.</p>
<p>Any ideas (besides moving to 5.1.xx)?</p>
<p>mysql&gt; explain select * from projects where project_phase_id in (2,3);<br />
+----+-------------+----------+-------+---------------+---------------+---------+------+------+-------------+<br />
| id | select_type | table    | type  | possible_keys | key           | key_len | ref  | rows | Extra       |<br />
+----+-------------+----------+-------+---------------+---------------+---------+------+------+-------------+<br />
|  1 | SIMPLE      | projects | range | proj_phase_id | proj_phase_id | 4       | NULL |  137 | Using where |<br />
+----+-------------+----------+-------+---------------+---------------+---------+------+------+-------------+<br />
1 row in set (0.00 sec)</p>
<p>mysql&gt; explain select * from projects where project_phase_id in (2,3,4);<br />
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+<br />
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows | Extra       |<br />
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+<br />
|  1 | SIMPLE      | projects | ALL  | proj_phase_id | NULL | NULL    | NULL |  757 | Using where |<br />
+----+-------------+----------+------+---------------+------+---------+------+------+-------------+<br />
1 row in set (0.00 sec)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Abhishek Soni</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-362953</link>
		<dc:creator>Abhishek Soni</dc:creator>
		<pubDate>Sat, 18 Oct 2008 07:36:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-362953</guid>
		<description>Hi all

I have a solution for the queries mentioned above. After a lot of digging, I found that when we use an IN clause with a subquery, MySQL first executes the outer query (the PRIMARY query shown in EXPLAIN) and then executes the inner subquery.

To work around this, I used a JOIN instead, and the execution time dropped drastically, from a few seconds to a few milliseconds.

I don&#039;t know the schema of your table, but here is what I would propose as a possible solution:

EXPLAIN SELECT t1.url FROM 106pages.106pages as t1 left outer join 106pages.106pages as t2 on t1.url = t2.url WHERE t2.url_crc IN (2752937066,3799762538);

I have used a self-join in this case. I haven&#039;t executed the query because I don&#039;t know the table schema, so I hope it works.</description>
		<content:encoded><![CDATA[<p>Hi all</p>
<p>I have a solution for the queries mentioned above. After a lot of digging, I found that when we use an IN clause with a subquery, MySQL first executes the outer query (the PRIMARY query shown in EXPLAIN) and then executes the inner subquery.</p>
<p>To work around this, I used a JOIN instead, and the execution time dropped drastically, from a few seconds to a few milliseconds.</p>
<p>I don&#8217;t know the schema of your table, but here is what I would propose as a possible solution:</p>
<p>EXPLAIN SELECT t1.url FROM 106pages.106pages as t1 left outer join 106pages.106pages as t2 on t1.url = t2.url WHERE t2.url_crc IN (2752937066,3799762538);</p>
<p>I have used a self-join in this case. I haven&#8217;t executed the query because I don&#8217;t know the table schema, so I hope it works.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-296540</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Thu, 08 May 2008 00:19:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-296540</guid>
		<description>Ken,

Because CRC32 is just 32 bits (4 bytes); a compressed URL would be much longer.

Of course, if you can use some simple domain-specific compression (for example, you only need to store image URLs and can store 123 instead of img.site.com/123.jpg), that works, but general URLs will not compress to anywhere near this size.</description>
		<content:encoded><![CDATA[<p>Ken,</p>
<p>Because CRC32 is just 32 bits (4 bytes); a compressed URL would be much longer.</p>
<p>Of course, if you can use some simple domain-specific compression (for example, you only need to store image URLs and can store 123 instead of img.site.com/123.jpg), that works, but general URLs will not compress to anywhere near this size.</p>
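<p>To put rough numbers on this, here is a small Python sketch using the standard <code>zlib</code>, <code>struct</code> and <code>re</code> modules. It compares the 4-byte CRC32 of this post&#8217;s URL with zlib-compressed output, and shows the kind of format-specific encoding meant above: keeping only the number from an <code>img.site.com/123.jpg</code>-style URL. The <code>image_id</code> helper and its URL pattern are hypothetical, made up for illustration.</p>

```python
import re
import struct
import zlib

# The first URL is this post's address; the second is the image-URL example
# from the comment above.
GENERAL_URL = b"http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/"
IMAGE_URL = "img.site.com/123.jpg"


def crc32_bytes(url: bytes) -> bytes:
    # In Python 3, zlib.crc32 returns an unsigned 32-bit integer,
    # so it always packs into exactly 4 bytes.
    return struct.pack("!I", zlib.crc32(url))


def image_id(url: str) -> int:
    # Hypothetical domain-specific "compression": if every stored URL has
    # the form img.site.com/NNN.jpg, storing the number NNN is enough.
    match = re.fullmatch(r"img\.site\.com/(\d+)\.jpg", url)
    if match is None:
        raise ValueError(f"unexpected URL format: {url}")
    return int(match.group(1))


if __name__ == "__main__":
    print("original URL:   ", len(GENERAL_URL), "bytes")
    print("zlib-compressed:", len(zlib.compress(GENERAL_URL, 9)), "bytes")
    print("CRC32:          ", len(crc32_bytes(GENERAL_URL)), "bytes")
    print("image id:       ", image_id(IMAGE_URL))
```

<p>For a URL this short, general-purpose compression usually gains little: the zlib header and Adler-32 trailer alone take 6 bytes, so only a format-specific encoding like the image id gets anywhere near a 4-byte key.</p>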
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-296390</link>
		<dc:creator>Ken</dc:creator>
		<pubDate>Wed, 07 May 2008 15:52:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-296390</guid>
		<description>Peter,
Why use CRC32 at all? Have your app compress the URL with a common compression algorithm and store the result; it will be unique, save space, etc.</description>
		<content:encoded><![CDATA[<p>Peter,<br />
Why use CRC32 at all? Have your app compress the URL with a common compression algorithm and store the result; it will be unique, save space, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-274424</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sun, 13 Apr 2008 07:32:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-274424</guid>
		<description>Ries, 

I do not understand your point. You can check EXACTLY whether a URL exists using a plain SQL statement: only the CRC part is checked through the index, and the full URL is post-checked after the full row is read.

When you plan your database structure, you plan it for certain queries. If you need to serve arbitrary queries, you design the schema accordingly. If you only need some queries but need them to be really fast, you design the schema differently.

Think about it in car terms. If you only want to entertain your girlfriend and get yourself to work, a two-seater sports car may do a good job for you. If you may need to carry 7 people or a sofa, a minivan works better. So what? You can&#039;t say one is better than the other without knowing the purpose.

With the domain table, you propose a certain kind of normalization, which again is good for some applications.

For some workloads, prefix indexes or other index structures may work.

The prototype size is 1-10 billion rows, so we&#039;re rather picky about optimizing queries.</description>
		<content:encoded><![CDATA[<p>Ries, </p>
<p>I do not understand your point. You can check EXACTLY whether a URL exists using a plain SQL statement: only the CRC part is checked through the index, and the full URL is post-checked after the full row is read.</p>
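<p>A minimal sketch of that exact-match lookup in Python, using the built-in <code>sqlite3</code> module as a stand-in for MySQL, so it illustrates the query shape rather than MySQL&#8217;s optimizer. The <code>pages</code> table and <code>idx_url_crc</code> index are assumptions modeled on the <code>url_crc</code>/<code>url</code> columns used in this thread: only the 4-byte CRC is indexed, and <code>url = ?</code> is the post-check that filters out CRC collisions.</p>

```python
import sqlite3
import zlib


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Only the CRC is indexed; the full URL is stored but not indexed.
    conn.execute("CREATE TABLE pages (url_crc INTEGER NOT NULL, url TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_url_crc ON pages (url_crc)")
    return conn


def add_url(conn: sqlite3.Connection, url: str) -> None:
    conn.execute(
        "INSERT INTO pages (url_crc, url) VALUES (?, ?)",
        (zlib.crc32(url.encode("utf-8")), url),
    )


def url_exists(conn: sqlite3.Connection, url: str) -> bool:
    # The index narrows the search by CRC; comparing the full URL as well
    # makes the answer exact even if two different URLs share a CRC.
    row = conn.execute(
        "SELECT 1 FROM pages WHERE url_crc = ? AND url = ? LIMIT 1",
        (zlib.crc32(url.encode("utf-8")), url),
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    db = make_db()
    add_url(db, "http://www.dell.com/")
    print(url_exists(db, "http://www.dell.com/"))     # True
    print(url_exists(db, "http://members.aye.net/"))  # False
```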
<p>When you plan your database structure, you plan it for certain queries. If you need to serve arbitrary queries, you design the schema accordingly. If you only need some queries but need them to be really fast, you design the schema differently.</p>
<p>Think about it in car terms. If you only want to entertain your girlfriend and get yourself to work, a two-seater sports car may do a good job for you. If you may need to carry 7 people or a sofa, a minivan works better. So what? You can&#8217;t say one is better than the other without knowing the purpose.</p>
<p>With the domain table, you propose a certain kind of normalization, which again is good for some applications.</p>
<p>For some workloads, prefix indexes or other index structures may work.</p>
<p>The prototype size is 1-10 billion rows, so we&#8217;re rather picky about optimizing queries.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ries</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-273996</link>
		<dc:creator>ries</dc:creator>
		<pubDate>Sat, 12 Apr 2008 13:25:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-273996</guid>
		<description>I am sorry, but what is the use of having a CRC in your table? You could only really figure out IF a URL exists, and even then not with full certainty, so you need to check your result set. Use a stored procedure (SP) for that; don&#039;t even think about doing this at the application level.

What if your boss comes to you and asks: hey dude, how many URLs do you have in your database that point to the website http://members.aye.net/

You would certainly HAVE to do something like this: SELECT * FROM table WHERE url LIKE (&#039;http://members.aye.net/%&#039;);

What I would do is maybe create a domain table that stores only the domains, based on CRC32,
then create 26 tables for your URLs and partition your URIs across them based on the URI.

So the domain table stores: www.dell.com, www.vantwisk.nl, www.mysqlperformanceblog.com
and the URI table stores: 2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#more-361, blabla.html, myblog/entry/bla.html

On the domain table you can create an index on the CRC, but I would just use the.
On the URI table you simply create an index on the first XX characters to keep the index size low.

Data retrieval is done using a couple of stored procedures to get the right data and filter out any duplicates.

Ries
PS: you could partition your tables further if you need more. Since you didn&#039;t mention how many rows you want to store, it&#039;s hard to guess.
You did mention massive; how much is massive for you? On the order of 1000 million records?

Ries</description>
		<content:encoded><![CDATA[<p>I am sorry, but what is the use of having a CRC in your table? You could only really figure out IF a URL exists, and even then not with full certainty, so you need to check your result set. Use a stored procedure (SP) for that; don&#8217;t even think about doing this at the application level.</p>
<p>What if your boss comes to you and asks: hey dude, how many URLs do you have in your database that point to the website <a href="http://members.aye.net/" rel="nofollow">http://members.aye.net/</a></p>
<p>You would certainly HAVE to do something like this: SELECT * FROM table WHERE url LIKE ('http://members.aye.net/%');</p>
<p>What I would do is maybe create a domain table that stores only the domains, based on CRC32,<br />
then create 26 tables for your URLs and partition your URIs across them based on the URI.</p>
<p>So the domain table stores: <a href="http://www.dell.com" rel="nofollow">http://www.dell.com</a>, <a href="http://www.vantwisk.nl" rel="nofollow">http://www.vantwisk.nl</a>, <a href="http://www.mysqlperformanceblog.com" rel="nofollow">http://www.mysqlperformanceblog.com</a><br />
and the URI table stores: 2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#more-361, blabla.html, myblog/entry/bla.html</p>
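<p>The split described above might look roughly like this in Python, using <code>urllib.parse.urlsplit</code> to separate the host from the rest of the URL and <code>zlib.crc32</code> as the domain key. This is a sketch under assumptions: the <code>split_url</code> name, dropping the scheme, and keeping the query string and fragment with the URI part are choices made here, not part of the proposal.</p>

```python
import zlib
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, int, str]:
    """Split a URL into (domain, CRC32 of domain, URI part).

    Hypothetical helper: the domain would go into the domain table keyed
    by its CRC32, the remainder into the URI table.
    """
    parts = urlsplit(url)
    # Host names are case-insensitive; any port stays attached to the domain.
    domain = parts.netloc.lower()
    uri = parts.path.lstrip("/")
    if parts.query:
        uri += "?" + parts.query
    if parts.fragment:
        uri += "#" + parts.fragment
    return domain, zlib.crc32(domain.encode("utf-8")), uri


if __name__ == "__main__":
    for u in ("http://www.dell.com/",
              "http://www.mysqlperformanceblog.com/2008/04/04/"
              "multi-column-in-clause-unexpected-mysql-issue/#more-361"):
        print(split_url(u))
```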
<p>On the domain table you can create an index on the CRC, but I would just use the.<br />
On the URI table you simply create an index on the first XX characters to keep the index size low.</p>
<p>Data retrieval is done using a couple of stored procedures to get the right data and filter out any duplicates.</p>
<p>Ries<br />
PS: you could partition your tables further if you need more. Since you didn&#8217;t mention how many rows you want to store, it&#8217;s hard to guess.<br />
You did mention massive; how much is massive for you? On the order of 1000 million records?</p>
<p>Ries</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-267626</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sun, 06 Apr 2008 17:47:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-267626</guid>
		<description>Roland,

It has to do with IN; your query works fine.

mysql&gt; EXPLAIN SELECT url FROM 124pages.124pages WHERE (url_crc,url) = (484036220,&#039;http://www.dell.com/&#039;);
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
&#124; id &#124; select_type &#124; table    &#124; type &#124; possible_keys &#124; key     &#124; key_len &#124; ref   &#124; rows &#124; Extra       &#124;
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
&#124;  1 &#124; SIMPLE      &#124; 124pages &#124; ref  &#124; url_crc       &#124; url_crc &#124; 4       &#124; const &#124;    1 &#124; Using where &#124;
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.03 sec)</description>
		<content:encoded><![CDATA[<p>Roland,</p>
<p>It has to do with IN; your query works fine.</p>
<p>mysql&gt; EXPLAIN SELECT url FROM 124pages.124pages WHERE (url_crc,url) = (484036220,'http://www.dell.com/');<br />
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+<br />
| id | select_type | table    | type | possible_keys | key     | key_len | ref   | rows | Extra       |<br />
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+<br />
|  1 | SIMPLE      | 124pages | ref  | url_crc       | url_crc | 4       | const |    1 | Using where |<br />
+----+-------------+----------+------+---------------+---------+---------+-------+------+-------------+<br />
1 row in set (0.03 sec)</p>
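<p>Since the single-row comparison above does use the <code>url_crc</code> index, one possible workaround is to have the application expand a multi-column IN into an OR of per-row equalities. The Python sketch below builds such a query with placeholders; <code>build_or_query</code> and the <code>pages</code> table are hypothetical, and the <code>sqlite3</code> demo only checks that the right rows come back. Whether MySQL picks the index for the expanded form has to be confirmed with EXPLAIN on the real server.</p>

```python
import sqlite3
import zlib


def build_or_query(pairs: list[tuple[int, str]]) -> tuple[str, list[object]]:
    """Expand (url_crc, url) IN ((c1, u1), (c2, u2), ...) into
    (url_crc = ? AND url = ?) OR (url_crc = ? AND url = ?) OR ...

    Hypothetical helper; the table name `pages` is an assumption.
    """
    if not pairs:
        raise ValueError("need at least one (crc, url) pair")
    condition = " OR ".join("(url_crc = ? AND url = ?)" for _ in pairs)
    params: list[object] = []
    for crc, url in pairs:
        params.extend([crc, url])
    return f"SELECT url FROM pages WHERE {condition}", params


if __name__ == "__main__":
    stored = ["http://www.dell.com/", "http://members.aye.net/", "http://example.com/"]
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (url_crc INTEGER NOT NULL, url TEXT NOT NULL)")
    db.execute("CREATE INDEX idx_url_crc ON pages (url_crc)")
    db.executemany("INSERT INTO pages VALUES (?, ?)",
                   [(zlib.crc32(u.encode("utf-8")), u) for u in stored])

    # Ask for the first two stored URLs only.
    wanted = [(zlib.crc32(u.encode("utf-8")), u) for u in stored[:2]]
    sql, params = build_or_query(wanted)
    print(sql)
    print(sorted(url for (url,) in db.execute(sql, params)))
```

<p>The placeholders use the <code>?</code> style of <code>sqlite3</code>; MySQL drivers such as MySQLdb use <code>%s</code> instead.</p>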
]]></content:encoded>
	</item>
	<item>
		<title>By: Roland Bouman</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-267265</link>
		<dc:creator>Roland Bouman</dc:creator>
		<pubDate>Sun, 06 Apr 2008 09:02:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-267265</guid>
		<description>Hi Peter,

I know it shouldn&#039;t matter, but have you tried:

EXPLAIN SELECT url FROM 124pages.124pages WHERE (url_crc,url) = (484036220,&#039;http://www.dell.com/&#039;);

to see if it is due to IN or rather due to multiple columns?

If there is a difference, it seems likely that it is a bug, which may turn out to be trivial to fix.</description>
		<content:encoded><![CDATA[<p>Hi Peter,</p>
<p>I know it shouldn&#8217;t matter, but have you tried:</p>
<p>EXPLAIN SELECT url FROM 124pages.124pages WHERE (url_crc,url) = (484036220,'http://www.dell.com/');</p>
<p>to see if it is due to IN or rather due to multiple columns?</p>
<p>If there is a difference, it seems likely that it is a bug, which may turn out to be trivial to fix.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-266947</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 05 Apr 2008 20:53:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266947</guid>
		<description>Ruslan,

I do not think prepared statements will be faster if you&#039;re doing the checks one by one: the savings on query parsing are much smaller than the loss from having many round trips.

If you need to check 1,000,000 URLs this way, you can use prepared statements together with batches, checking them 1000 or so at a time. This would likely be the most efficient approach.</description>
		<content:encoded><![CDATA[<p>Ruslan,</p>
<p>I do not think prepared statements will be faster if you&#8217;re doing the checks one by one: the savings on query parsing are much smaller than the loss from having many round trips.</p>
<p>If you need to check 1,000,000 URLs this way, you can use prepared statements together with batches, checking them 1000 or so at a time. This would likely be the most efficient approach.</p>
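<p>A rough Python sketch of the batched check, with <code>sqlite3</code> standing in for a MySQL connection: each batch is one parameterized query with a single-column IN list on the indexed CRC, followed by an exact URL comparison in the application. The <code>pages</code> table, the <code>find_existing</code> and <code>batches</code> names, and the default batch size of 500 (older SQLite builds allow at most 999 parameters per statement) are assumptions for illustration; the comment above suggests batches of around 1000.</p>

```python
import sqlite3
import zlib
from collections.abc import Iterable, Iterator


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_existing(conn: sqlite3.Connection, urls: Iterable[str],
                  batch_size: int = 500) -> set[str]:
    """Return the subset of `urls` already stored, one query per batch."""
    found: set[str] = set()
    for chunk in batches(list(urls), batch_size):
        crcs = [zlib.crc32(u.encode("utf-8")) for u in chunk]
        placeholders = ", ".join("?" for _ in crcs)
        # The IN list only touches the indexed CRC column; the exact URL
        # comparison below drops rows that merely share a CRC.
        rows = conn.execute(
            f"SELECT url FROM pages WHERE url_crc IN ({placeholders})", crcs
        )
        wanted = set(chunk)
        found.update(url for (url,) in rows if url in wanted)
    return found


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (url_crc INTEGER NOT NULL, url TEXT NOT NULL)")
    db.execute("CREATE INDEX idx_url_crc ON pages (url_crc)")
    stored = [f"http://example.com/page/{i}" for i in range(2000)]
    db.executemany("INSERT INTO pages VALUES (?, ?)",
                   [(zlib.crc32(u.encode("utf-8")), u) for u in stored])
    candidates = [f"http://example.com/page/{i}" for i in range(1500, 3500)]
    print(len(find_existing(db, candidates)))  # 500: pages 1500 to 1999
```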
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/comment-page-1/#comment-266945</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 05 Apr 2008 20:51:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/#comment-266945</guid>
		<description>Tom,

In this case it is easy enough to get a query which works. There are many cases where you can&#039;t get MySQL to do what you want (consider, for example, a bunch of subqueries). In those cases I would handle it in the application: run the first SELECT, build an IN list from its results, and generate a second query.</description>
		<content:encoded><![CDATA[<p>Tom,</p>
<p>In this case it is easy enough to get a query which works. There are many cases where you can&#8217;t get MySQL to do what you want (consider, for example, a bunch of subqueries). In those cases I would handle it in the application: run the first SELECT, build an IN list from its results, and generate a second query.</p>
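<p>The two-step approach can be sketched in Python, again with <code>sqlite3</code> standing in for MySQL. Instead of <code>... WHERE project_phase_id IN (SELECT id FROM phases WHERE active = 1)</code>, the application runs the inner SELECT first and then passes the ids to the outer query as an IN list. The <code>phases</code> table and its <code>active</code> column are hypothetical; <code>projects</code>/<code>project_phase_id</code> are borrowed from an example elsewhere in this thread.</p>

```python
import sqlite3


def projects_in_active_phases(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    # Step 1: run what used to be the subquery on its own.
    phase_ids = [pid for (pid,) in conn.execute(
        "SELECT id FROM phases WHERE active = 1 ORDER BY id")]
    if not phase_ids:
        # An empty "IN ()" is a syntax error in MySQL, so stop here.
        return []
    # Step 2: build the IN list as placeholders (never paste values into
    # the SQL text) and run the outer query.
    placeholders = ", ".join("?" for _ in phase_ids)
    sql = ("SELECT id, project_phase_id FROM projects "
           f"WHERE project_phase_id IN ({placeholders}) ORDER BY id")
    return conn.execute(sql, phase_ids).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE phases (id INTEGER PRIMARY KEY, active INTEGER NOT NULL)")
    db.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, "
               "project_phase_id INTEGER NOT NULL)")
    db.executemany("INSERT INTO phases VALUES (?, ?)",
                   [(1, 0), (2, 1), (3, 1), (4, 0)])
    db.executemany("INSERT INTO projects VALUES (?, ?)",
                   [(10, 1), (11, 2), (12, 3), (13, 4), (14, 2)])
    print(projects_in_active_phases(db))  # [(11, 2), (12, 3), (14, 2)]
```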
]]></content:encoded>
	</item>
</channel>
</rss>
