<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How to find wrong indexing with glance view</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/</link>
	<description>Everything about MySQL Performance</description>
	<lastBuildDate>Sat, 21 Nov 2009 05:23:57 -0800</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Ruslan Zakirov</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-348089</link>
		<dc:creator>Ruslan Zakirov</dc:creator>
		<pubDate>Sat, 23 Aug 2008 18:12:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-348089</guid>
		<description>I was describing MySpace&#039;s search referenced by Nachos, where country (sorry, it&#039;s not city) is mandatory. Anyway, the condition on age is mandatory and it&#039;s a range, so sorting will surely be a problem, especially when the result set is big. At least it will be a filesort and not necessarily a temporary table, as we select from one table.

To solve the sorting issue on big sets, people can run EXPLAIN first (a pre-explain) and then decide whether to use a hint to force ordered access or to access rows based on the conditions. To avoid the EXPLAINs, local in-memory statistics can be used to estimate the selectivity of the conditions.

Anyway, the question was about a set of indexes with compound vs. single context. Sorting falls out of scope :)</description>
		<content:encoded><![CDATA[<p>I was describing MySpace&#8217;s search referenced by Nachos, where country (sorry, it&#8217;s not city) is mandatory. Anyway, the condition on age is mandatory and it&#8217;s a range, so sorting will surely be a problem, especially when the result set is big. At least it will be a filesort and not necessarily a temporary table, as we select from one table.</p>
<p>To solve the sorting issue on big sets, people can run EXPLAIN first (a pre-explain) and then decide whether to use a hint to force ordered access or to access rows based on the conditions. To avoid the EXPLAINs, local in-memory statistics can be used to estimate the selectivity of the conditions.</p>
<p>Anyway, the question was about a set of indexes with compound vs. single context. Sorting falls out of scope <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-348074</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 23 Aug 2008 17:07:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-348074</guid>
		<description>Ruslan,

Thanks for the good explanation. Putting City at the start would work only if you always specify it, since you can&#039;t use the IN trick for cities: there are too many of them.
The problem with IN in such cases is generally related to sorting: results typically need to be sorted in some way, and as soon as you use IN on any key part, sorting can&#039;t be done using the index, and a filesort may be too slow for many matches. In reality the problem can get even more complicated, because sorting can be done by a function, for example by distance or by some sort of scoring system.</description>
		<content:encoded><![CDATA[<p>Ruslan,</p>
<p>Thanks for the good explanation. Putting City at the start would work only if you always specify it, since you can&#8217;t use the IN trick for cities: there are too many of them.<br />
The problem with IN in such cases is generally related to sorting: results typically need to be sorted in some way, and as soon as you use IN on any key part, sorting can&#8217;t be done using the index, and a filesort may be too slow for many matches. In reality the problem can get even more complicated, because sorting can be done by a function, for example by distance or by some sort of scoring system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ruslan Zakirov</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347936</link>
		<dc:creator>Ruslan Zakirov</dc:creator>
		<pubDate>Sat, 23 Aug 2008 09:21:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347936</guid>
		<description>More about Nachos&#039; example

1) Looks like all fields are linked one-to-one to users, so they can all be stored in the users table (maybe except looking_for, but let&#039;s keep it simple), something like (gender, age, available, looking_for, location, has_foto)
2) booleans - gender, has_foto; enums - available, looking_for, location; integer - age
3) All queries fall into X IN (...) AND Y IN (...) AND ...

Some conditions fall into X = const, i.e. ref access, which may be preferable to range access. The condition on Age may have up to 60 values, so it can be converted to IN instead of using BETWEEN [1].

As all boolean operators are &#039;AND&#039;, mysql can use either indexes built on multiple columns or an intersection of multiple index accesses. When we talk about index_merge_intersection, we should understand that mysql finds one set using one index, another set using another index, and then finds the records that exist in both sets [2]. Consider &quot;age = 18 and available = &#039;divorced&#039;&quot;. I&#039;ve used Russia as the target for investigation, and here are the results: both conditions - 58, divorced - 623, age 18 - 3000 (it looks like that&#039;s the limit, and most probably the real number is much higher). It&#039;s pretty obvious that using an index on (age, available) is much better all the time, as it&#039;s always more selective than either of its columns alone. If you don&#039;t have such a compound index, then mysql will most probably use the index on the &quot;available&quot; column only, with ref access, and filter by age afterwards. This is the optimal choice. So in any case index intersection sucks. When doesn&#039;t it suck? In something like &quot;age = 68 AND available = &#039;engaged&#039;&quot;, but even in this case the compound index quickly returns a few rows.

Enough about intersections. About booleans: has_foto? How many of your users have a foto? 100%? 10%? Most probably it&#039;s more than 70%, or even close to 90%. So it&#039;s a very non-selective column. Should we avoid using it in indexes? Yes, if it&#039;s an index on this column only. No, if it&#039;s part of a compound index. Why? Because more than 70% of people will tick &quot;show people with fotos only&quot;, and mysql can use the index to filter records before fetching any rows. How much will it give? Not much; it&#039;s worth investigating. The impact may even be negative, especially when almost all people have fotos.

Gender? You can use it everywhere, as peter suggested, since 90% of searches will specify either &#039;male&#039; or &#039;female&#039;, and that halves the result set. However, you should most probably cheat a little and give mysql a hint by using &quot;gender IN (&#039;male&#039;,&#039;female&#039;)&quot; when people are looking for any gender.

Other. Location is mandatory, with quite good selectivity compared with the other enums. Age is mandatory as well, but it can select few rows or many, depending on the range.

Final set?

1) (City, Age, Gender) or (City, Gender, Age) - you should benchmark and check the EXPLAIN output to see which one is better for all combinations of (male, female, both) and for different ranges on Age, with high selectivity and low. In the end you may end up with one index, with both, or maybe with just (City, Age). Whatever you select can be used as a prefix for the other indexes, because conditions on these fields are mandatory.

2) You&#039;ve selected a mandatory prefix; say (City, Gender, Age) is the most effective. Drop it :) and instead create (City, Gender, Age, LookingFor) and (City, Gender, Age, Available), so mysql can choose whichever of the two is more selective.

3) You can even use (City, Gender, Age, LookingFor, Available) - needs investigation.

Aaaa... such long indexes. Let&#039;s say you&#039;re superb and have 1 billion users. If you&#039;re smart enough and read mysql performance related articles, then you&#039;ll store all those fields as enums or tiny ints with a dictionary (in your app or in a table). 4(id)+1+1+1+1+1+1 = 10 bytes per record =&gt; &gt;=10GB for data. Each index will be between 5GB and the size of the data (for innodb). Maybe I&#039;m wrong here, but I do think that the size of an index on (City, Gender, Age) will be smaller than the combined size of three indexes on (City), (Gender) and (Age). Considering the quite low standalone selectivity of those columns, it&#039;s up to you to decide what to do.

1. http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/#comment-1695
2. http://dev.mysql.com/doc/refman/5.1/en/index-merge-intersection.html</description>
		<content:encoded><![CDATA[<p>More about Nachos&#8217; example</p>
<p>1) Looks like all fields are linked one-to-one to users, so they can all be stored in the users table (maybe except looking_for, but let&#8217;s keep it simple), something like (gender, age, available, looking_for, location, has_foto)<br />
2) booleans &#8211; gender, has_foto; enums &#8211; available, looking_for, location; integer &#8211; age<br />
3) All queries fall into X IN (&#8230;) AND Y IN (&#8230;) AND &#8230;</p>
<p>Some conditions fall into X = const, i.e. ref access, which may be preferable to range access. The condition on Age may have up to 60 values, so it can be converted to IN instead of using BETWEEN [1].</p>
<p>As all boolean operators are &#8216;AND&#8217;, mysql can use either indexes built on multiple columns or an intersection of multiple index accesses. When we talk about index_merge_intersection, we should understand that mysql finds one set using one index, another set using another index, and then finds the records that exist in both sets [2]. Consider &#8220;age = 18 and available = &#8216;divorced&#8217;&#8221;. I&#8217;ve used Russia as the target for investigation, and here are the results: both conditions &#8211; 58, divorced &#8211; 623, age 18 &#8211; 3000 (it looks like that&#8217;s the limit, and most probably the real number is much higher). It&#8217;s pretty obvious that using an index on (age, available) is much better all the time, as it&#8217;s always more selective than either of its columns alone. If you don&#8217;t have such a compound index, then mysql will most probably use the index on the &#8220;available&#8221; column only, with ref access, and filter by age afterwards. This is the optimal choice. So in any case index intersection sucks. When doesn&#8217;t it suck? In something like &#8220;age = 68 AND available = &#8216;engaged&#8217;&#8221;, but even in this case the compound index quickly returns a few rows.</p>
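<p>A rough way to see the argument above, as a minimal sketch: it plugs in the counts quoted for Russia (58, 623, 3000) and assumes a deliberately simplified cost model in which the cost of a plan is just the number of index entries or rows it touches. The plan names and the cost model are illustrative assumptions, not how the MySQL optimizer actually computes costs, and 3000 was only the displayed limit, so the real gap is probably larger.</p>
<pre>
```python
# Counts quoted in the comment for "age = 18 AND available = 'divorced'".
BOTH = 58          # rows matching both conditions
DIVORCED = 623     # rows matching available = 'divorced'
AGE_18 = 3000      # rows matching age = 18 (a display limit, so a lower bound)


def entries_touched():
    """Simplified cost per plan: index entries or rows touched (assumption)."""
    return {
        # Compound index (age, available): only the matching entries are read.
        "compound_index": BOTH,
        # Single index on `available`, then each fetched row is filtered by age.
        "single_index_then_filter": DIVORCED,
        # Index merge intersection: both index ranges are read, then intersected.
        "index_merge_intersection": DIVORCED + AGE_18,
    }


costs = entries_touched()
# Print the plans from cheapest to most expensive under this toy model.
for plan, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(plan, cost)
```
</pre>
<p>Under these assumptions the compound index touches the fewest entries and the intersection the most, which matches the ordering described above.</p>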
<p>Enough about intersections. About booleans: has_foto? How many of your users have a foto? 100%? 10%? Most probably it&#8217;s more than 70%, or even close to 90%. So it&#8217;s a very non-selective column. Should we avoid using it in indexes? Yes, if it&#8217;s an index on this column only. No, if it&#8217;s part of a compound index. Why? Because more than 70% of people will tick &#8220;show people with fotos only&#8221;, and mysql can use the index to filter records before fetching any rows. How much will it give? Not much; it&#8217;s worth investigating. The impact may even be negative, especially when almost all people have fotos.</p>
<p>Gender? You can use it everywhere, as peter suggested, since 90% of searches will specify either &#8216;male&#8217; or &#8216;female&#8217;, and that halves the result set. However, you should most probably cheat a little and give mysql a hint by using &#8220;gender IN (&#8216;male&#8217;,&#8216;female&#8217;)&#8221; when people are looking for any gender.</p>
<p>Other. Location is mandatory, with quite good selectivity compared with the other enums. Age is mandatory as well, but it can select few rows or many, depending on the range.</p>
<p>Final set?</p>
<p>1) (City, Age, Gender) or (City, Gender, Age) &#8211; you should benchmark and check the EXPLAIN output to see which one is better for all combinations of (male, female, both) and for different ranges on Age, with high selectivity and low. In the end you may end up with one index, with both, or maybe with just (City, Age). Whatever you select can be used as a prefix for the other indexes, because conditions on these fields are mandatory.</p>
<p>2) You&#8217;ve selected a mandatory prefix; say (City, Gender, Age) is the most effective. Drop it <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  and instead create (City, Gender, Age, LookingFor) and (City, Gender, Age, Available), so mysql can choose whichever of the two is more selective.</p>
<p>3) You can even use (City, Gender, Age, LookingFor, Available) &#8211; needs investigation.</p>
<p>Aaaa&#8230; such long indexes. Let&#8217;s say you&#8217;re superb and have 1 billion users. If you&#8217;re smart enough and read mysql performance related articles, then you&#8217;ll store all those fields as enums or tiny ints with a dictionary (in your app or in a table). 4(id)+1+1+1+1+1+1 = 10 bytes per record =&gt; &gt;=10GB for data. Each index will be between 5GB and the size of the data (for innodb). Maybe I&#8217;m wrong here, but I do think that the size of an index on (City, Gender, Age) will be smaller than the combined size of three indexes on (City), (Gender) and (Age). Considering the quite low standalone selectivity of those columns, it&#8217;s up to you to decide what to do.</p>
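<p>The size estimate above can be reproduced with a few lines, as a back-of-envelope sketch only. The constant and function names are made up for illustration, and the calculation assumes exactly the figures given (a 4-byte id plus six 1-byte columns, 1 billion rows) while ignoring any per-row or per-page storage overhead, which is why the result is a lower bound.</p>
<pre>
```python
# Figures taken from the comment; names are illustrative only.
ROWS = 1_000_000_000   # 1 billion users
ID_BYTES = 4           # 4-byte integer id
ATTR_COLUMNS = 6       # gender, age, available, looking_for, location, has_foto
ATTR_BYTES = 1         # each attribute stored as an enum or tiny int


def row_bytes():
    """Raw payload per record: the id plus the one-byte attribute columns."""
    return ID_BYTES + ATTR_COLUMNS * ATTR_BYTES


def table_gb(rows=ROWS):
    """Raw data size in GB (10**9 bytes), with no storage engine overhead."""
    return rows * row_bytes() / 10**9


print(row_bytes())   # 10
print(table_gb())    # 10.0
```
</pre>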
<p>1. <a href="http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/#comment-1695" rel="nofollow">http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/#comment-1695</a><br />
2. <a href="http://dev.mysql.com/doc/refman/5.1/en/index-merge-intersection.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.1/en/index-merge-intersection.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347795</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 23 Aug 2008 02:17:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347795</guid>
		<description>tgabi,

I mean the case where you have an index on (A,B,C,D,E) and MySQL can use only the (A,B,C) prefix for this query. If that prefix is selective enough (enough meaning query response time is acceptable), you&#039;re good. Though as your data size grows, the selectivity you need tends to grow as well. If you have 100,000 profiles, a selectivity of 1/10 is often good enough, as scanning through 10,000 rows in memory is often acceptable. With 50,000,000 profiles this would not be the case.

In Nachos&#039; case, if you have &quot;WHERE GENDER=&#039;M&#039; AND RACE=&#039;B&#039; AND HAIR=&#039;Blond&#039;&quot; and it is very selective, an index on (GENDER,RACE,HAIR) would work quite well to resolve it. Though you really need to be looking at your best queries while picking solutions and so your worst cardinality.</description>
		<content:encoded><![CDATA[<p>tgabi,</p>
<p>I mean the case where you have an index on (A,B,C,D,E) and MySQL can use only the (A,B,C) prefix for this query. If that prefix is selective enough (enough meaning query response time is acceptable), you&#8217;re good. Though as your data size grows, the selectivity you need tends to grow as well. If you have 100,000 profiles, a selectivity of 1/10 is often good enough, as scanning through 10,000 rows in memory is often acceptable. With 50,000,000 profiles this would not be the case.</p>
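<p>The scaling point above is simple arithmetic, sketched here with a hypothetical helper: the same 1-in-10 selectivity leaves very different numbers of rows to scan as the table grows. The function name and the "one in N" parameter are assumptions made for the illustration.</p>
<pre>
```python
def rows_examined(total_rows, one_in):
    """Rows left to scan when the usable index prefix keeps 1 in `one_in` rows."""
    return total_rows // one_in


# Same selectivity (1/10), two table sizes from the comment.
print(rows_examined(100_000, 10))      # 10000
print(rows_examined(50_000_000, 10))   # 5000000
```
</pre>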
<p>In Nachos&#8217; case, if you have &#8220;WHERE GENDER=&#8216;M&#8217; AND RACE=&#8216;B&#8217; AND HAIR=&#8216;Blond&#8217;&#8221; and it is very selective, an index on (GENDER,RACE,HAIR) would work quite well to resolve it. Though you really need to be looking at your best queries while picking solutions and so your worst cardinality.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tgabi</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347762</link>
		<dc:creator>tgabi</dc:creator>
		<pubDate>Sat, 23 Aug 2008 00:29:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347762</guid>
		<description>Peter,
you said &quot;assuming you can use index prefix which is selective enough&quot;. What do you mean by that?
In a specific case derived from Nacho&#039;s example: say you have millions of users with gender, race and hair color. Many males, many blacks, many blondes, very few who are all three. How do you see this search in MySQL?</description>
		<content:encoded><![CDATA[<p>Peter,<br />
you said &#8220;assuming you can use index prefix which is selective enough&#8221;. What do you mean by that?<br />
In a specific case derived from Nacho&#8217;s example: say you have millions of users with gender, race and hair color. Many males, many blacks, many blondes, very few who are all three. How do you see this search in MySQL?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347742</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 22 Aug 2008 23:24:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347742</guid>
		<description>Nacho,

For cases like the one you mentioned (dating site search), I would rather go with Sphinx altogether. It handles cases like this beautifully (even if you do not have any full-text search). If you want to stick to MySQL, though, you&#039;ve got to look at the cardinality of the different columns and the query pattern, and handle it appropriately. For example, gender may not be selective, but it is specified in 99% of cases, which makes it a good first column in the index.</description>
		<content:encoded><![CDATA[<p>Nacho,</p>
<p>For cases like the one you mentioned (dating site search), I would rather go with Sphinx altogether. It handles cases like this beautifully (even if you do not have any full-text search). If you want to stick to MySQL, though, you&#8217;ve got to look at the cardinality of the different columns and the query pattern, and handle it appropriately. For example, gender may not be selective, but it is specified in 99% of cases, which makes it a good first column in the index.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347741</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 22 Aug 2008 23:20:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347741</guid>
		<description>tgabi,

I have cases where billions of records are loaded daily, so it no longer excites me.

Speaking about index merge: indeed there are cases where index merge works and a multiple column index does not. However, if they both work, the multiple column index is most likely to be the winner.

There are always exceptions. Even running with MySQL defaults can be a good fit for a particular case. I&#039;m just saying this is a very typical red flag. Sure, you need to look at the details to check whether it is really the case, but it is called a red flag exactly because it indicates a problem in, say, 90% of all cases. If you made a deliberate choice going with multiple column indexes, good for you - if you can make this deliberate choice, you already stand out from the crowd :)</description>
		<content:encoded><![CDATA[<p>tgabi,</p>
<p>I have cases where billions of records are loaded daily, so it no longer excites me.</p>
<p>Speaking about index merge: indeed there are cases where index merge works and a multiple column index does not. However, if they both work, the multiple column index is most likely to be the winner.</p>
<p>There are always exceptions. Even running with MySQL defaults can be a good fit for a particular case. I&#8217;m just saying this is a very typical red flag. Sure, you need to look at the details to check whether it is really the case, but it is called a red flag exactly because it indicates a problem in, say, 90% of all cases. If you made a deliberate choice going with multiple column indexes, good for you &#8211; if you can make this deliberate choice, you already stand out from the crowd <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tgabi</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347738</link>
		<dc:creator>tgabi</dc:creator>
		<pubDate>Fri, 22 Aug 2008 23:11:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347738</guid>
		<description>I think there are many cases where multiple column indexes don&#039;t work. That&#039;s all. By looking at the index structure alone you cannot draw any conclusion about the design quality. You need to look at the data too.
It seems that we disagree on index merge versus multiple column index. I&#039;m saying that I see index merge working when multiple column indexes don&#039;t. You&#039;re saying it cannot be used - you don&#039;t say why, but I don&#039;t disagree. One thing is clear: single column indexes are not enough, no doubt about that.
Regarding billions of records: it&#039;s just awesome. It&#039;s not every day you hear something like that. Well, maybe you do; I don&#039;t. I&#039;m a few months away from the &quot;billion records club&quot;, BTW.</description>
		<content:encoded><![CDATA[<p>I think there are many cases where multiple column indexes don&#8217;t work. That&#8217;s all. By looking at the index structure alone you cannot draw any conclusion about the design quality. You need to look at the data too.<br />
It seems that we disagree on index merge versus multiple column index. I&#8217;m saying that I see index merge working when multiple column indexes don&#8217;t. You&#8217;re saying it cannot be used &#8211; you don&#8217;t say why, but I don&#8217;t disagree. One thing is clear: single column indexes are not enough, no doubt about that.<br />
Regarding billions of records: it&#8217;s just awesome. It&#8217;s not every day you hear something like that. Well, maybe you do; I don&#8217;t. I&#8217;m a few months away from the &#8220;billion records club&#8221;, BTW.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347730</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 22 Aug 2008 22:19:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347730</guid>
		<description>tgabi,

I do not see what you really disagree with. The question is whether multiple column indexes speed up your queries. If they do not, there is surely no point using them. Billions of records? So what. If your main queries can use all key parts of a 5-column index, it will be faster than 5 single-column indexes, and also much easier to maintain.
Really, many columns may look scary, but 5 ints, for example, is just 20 bytes, so the index would be the same as an index on char(20) in terms of size.</description>
		<content:encoded><![CDATA[<p>tgabi,</p>
<p>I do not see what you really disagree with. The question is whether multiple column indexes speed up your queries. If they do not, there is surely no point using them. Billions of records? So what. If your main queries can use all key parts of a 5-column index, it will be faster than 5 single-column indexes, and also much easier to maintain.<br />
Really, many columns may look scary, but 5 ints, for example, is just 20 bytes, so the index would be the same as an index on char(20) in terms of size.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347722</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 22 Aug 2008 22:03:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347722</guid>
		<description>I think adding a column to the index (for selection, not sorting) raises the same question as adding an index for selection altogether: if cardinality is poor, it is better not to. Though increasing index length usually has less impact than going from a full table scan to an index ref scan (causing a lot of IOs).

In the case of low-selectivity indexes, you can often still build multiple column indexes, assuming you can use index prefix which is selective enough. Though there are times when bitmap indexes would be much better.</description>
		<content:encoded><![CDATA[<p>I think adding a column to the index (for selection, not sorting) raises the same question as adding an index for selection altogether: if cardinality is poor, it is better not to. Though increasing index length usually has less impact than going from a full table scan to an index ref scan (causing a lot of IOs).</p>
<p>In the case of low-selectivity indexes, you can often still build multiple column indexes, assuming you can use index prefix which is selective enough. Though there are times when bitmap indexes would be much better.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
