<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How to find wrong indexing with glance view</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/</link>
	<description>Percona&#039;s MySQL &#38; InnoDB performance and scalability blog</description>
	<lastBuildDate>Sat, 11 Feb 2012 16:45:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Will</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-775845</link>
		<dc:creator>Will</dc:creator>
		<pubDate>Sun, 26 Sep 2010 14:11:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-775845</guid>
		<description>Thanks for the great blog. How should I index a product table with columns like price, unitprice, weight, color, created_on ... when the SQL is dynamically generated from the web to look up products? For example:

select * from products where price &gt; 10 and price  5 and weight &lt;70 and color = 1 ... order by created_on desc limit 10;

Note that all the conditions are dynamically generated and some of them may or may not be there, e.g., some people filter by price, some by color, some by unitprice, ... numerous possible combinations.  How would you index this type of table? Appreciate it!</description>
		<content:encoded><![CDATA[<p>Thanks for the great blog. How should I index a product table with columns like price, unitprice, weight, color, created_on &#8230; when the SQL is dynamically generated from the web to look up products? For example:</p>
<p>select * from products where price &gt; 10 and price  5 and weight &lt;70 and color = 1 &#8230; order by created_on desc limit 10;</p>
<p>Note that all the conditions are dynamically generated and some of them may or may not be there, e.g., some people filter by price, some by color, some by unitprice, &#8230; numerous possible combinations.  How would you index this type of table? Appreciate it!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Drouin</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-733561</link>
		<dc:creator>David Drouin</dc:creator>
		<pubDate>Mon, 08 Mar 2010 21:25:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-733561</guid>
		<description>I realize I pressed submit too soon and had some typos in the FK definitions:

create table orders(
order_id int unsigned not null auto_increment primary key,
item_type_id tinyint unsigned not null,
locale_id smallint unsigned not null,
currency_id tinyint unsigned not null,
operator_id smallint unsigned not null,
order_amount decimal(10, 4) not null,
order_date datetime not null,
last_updated timestamp not null default current_timestamp on update current_timestamp,
foreign key(item_type_id) references item_types(item_type_id),
foreign key(locale_id) references locales(locale_id),
foreign key(currency_id) references currencies(currency_id),
foreign key(operator_id) references operators(operator_id)
);</description>
		<content:encoded><![CDATA[<p>I realize I pressed submit too soon and had some typos in the FK definitions:</p>
<p>create table orders(<br />
order_id int unsigned not null auto_increment primary key,<br />
item_type_id tinyint unsigned not null,<br />
locale_id smallint unsigned not null,<br />
currency_id tinyint unsigned not null,<br />
operator_id smallint unsigned not null,<br />
order_amount decimal(10, 4) not null,<br />
order_date datetime not null,<br />
last_updated timestamp not null default current_timestamp on update current_timestamp,<br />
foreign key(item_type_id) references item_types(item_type_id),<br />
foreign key(locale_id) references locales(locale_id),<br />
foreign key(currency_id) references currencies(currency_id),<br />
foreign key(operator_id) references operators(operator_id)<br />
);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Drouin</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-733560</link>
		<dc:creator>David Drouin</dc:creator>
		<pubDate>Mon, 08 Mar 2010 21:22:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-733560</guid>
		<description>Here&#039;s to hoping that eventually MySQL&#039;s InnoDB gets close to par with the competition - in particular that other open source db ;) - for relational database support.  Going from any of those systems to MySQL is presently like switching from driving a modern automobile to one from the Flintstones.  Okay, it&#039;s not that bad, but it can be downright abysmal in some respects.  The index requirement for declarative referential integrity is one of the areas that&#039;s truly frustrating.

Let&#039;s say you have a clue and design your OLTP database using at least 3rd normal form and integrity constraints.  You realize that ISAM was perhaps a reasonable choice in 1965 for a production database involving inserts, updates and deletes, but not so much in 2010.  MySQL, surprisingly, supports declarative integrity constraints.  But wait, being MySQL, there must be something about it that&#039;s bizarre and going to bite you somehow.  Well, it also requires that an index be created on each referencing column.  At first this may seem tolerable, especially for cases where you&#039;re requiring that deletes / updates cascade from the parent to child tables - naturally an index on the referencing columns in the child tables would help there - well, maybe.  Often though, the database is designed such that this is not the case.  Updates and deletes of primary key values are instead restricted - that is, not permitted.  So the indexes on the referencing columns are quite useless.  Actually they&#039;re more than useless, they&#039;re detrimental.  Not only do these indexes slow down insert operations, but the MySQL optimizer sometimes appears to become rather overwhelmed by the number of indexes and tables to choose from for moderately complex queries, and selects an extraordinarily inefficient plan.  The result is horrendous performance: it uses one of these foreign key indexes, with low cardinality and a somewhat even distribution, first, instead of a primary or unique key, even if such a key exists and would result in 1 row matching.  So to circumvent that you need to help it out with optimizer hints like straight_join to have it look at tables in the right order and come up with a sane choice (this is using version 5.0.45 btw).  This is great fun when you&#039;re using an ORM like SQLAlchemy and want to avoid writing SQL directly in application code or introducing things that tie you to a particular DBMS.

So, as this thread started out by alluding to, lots of single-column indexes generally aren&#039;t a good idea, yet MySQL forces this upon you if you actually implement a proper database design and use declarative syntax to do so.

Here&#039;s an example table 

create table orders(
  order_id int unsigned not null auto_increment primary key,
  item_type_id tinyint unsigned not null,
  locale_id smallint unsigned not null,
  currency_id tinyint unsigned not null,
  operator_id smallint unsigned not null,
  order_amount decimal(10, 4) not null,
  order_date datetime not null,
  last_updated timestamp not null default current_timestamp on update current_timestamp,
  foreign key(item_type_id) references item_types(item_type_id),
  foreign key(locale_id) references items(locale_id),
  foreign key(currency_id) references items(currency_id),
  foreign key(operator_id) references items(operator_id)
);

create index order_date_idx on orders(order_date);

Okay, so we know that we&#039;ll never delete or change item_type ids, locale ids, currency ids or operator ids.  And even if we did, it would be a big deal - requiring system down time etc.  New ones may be added but that&#039;s it. Let&#039;s say there are about 10 million rows in this orders table.  Among the referenced columns there are 10 types of items, 5 locales they could presently be sold in, 3 currencies and 20 operators.
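
To put rough numbers on that, here is a small Python sketch (not part of the schema; the even-distribution assumption and the helper name rows_per_value are only for illustration) of how many rows a single value of each foreign key column would match:

```python
# Rough estimate: rows matched by one value of each foreign key column,
# assuming values are spread evenly (an assumption; real data is skewed).
TOTAL_ROWS = 10_000_000

# Distinct values per referencing column, taken from the figures above.
distinct_values = {
    "item_type_id": 10,
    "locale_id": 5,
    "currency_id": 3,
    "operator_id": 20,
}


def rows_per_value(total_rows, cardinality):
    """Average number of rows sharing one key value."""
    return total_rows // cardinality


estimates = {
    column: rows_per_value(TOTAL_ROWS, cardinality)
    for column, cardinality in distinct_values.items()
}

for column, rows in estimates.items():
    print(f"{column}: about {rows:,} rows per value")
```

Even the most selective of these columns, operator_id, still matches about 500,000 rows per value under this assumption, compared with 1 row for a lookup by order_id.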

MySQL will create the following indexes:
unique index on order_id
index on item_type_id
index on locale_id
index on currency_id
index on operator_id

You have no control over this at all - aside from not using declarative integrity constraints.  You could resort to the 1980s / early 90s way of writing triggers to implement referential integrity, but I hear triggers are pretty slow themselves in MySQL too, and writing a few hundred of those isn&#039;t going to be too enjoyable at all.

Additionally I need an index on order_date to help with finding orders for a given date.

8 columns, 6 indexes on single columns.  This is silly.  You can imagine a table with more columns and FKs and more indexes.  Sure you can vertically partition the data but you&#039;ll end up with the same indexes regardless.

The foreign key indexes are all low cardinality with roughly even distributions.  Nearly always a poor choice to use for any query.  This is why most if not all widely used databases do not require indexes on FKs.  The database designer should be able to choose what to do.  Having them by default might be ok but they should not be required.</description>
		<content:encoded><![CDATA[<p>Here&#8217;s to hoping that eventually MySQL&#8217;s InnoDB gets close to par with the competition &#8211; in particular that other open source db <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  &#8211; for relational database support.  Going from any of those systems to MySQL is presently like switching from driving a modern automobile to one from the Flintstones.  Okay, it&#8217;s not that bad, but it can be downright abysmal in some respects.  The index requirement for declarative referential integrity is one of the areas that&#8217;s truly frustrating.</p>
<p>Let&#8217;s say you have a clue and design your OLTP database using at least 3rd normal form and integrity constraints.  You realize that ISAM was perhaps a reasonable choice in 1965 for a production database involving inserts, updates and deletes, but not so much in 2010.  MySQL, surprisingly, supports declarative integrity constraints.  But wait, being MySQL, there must be something about it that&#8217;s bizarre and going to bite you somehow.  Well, it also requires that an index be created on each referencing column.  At first this may seem tolerable, especially for cases where you&#8217;re requiring that deletes / updates cascade from the parent to child tables &#8211; naturally an index on the referencing columns in the child tables would help there &#8211; well, maybe.  Often though, the database is designed such that this is not the case.  Updates and deletes of primary key values are instead restricted &#8211; that is, not permitted.  So the indexes on the referencing columns are quite useless.  Actually they&#8217;re more than useless, they&#8217;re detrimental.  Not only do these indexes slow down insert operations, but the MySQL optimizer sometimes appears to become rather overwhelmed by the number of indexes and tables to choose from for moderately complex queries, and selects an extraordinarily inefficient plan.  The result is horrendous performance: it uses one of these foreign key indexes, with low cardinality and a somewhat even distribution, first, instead of a primary or unique key, even if such a key exists and would result in 1 row matching.  So to circumvent that you need to help it out with optimizer hints like straight_join to have it look at tables in the right order and come up with a sane choice (this is using version 5.0.45 btw).  This is great fun when you&#8217;re using an ORM like SQLAlchemy and want to avoid writing SQL directly in application code or introducing things that tie you to a particular DBMS.</p>
<p>So, as this thread started out by alluding to, lots of single-column indexes generally aren&#8217;t a good idea, yet MySQL forces this upon you if you actually implement a proper database design and use declarative syntax to do so.</p>
<p>Here&#8217;s an example table </p>
<p>create table orders(<br />
  order_id int unsigned not null auto_increment primary key,<br />
  item_type_id tinyint unsigned not null,<br />
  locale_id smallint unsigned not null,<br />
  currency_id tinyint unsigned not null,<br />
  operator_id smallint unsigned not null,<br />
  order_amount decimal(10, 4) not null,<br />
  order_date datetime not null,<br />
  last_updated timestamp not null default current_timestamp on update current_timestamp,<br />
  foreign key(item_type_id) references item_types(item_type_id),<br />
  foreign key(locale_id) references items(locale_id),<br />
  foreign key(currency_id) references items(currency_id),<br />
  foreign key(operator_id) references items(operator_id)<br />
);</p>
<p>create index order_date_idx on orders(order_date);</p>
<p>Okay, so we know that we&#8217;ll never delete or change item_type ids, locale ids, currency ids or operator ids.  And even if we did, it would be a big deal &#8211; requiring system down time etc.  New ones may be added but that&#8217;s it. Let&#8217;s say there are about 10 million rows in this orders table.  Among the referenced columns there are 10 types of items, 5 locales they could presently be sold in, 3 currencies and 20 operators.</p>
<p>MySQL will create the following indexes:<br />
unique index on order_id<br />
index on item_type_id<br />
index on locale_id<br />
index on currency_id<br />
index on operator_id</p>
<p>You have no control over this at all &#8211; aside from not using declarative integrity constraints.  You could resort to the 1980s / early 90s way of writing triggers to implement referential integrity, but I hear triggers are pretty slow themselves in MySQL too, and writing a few hundred of those isn&#8217;t going to be too enjoyable at all.</p>
<p>Additionally I need an index on order_date to help with finding orders for a given date.</p>
<p>8 columns, 6 indexes on single columns.  This is silly.  You can imagine a table with more columns and FKs and more indexes.  Sure you can vertically partition the data but you&#8217;ll end up with the same indexes regardless.</p>
<p>The foreign key indexes are all low cardinality with roughly even distributions.  Nearly always a poor choice to use for any query.  This is why most if not all widely used databases do not require indexes on FKs.  The database designer should be able to choose what to do.  Having them by default might be ok but they should not be required.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ruslan Zakirov</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-348089</link>
		<dc:creator>Ruslan Zakirov</dc:creator>
		<pubDate>Sat, 23 Aug 2008 18:12:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-348089</guid>
		<description>I was describing MySpace&#039;s search referenced by Nachos, where country (sorry, it&#039;s not city) is mandatory. Anyway, the condition on age is mandatory and it&#039;s a range, so sorting will surely be a problem, especially when the result set is big. At least it will be a filesort and not necessarily a temporary table, as we select from one table.

To solve the sorting issue on big sets, people can run EXPLAIN first, then decide whether to use a hint to force ordered access or to access rows based on the conditions. To avoid the EXPLAINs, local in-memory statistics can be used to estimate the selectivity of the conditions.

Anyway, the question was about a set of indexes, compound vs. single-column. Sorting falls out of scope :)</description>
		<content:encoded><![CDATA[<p>I was describing MySpace&#8217;s search referenced by Nachos, where country (sorry, it&#8217;s not city) is mandatory. Anyway, the condition on age is mandatory and it&#8217;s a range, so sorting will surely be a problem, especially when the result set is big. At least it will be a filesort and not necessarily a temporary table, as we select from one table.</p>
<p>To solve the sorting issue on big sets, people can run EXPLAIN first, then decide whether to use a hint to force ordered access or to access rows based on the conditions. To avoid the EXPLAINs, local in-memory statistics can be used to estimate the selectivity of the conditions.</p>
<p>Anyway, the question was about a set of indexes, compound vs. single-column. Sorting falls out of scope <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-348074</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 23 Aug 2008 17:07:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-348074</guid>
		<description>Ruslan,

Thanks for the good explanations.  Putting City first would work only if you always specify it, as you can&#039;t use the IN trick for cities: there are too many of them.
The problem with IN in such cases is generally related to sorting - results typically need to be sorted in some way, and as soon as you use IN on any key part, sorting can&#039;t be done using the index, and filesort may be too slow for many matches.  In reality the problem can get even more complicated, because sorting can be done by a function, for example by distance or by some sort of scoring system.</description>
		<content:encoded><![CDATA[<p>Ruslan,</p>
<p>Thanks for the good explanations.  Putting City first would work only if you always specify it, as you can&#8217;t use the IN trick for cities: there are too many of them.<br />
The problem with IN in such cases is generally related to sorting &#8211; results typically need to be sorted in some way, and as soon as you use IN on any key part, sorting can&#8217;t be done using the index, and filesort may be too slow for many matches.  In reality the problem can get even more complicated, because sorting can be done by a function, for example by distance or by some sort of scoring system.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ruslan Zakirov</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347936</link>
		<dc:creator>Ruslan Zakirov</dc:creator>
		<pubDate>Sat, 23 Aug 2008 09:21:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347936</guid>
		<description>More about Nachos&#039; example

1) Looks like all fields are linked one-to-one to users so they can all be stored in the users table (maybe except looking_for, but let&#039;s keep it simple), something like (gender, age, available, looking_for, location, has_foto)
2) booleans - gender, has_foto; enums - available, looking_for, location; integer - age
3) All queries fall into X IN (...) AND Y IN (...) AND ...

Some conditions fall into X = const - ref access that may be preferable over range. Condition on Age may have up to 60 values so it can be converted to IN instead of using BETWEEN [1].

As all boolean operators are &#039;AND&#039;, mysql can use either indexes built on multiple columns or an intersection of multiple index accesses. When we talk about index_merge_intersection, we should understand that mysql finds one set using one index, another set using another index, and then finds the records which exist in both sets [2]. Consider this: &quot;age = 18 and available = &#039;divorced&#039;&quot;. I&#039;ve used Russia as the target for investigation and here are the results: both conditions - 58, divorced - 623, age 18 - 3000 (looks like it&#039;s a limit and most probably the real number is much higher). It&#039;s pretty obvious that using an index on (age, available) is much better all the time, as it&#039;s always at least as selective as either of its columns alone. If you don&#039;t have such a compound index then mysql will most probably use the index on the &quot;available&quot; column only, using ref access, and after that filter by age. This is the optimal choice. So in any case index intersection sucks. When doesn&#039;t it suck? In something like &quot;age = 68 AND available = &#039;engaged&#039;&quot;, but even in this case the compound index quickly returns a few rows.
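
A minimal Python sketch of why the intersection loses, using the counts above. Counting index entries read is a crude cost measure chosen only for illustration, not how the MySQL optimizer actually costs plans, and the 3000 figure is a display limit, so the real gap is presumably larger:

```python
# Counts from the search above; AGE_18 is capped by the site at 3000.
DIVORCED = 623
AGE_18 = 3000
BOTH = 58

# index_merge intersection: scan both single-column ranges, then intersect.
entries_read_intersection = DIVORCED + AGE_18

# ref access on "available" only, then filter the fetched rows by age.
entries_read_single_index = DIVORCED

# Compound index on (age, available): only matching entries are read.
entries_read_compound = BOTH

plans = {
    "index_merge intersection": entries_read_intersection,
    "index on available, filter by age": entries_read_single_index,
    "compound (age, available)": entries_read_compound,
}

# Print plans from cheapest to most expensive under this crude cost model.
for plan, entries in sorted(plans.items(), key=lambda item: item[1]):
    print(f"{plan}: {entries} index entries read")
```

Under this model the intersection reads 3623 entries to produce 58 rows, while the compound index reads only the 58 matching entries.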

Enough about intersections. About booleans. has_foto? How many of your users have a foto? 100%? 10%? Most probably it&#039;s more than 70% or even close to 90%. So it&#039;s a very unselective column. Should we avoid using it in indexes? Yes, if it&#039;s an index on this column only. No, if it&#039;s part of a compound index. Why? Because more than 70% of people will tick &quot;show people with fotos only&quot; and mysql can use the index to filter records before fetching any rows. How much will it give? Not much; worth investigating. Maybe the impact will be negative, especially when almost all people have fotos.

Gender? You can use it everywhere, as peter suggested, as 90% of searches will use either &#039;male&#039; or &#039;female&#039; and it will halve the result set. However, you most probably should cheat a little and give mysql a hint by using &quot;gender IN (&#039;male&#039;,&#039;female&#039;)&quot; when people are looking for any gender.

Other. Location is mandatory, with quite good selectivity compared with the other enums. Age is mandatory as well, but it can select few rows or many; that depends.

Final set?

1) (City, Age, Gender) or (City, Gender, Age) - you should benchmark and check the EXPLAINs to see which one is better for all combinations of (male, female, both) and for different ranges on Age, with high selectivity and low. In the end you can end up with one index, both, or maybe consider using (City, Age). Whatever you select can be used as a prefix for other indexes, cuz conditions on these fields are mandatory.

2) Say you&#039;ve selected the mandatory prefix, for example (City, Gender, Age) as the most effective. Drop it :) and instead create (City, Gender, Age, LookingFor) and (City, Gender, Age, Available), so mysql can choose whichever of the two is more selective.

3) You can even use (City, Gender, Age, LookingFor, Available) - needs investigation.

Aaaa... such long indexes. Let&#039;s consider you&#039;re superb and have 1 billion users. If you&#039;re smart enough and read mysql performance related articles then you&#039;ll store all those fields as enums or tiny ints with a dictionary (in your app or a table). 4(id)+1+1+1+1+1+1 = 10 bytes per record =&gt; &gt;=10GB for data. Each index will be between 5GB and the size of the data (for innodb). Maybe I&#039;m wrong here, but I do think that the size of index (City, Gender, Age) will be smaller than the combined size of the three indexes (City), (Gender) and (Age). Considering the quite low standalone selectivity of those columns, it&#039;s up to you to decide what to do.
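
The arithmetic above, spelled out as a small Python sketch. The column list is the one from point 1; real InnoDB rows carry per-row overhead that this ignores, so treat the result as a lower bound:

```python
# Back-of-the-envelope data size from the figures above (row overhead,
# page fill factor and index overhead are ignored; that is an assumption).
USERS = 1_000_000_000
ID_BYTES = 4
ONE_BYTE_COLUMNS = 6  # gender, age, available, looking_for, location, has_foto

bytes_per_record = ID_BYTES + ONE_BYTE_COLUMNS * 1
data_bytes = USERS * bytes_per_record
data_gb = data_bytes / 10**9  # decimal gigabytes

print(f"{bytes_per_record} bytes per record, about {data_gb:.0f} GB of raw data")
```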

1. http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/#comment-1695
2. http://dev.mysql.com/doc/refman/5.1/en/index-merge-intersection.html</description>
		<content:encoded><![CDATA[<p>More about Nachos&#8217; example</p>
<p>1) Looks like all fields are linked one-to-one to users so they can all be stored in the users table (maybe except looking_for, but let&#8217;s keep it simple), something like (gender, age, available, looking_for, location, has_foto)<br />
2) booleans &#8211; gender, has_foto; enums &#8211; available, looking_for, location; integer &#8211; age<br />
3) All queries fall into X IN (&#8230;) AND Y IN (&#8230;) AND &#8230;</p>
<p>Some conditions fall into X = const &#8211; ref access that may be preferable over range. Condition on Age may have up to 60 values so it can be converted to IN instead of using BETWEEN [1].</p>
<p>As all boolean operators are &#8216;AND&#8217;, mysql can use either indexes built on multiple columns or an intersection of multiple index accesses. When we talk about index_merge_intersection, we should understand that mysql finds one set using one index, another set using another index, and then finds the records which exist in both sets [2]. Consider this: &#8220;age = 18 and available = &#8216;divorced&#8217;&#8221;. I&#8217;ve used Russia as the target for investigation and here are the results: both conditions &#8211; 58, divorced &#8211; 623, age 18 &#8211; 3000 (looks like it&#8217;s a limit and most probably the real number is much higher). It&#8217;s pretty obvious that using an index on (age, available) is much better all the time, as it&#8217;s always at least as selective as either of its columns alone. If you don&#8217;t have such a compound index then mysql will most probably use the index on the &#8220;available&#8221; column only, using ref access, and after that filter by age. This is the optimal choice. So in any case index intersection sucks. When doesn&#8217;t it suck? In something like &#8220;age = 68 AND available = &#8216;engaged&#8217;&#8221;, but even in this case the compound index quickly returns a few rows.</p>
<p>Enough about intersections. About booleans. has_foto? How many of your users have a foto? 100%? 10%? Most probably it&#8217;s more than 70% or even close to 90%. So it&#8217;s a very unselective column. Should we avoid using it in indexes? Yes, if it&#8217;s an index on this column only. No, if it&#8217;s part of a compound index. Why? Because more than 70% of people will tick &#8220;show people with fotos only&#8221; and mysql can use the index to filter records before fetching any rows. How much will it give? Not much; worth investigating. Maybe the impact will be negative, especially when almost all people have fotos.</p>
<p>Gender? You can use it everywhere, as peter suggested, as 90% of searches will use either &#8216;male&#8217; or &#8216;female&#8217; and it will halve the result set. However, you most probably should cheat a little and give mysql a hint by using &#8220;gender IN (&#8216;male&#8217;,&#8216;female&#8217;)&#8221; when people are looking for any gender.</p>
<p>Other. Location is mandatory, with quite good selectivity compared with the other enums. Age is mandatory as well, but it can select few rows or many; that depends.</p>
<p>Final set?</p>
<p>1) (City, Age, Gender) or (City, Gender, Age) &#8211; you should benchmark and check the EXPLAINs to see which one is better for all combinations of (male, female, both) and for different ranges on Age, with high selectivity and low. In the end you can end up with one index, both, or maybe consider using (City, Age). Whatever you select can be used as a prefix for other indexes, cuz conditions on these fields are mandatory.</p>
<p>2) Say you&#8217;ve selected the mandatory prefix, for example (City, Gender, Age) as the most effective. Drop it <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  and instead create (City, Gender, Age, LookingFor) and (City, Gender, Age, Available), so mysql can choose whichever of the two is more selective.</p>
<p>3) You can even use (City, Gender, Age, LookingFor, Available) &#8211; needs investigation.</p>
<p>Aaaa&#8230; such long indexes. Let&#8217;s consider you&#8217;re superb and have 1 billion users. If you&#8217;re smart enough and read mysql performance related articles then you&#8217;ll store all those fields as enums or tiny ints with a dictionary (in your app or a table). 4(id)+1+1+1+1+1+1 = 10 bytes per record =&gt; &gt;=10GB for data. Each index will be between 5GB and the size of the data (for innodb). Maybe I&#8217;m wrong here, but I do think that the size of index (City, Gender, Age) will be smaller than the combined size of the three indexes (City), (Gender) and (Age). Considering the quite low standalone selectivity of those columns, it&#8217;s up to you to decide what to do.</p>
<p>1. <a href="http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/#comment-1695" rel="nofollow">http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/#comment-1695</a><br />
2. <a href="http://dev.mysql.com/doc/refman/5.1/en/index-merge-intersection.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.1/en/index-merge-intersection.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347795</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Sat, 23 Aug 2008 02:17:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347795</guid>
		<description>tgabi,

I mean the case where you have Index (A,B,C,D,E) and MySQL can use only the (A,B,C) prefix for this query. If that is selective enough (enough meaning query response time is acceptable) you&#039;re good. Though as your data size grows, the selectivity you need tends to grow as well.  If you have 100,000 profiles, a selectivity of 1/10 is often good enough, as scanning through 10,000 rows in memory is often acceptable.  With 50,000,000 profiles this would not be the case.
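
A quick Python sketch of that scaling (the helper rows_to_scan is just for illustration):

```python
from fractions import Fraction

# Selectivity here means the fraction of rows the usable index prefix matches.
SELECTIVITY = Fraction(1, 10)


def rows_to_scan(total_rows, selectivity):
    """Rows left to examine after the index prefix has narrowed the search."""
    return int(total_rows * selectivity)


small_site = rows_to_scan(100_000, SELECTIVITY)
large_site = rows_to_scan(50_000_000, SELECTIVITY)

print(f"100,000 profiles: scan {small_site:,} rows")
print(f"50,000,000 profiles: scan {large_site:,} rows")
```

The same 1/10 selectivity leaves 10,000 rows to scan in the first case and 5,000,000 in the second.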

With Nachos&#039; case, if you have &quot;WHERE GENDER=&#039;M&#039; AND RACE=&#039;B&#039; AND HAIR=&#039;Blond&#039;&quot; and it is very selective, an index on (GENDER,RACE,HAIR) would work quite well to resolve it.  Though you really need to be looking at your best queries while picking solutions, and so at your worst cardinality.</description>
		<content:encoded><![CDATA[<p>tgabi,</p>
<p>I mean the case where you have Index (A,B,C,D,E) and MySQL can use only the (A,B,C) prefix for this query. If that is selective enough (enough meaning query response time is acceptable) you&#8217;re good. Though as your data size grows, the selectivity you need tends to grow as well.  If you have 100,000 profiles, a selectivity of 1/10 is often good enough, as scanning through 10,000 rows in memory is often acceptable.  With 50,000,000 profiles this would not be the case.</p>
<p>With Nachos&#8217; case, if you have &#8220;WHERE GENDER=&#8217;M&#8217; AND RACE=&#8217;B&#8217; AND HAIR=&#8217;Blond&#8217;&#8221; and it is very selective, an index on (GENDER,RACE,HAIR) would work quite well to resolve it.  Though you really need to be looking at your best queries while picking solutions, and so at your worst cardinality.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tgabi</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347762</link>
		<dc:creator>tgabi</dc:creator>
		<pubDate>Sat, 23 Aug 2008 00:29:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347762</guid>
		<description>Peter,
you said &quot;assuming you can use index prefix which is selective enough&quot;. What do you mean by that?
In a specific case - derived from Nacho&#039;s example: say you have millions of users with gender, race and hair color. Many males, many blacks, many blondes, very few all three. How do you see this search working in MySQL?</description>
		<content:encoded><![CDATA[<p>Peter,<br />
you said &#8220;assuming you can use index prefix which is selective enough&#8221;. What do you mean by that?<br />
In a specific case &#8211; derived from Nacho&#8217;s example: say you have millions of users with gender, race and hair color. Many males, many blacks, many blondes, very few all three. How do you see this search working in MySQL?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347742</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 22 Aug 2008 23:24:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347742</guid>
		<description>Nacho,

For cases like the one you mentioned - dating site search - I would rather go with Sphinx altogether.  It handles cases like this beautifully (even if you do not have any full text search).  Though if you want to stick to MySQL, you&#039;ve got to look at the cardinality of the different columns and at the query pattern and handle it appropriately.  For example, gender may not be selective but is specified in 99% of the cases, which makes it a good first column in the index.</description>
		<content:encoded><![CDATA[<p>Nacho,</p>
<p>For cases like the one you mentioned &#8211; dating site search &#8211; I would rather go with Sphinx altogether.  It handles cases like this beautifully (even if you do not have any full text search).  Though if you want to stick to MySQL, you&#8217;ve got to look at the cardinality of the different columns and at the query pattern and handle it appropriately.  For example, gender may not be selective but is specified in 99% of the cases, which makes it a good first column in the index.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2008/08/21/how-to-find-wrong-indexing-with-glance-view/comment-page-1/#comment-347741</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Fri, 22 Aug 2008 23:20:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=472#comment-347741</guid>
		<description>tgabi,

I have cases where billions of records are loaded daily, so it no longer excites me.

Speaking about index merge - indeed there are cases when index merge works and a multiple-column index does not; however, if they both work, the multiple-column index is most likely to be the winner.

There are always exceptions.  Even running with MySQL defaults can be a good fit for a particular case. I&#039;m just saying this is a very typical red flag. Sure, you need to look at the details to check whether it is really the case, but it is called a red flag exactly because it shows there is a problem in, say, 90% of all cases.  If you made a deliberate choice going with multiple column indexes, good for you - if you can make this deliberate choice you already stand out from the crowd :)</description>
		<content:encoded><![CDATA[<p>tgabi,</p>
<p>I have cases where billions of records are loaded daily, so it no longer excites me.</p>
<p>Speaking about index merge &#8211; indeed there are cases when index merge works and a multiple-column index does not; however, if they both work, the multiple-column index is most likely to be the winner.</p>
<p>There are always exceptions.  Even running with MySQL defaults can be a good fit for a particular case. I&#8217;m just saying this is a very typical red flag. Sure, you need to look at the details to check whether it is really the case, but it is called a red flag exactly because it shows there is a problem in, say, 90% of all cases.  If you made a deliberate choice going with multiple column indexes, good for you &#8211; if you can make this deliberate choice you already stand out from the crowd <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
</channel>
</rss>

