Suboptimal ORDER BY implementation, especially together with LIMIT is often the cause of MySQL Performance problems.
Here is what you need to know about ORDER BY … LIMIT optimization to avoid these problems
ORDER BY with LIMIT is most common use of ORDER BY in interactive applications with large data sets being sorted. On many web sites you will fine top tags, recently registered users etc – which would often require ORDER BY with LIMIT in the back end. In general this type of ORDER BY looks like: SELECT ….. WHERE [conditions] ORDER BY [sort] LIMIT N,M
Make sure it uses index It is very important to have ORDER BY with LIMIT executed without scanning and sorting full result set, so it is important for it to use index – in this case index range scan will be started and query execution stopped as soon as soon as required amount of rows generated.
For example if I do SELECT * FROM sites ORDER BY date_created DESC LIMIT 10; I would use index on (date_created) to get result set very fast.
Now what if I have something like SELECT * FROM sites WHERE category_id=5 ORDER BY date_created DESC LIMIT 10;
In this case index by date_created may also work but it might not be the most efficient – If it is rare category large portion of table may be scanned to find 10 rows. So index on (category_id, date_created) will be better idea.
Lets take a look at a bit more complex case: SELECT * FROM sites WHERE category_id in (5,10,12) ORDER BY date_created DESC LIMIT 10;
Even though it looks quite similar to previous one it is a lot different as there are multiple category_id values in the list now so index on (category_id, date_created) can’t be used directly. Index on date_created separately would still work. The good from performance standpoint (even though a bit ugly) will be UNION workaround I already wrote about.
So what if you have application which can perform search on many different columns, with worse then perfect selectivity. Various social networking and dating sites are perfect example of such queries
SELECT FROM people where gender=’m’ and age between 18 and 28 and country_id=5 and city_id=345 order by last_online desc limit 10;
There could be many possible limiting factors with all of them being optional. This is hard nut to crack and I know on high end custom search solutions can be developed, but if we stick to simple MySQL using multiple indexes on most selective columns would be good idea for performance of such queries.
For example you may put index on(gender,last_online) assuming most people will have gender specified, as well as (country_id,city_id,last_online) assuming in most cases these will be specified. It takes a good look at queries actually being run and data selectivity to come up with good set of indexes for such case, it also may need to be adjusted in the future.
The main thing to watch for, if you do not have full where clause resolved by index is how many rows you need to scan to resolve order by (this can be found in slow query log or by examining Hander statistics). If only 50 rows are examined to provide
10 rows of result set you’re in decent shape but if it is 5000 you might need to rethink your indexing.
Also note – number of records scanned to provide result set will be very dynamic based on particular constant and other factors.
For example for our dating example if we use only (last_online) index and look for people from USA we likely will find 10 people pretty quickly, if the country is small or simply there are few members from the country, ie Slovenia – same kind of search might need to scan 1000s times more rows to provide result set.
In the example above we did order by by last column, in fact index can be used for ORDER BY if sorting is done by leading column(s). Note however columns following column used for order by can’t be used to restrict result set. For example:
key(a,b,c) SELECT * FROM tbl WHERE c=5 ORDER BY a,b limit 10 – In this case first two columns from the index can be used to satisfy order by, index however will not be helpful to check c=5 (unless it is index covered query). Index on
(c,a,b) would work better for query above.
Do not sort by expressions I guess this one is obvious – expressions or functions will block index usage for order by.
Sort by column in leading table if you have JOIN with ORDER BY … LIMIT you should try hard to have sorting column(s) to be in the leading table. If ORDER BY is going by field from the table which is not first in the join order index can’t be used. Sometimes it means breaking normalization and duplicating column(s) you’re going to use in ORDER BY in other table.
Here is example when ORDER BY is done by second table which requires filesort:
mysql> explain select test.i from test, test t where test.k=5 and test.i=t.k order by t.k limit 5; +----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+ | 1 | SIMPLE | test | ref | PRIMARY,k | k | 4 | const | 1 | Using temporary; Using filesort | | 1 | SIMPLE | t | ref | k | k | 4 | test.test.i | 1 | Using where; Using index | +----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+ 2 rows in set (0.00 sec)
However if first table has “const” or “system” access type it is effectively removed from join execution (replaced with constants) and so ORDER BY can be optimized even if it is done by second table:
mysql> explain select test.i from test, test t where test.i=5 and test.k=t.k order by t.k limit 5; +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+ | 1 | SIMPLE | test | const | PRIMARY,k | PRIMARY | 4 | const | 1 | | | 1 | SIMPLE | t | ref | k | k | 4 | const | 1 | Using index | +----+-------------+-------+-------+---------------+---------+---------+-------+------+-------------+ 2 rows in set (0.01 sec)
The difference between these cases is “i” is primary key while “k” is simply indexes column.
Note: In some cases even if it is possible to use index to do ORDER BY with JOIN MySQL still will not be able to use it as Optimizer is not smart enough yet to detect such cases:
mysql> explain select test.i from test, test t where test.k=5 and test.i=t.k order by test.k,t.j limit 5; +----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+ | 1 | SIMPLE | test | ref | PRIMARY,k | k | 4 | const | 1 | Using temporary; Using filesort | | 1 | SIMPLE | t | ref | k | k | 4 | test.test.i | 1 | Using where; Using index | +----+-------------+-------+------+---------------+------+---------+-------------+------+---------------------------------+ 2 rows in set (0.00 sec)
In this case there is index (k,j) on the table so indexes could be used on each of the tables to optimize order by, or at least local sort could be used for each t.k=const value for the second table. Which is not done however.
Sort in one direction. If you have ORDER BY col1, col2 it can be optimized using index, if you have
ORDER BY col1 DESC, col2 DESC same thing, however if you would have ORDER BY col1, col2 DESC MySQL will have to use filesort. Classic for solution for this would be to have index which is sorted appropriately (ascending by col1 and descending by col2) but MySQL can’t do it at this point. Workaround which can be currently used is separate column which holds reverse values, so you can do ORDER BY col1, col2_reverse instead.
Beware of large LIMIT Using index to sort is efficient if you need first few rows, even if some extra filtering takes place so you need to scan more rows by index then requested by LIMIT. However if you’re dealing with LIMIT query with large offset efficiency will suffer. LIMIT 1000,10 is likely to be way slower than LIMIT 0,10. It is true most users will not go further than 10 page in results, however Search Engine Bots may very well do so. I’ve seen bots looking at 200+ page in my projects. Also for many web sites failing to take care of this provides very easy task to launch a DOS attack – request page with some large number from few connections and it is enough. If you do not do anything else make sure you block requests with too large page numbers.
For some cases, for example if results are static it may make sense to precompute results so you can query them for positions.
So instead of query with LIMIT 1000,10 you will have WHERE position between 1000 and 1009 which has same efficiency for any position (as long as it is indexed)
Force index if needed In some cases MySQL Optimizer may prefer to use different index, which has better selectivity or just better estimates instead of which allows you to do the sort. For example if you would have indexes on (country_id,city_id) and index on (country_id,last_online) for query SELECT * FROM people WHERE country_id=5 and city_id=6 order by last_online desc limit 10; first index will be likely selected even if it leads to filesort.
The solution for this problem is ether extending your indexes so MySQL Optimizer does not have to chose between better sort or better lookup or use FORCE INDEX to force it to use appropriate index.
One more note about ORDER BY … LIMIT is – it provides scary explain statements and may end up in slow query log as query which does not use indexes, even if it is quite fast:
mysql> explain select * from test order by k limit 5; +----+-------------+-------+-------+---------------+------+---------+------+---------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+---------+-------+ | 1 | SIMPLE | test | index | NULL | k | 772 | NULL | 1638400 | | +----+-------------+-------+-------+---------------+------+---------+------+---------+-------+ 1 row in set (0.03 sec)
See – “rows” is showing us there are estimated 1.6 million of rows to be scanned, while we well know it will be just 5 in this case.
Many of the tips I’ve mentioned here work for ORDER BY without LIMIT as well but there are some differences. I should write another article about ORDER BY without limit and large tables soon.


That was extremely helpful. I was having a hard time locating good and clear resources on limit and order optimization. Thanks!
Mikes solution is by far the fastest.
http://jan.kneschke.de/projects/mysql/order-by-rand a.id
This brings trouble as well because the random number is generated inside the query resulting with two different random numbers. This can be overcome by making a pre-query to generate a random number in your application, still leaving you with the first problem.
my choice was to combine the ORDER BY RAND() and the other one.
SELECT * FROM
(SELECT * FROM comments
WHERE count > $number
AND topic = $topic
LIMIT 100)
AS r1 ORDER BY RAND() LIMIT 5
The resulting speed is acceptable. The first LIMIT can be use to adapt the level of random in the query.
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY ALL NULL NULL NULL NULL 5 Using temporary; Using ilesort
2 DERIVED comments ALL NULL NULL NULL NULL 140809 Using where
The query as you can see goes through 140.000 rows in 0.0792 seconds. Not as fast as the other one but a lot more flexible and open for other WHERE statements
Once again a post has been cut into pieces.
But it has one problem. What if you intend to look for multiple rows with another criteria.
Lets say you have 100.000 rows and want to output 5 randomized rows. From the 100.000 rows 200 qualify to be in the resulting rows. You can not run the query 5 times because you might end up with the same row 5 times. If there is a large gap between two qualifying results it can happen that you generate a random number that is in the gap thus giving you the same row over and over again.
Secondly you may hit the end of the set thus generating no output at all. Some ppl suggested to run the query backwards and forwards. i.id (smaller-equal) a.id and the i.id (greater) a.id
This brings trouble as well because the random number is generated inside the query resulting with two different random numbers. This can be overcome by making a pre-query to generate a random number in your application, still leaving you with the first problem.
Addendum:
I forgot to create a index over the concerning collumns. Now the efficiency has increased a lot.
But I just found a problem. The results are not spread evenly. They allways cluster around the begining of the table.
Any way to alter this behaviour?
Great resource Peter, and also thanks to Jan (comment #48) for his “compromise” solution to the ORDER BY RAND() dilemma. I’ve now managed to get a query (testing with 500k rows, requiring < 10 random rows) down from 0.8s execution time to just 0.05s.
SELECT *, MATCH(title, url, stripped) AGAINST(’test’ IN BOOLEAN MODE) as score FROM sites
WHERE stat = ‘A’
ORDER BY score
having score >0
Hello Peter,
in your presentation on FULLTEXT SEARCH you say:
â— Be care careful with extra where clause if it
filters out a lot of matches.
– select title from cont where match (title,content) against (“global service
company”) and date_added>unix_timestamp(“2006-07-18 18:00:00″)
limit 10;
â— Takes 26 min 43.59 sec
I think, maybe it was the unix_timestamp-function, that slowed things down? Because maybe the unix-timestamp had to be calculated for each of the millions of records, and therefore no index could be used?
Tobias,
In this case unix_timestamp is called for constant so it is not executed millions of time. It also would not block index usage. The problem is once you’re using full text search index you can’t use other index to do sorting.
Peter, Thanks….
This is where I found very useful matters about optimization which I have been looking for so long.
Thanks again.
Mathu Mou
How to optimized this query in PHP
$sqls=”select id,state from states”;
$rss=mysql_query($sqls) or die(mysql_error());
$num_rowss=mysql_num_rows($rss);
$count=1;
while($num_rowss && $datas=mysql_fetch_assoc($rss))
{
$state_id=$datas['id'];
$state=$datas['state'];
$sqlc=”select * from companies,package where comp_status=1 and comp_account_site=0 and comp_balance >0 and comp_mem_pac=pac_id order by pac_amount DESC, rand()”;
$rsc=mysql_query($sqlc) or die(mysql_error());
$num_rowsc=mysql_num_rows($rsc);
$checkonlyonetime=1; // header
while($num_rowsc && $datac=mysql_fetch_assoc($rsc)) // loop of company
{
$comp_id=$datac['comp_id'];
$comp_name=$datac['comp_name'];
$comp_mapped_city=$datac['comp_mapped_city'];
@$comp_mapped_city=explode(“,”,trim($comp_mapped_city,”,”));
}
}
… LIMIT optimization
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = ’450′
ORDER BY pa.post_id ASC
LIMIT 5475 , 15
versus
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = ’224′
AND pa.post_id >= (
SELECT MAX( post_id )
FROM (
SELECT post_id
FROM forum_posts
WHERE post_topic_id = ’224′
ORDER BY post_id ASC
LIMIT 6075 , 1
) AS tmp )
ORDER BY pa.post_id ASC
LIMIT 15
~ 0.09 sec versus ~ 0.004 sec
Tables with posts: 2×292,000+ rows (25MB + 89MB)
Topic: 6092 rows (rows that meet condition WHERE post_topic_id = ’224′)
Source: http://www.easemarry.com/blog/mysql-limit-optimization/
If you found something better, I would be glad to hear.
Here SELECT MAX(post_id) FROM (SELECT …) is not necessary. It’s redundant
(it’s obvious ‘cos that subquery extracts only one post_id – see LIMIT x, 1)
SELECT *
FROM forum_posts AS pa
LEFT JOIN forum_posts_text AS pb ON pa.post_id = pb.post_id
LEFT JOIN forum_users ON user_id = post_poster
WHERE post_topic_id = ’224′
AND pa.post_id >= (
SELECT post_id
FROM forum_posts
WHERE post_topic_id = ’224′
ORDER BY post_id ASC
LIMIT 6075 , 1)
ORDER BY pa.post_id ASC
LIMIT 15
I have this query, but the orderby and the last 2 left joins affect considerably to the performance.
Any idea how I can optimize this query?
Thanks.
SELECT
*
FROM
message
LEFT JOIN
accounts a ON message_account = a.id
LEFT JOIN
opportunities o ON message_project = o.id
LEFT JOIN
leads l1 ON (UCASE(message_from) LIKE CONCAT(\’%\’, UCASE(l1.email1), \’%\’)
AND l1.email1 \’\') OR (UCASE(message_from) LIKE CONCAT(\’%\’,
UCASE(l1.email2), \’%\’) AND l1.email2 \’\')
LEFT JOIN
leads l2 ON (UCASE(message_to) LIKE CONCAT(\’%\’, UCASE(l2.email1), \’%\’)
AND l2.email1 \’\') OR (UCASE(message_to) LIKE CONCAT(\’%\’,
UCASE(l2.email2), \’%\’) AND l2.email2 \’\')
WHERE
message_visibility
AND message_mailbox = UCASE(\’user\’)
AND message_visibility 1
ORDERBY
message_ocode DESC
One tweak that we do for large limit offsets on pagination is to first select columns that are part of the index. Typically this would be the id’s. After that we use an “IN” to select the the other data columns from the same table. We find that there is a vast improvement on selects that use large offsets.
Example (category,id index)
select photo.id, photo.description from photo where category=1 limit 30000000,20
Query time: 4 seconds
To speed things up we split the call into 2 queries
select photo.id from photo where category=1 limit 30000000,20
Query time: 0.030 (this just uses the index so no traversing the data table)
Select photo.id, photo.description from photo where photo.id IN (x1,x2…x20)
The x1 – x20 are the results the first query returns
Query time: 0.020
So with a simple change like this we sped things up by a couple of orders of magnitude and allow our users to use large pagination.
I assume it has something to do that the the first query only uses the index to get the ids and does not have to jump around getting a larger data set.
Cheers.
Radek, and what about if I want to order the resultset?
eg: select id …. order by id desc ..
Have you tried?
I’m Joining multiple tables, so the join must be done in the second query..
Unfortunately, with your solution I have to use 3 queries:
- SELECT COUNT(*) .. (for total rows that meets the criteria blabla)
- SELECT id … LIMIT x,y
- SELECT id, name .. IN (x1,x2…x20)
Anyway.. thank you for the tweak!
Yes Sergio, You dont need to do the count as usually you have the page number or offset in the get varaible as you go from page to page. Typically we do a count first to create all the pagination stuff. Current page, total pages, etc.
Example of it in action here. http://www.pinkbike.com/photo/list/?date=all&page=6
You can page to page 100,000 (24 per page, so the offset is 2.4 million) without and slow down. If this was done in one call it literally takes seconds to generate offsets of 2,000,000
Yes, you can order by in the first query by id no problem. The id is indexed so no penalty there. Also your result set can be ordered by also but that is fast as you only have 24 items, or however many you show on the page.
Cheers
Hey Radek,
Yes, I was talking about this: “Typically we do a count first to create all the pagination stuff. Current page, total pages, etc.” – you have to create a counter for pagination information.
Thank you for your solution, maybe I will try it on other projects. Right now I’m fine with solution posted above by me.
Hi, was wondering if there is a way to stop a SELECT COUNT(*) from continuing once it reached a max number of rows. Eg, i want to do something like:
SELECT count(*) FROM table_a LIMIT 1000;
So if the table has more than 1k rows, it should just return and say “1000″. What i’m trying to avoid is have the entire table scanned, in particular if it has millions of rows. Basically the question i want to ask is: “Does table X have at least N rows?”.
Hi Radek,
I am trying replay this from your post:
# select photo.id from photo where category=1 limit 30000000,20
# Query time: 0.030 (this just uses the index so no traversing the data table)
but withou success. Can you help me?
In MySQL I do:
> create table c (id integer auto_increment, a integer, b integer, primary key(id));
> insert into c (a,b) value (1,2),(5,6),(3,9),(122,22),(32,54);
> insert into c (a,b) value (1,2),(5,6),(3,9),(122,22),(32,54);
about 21x times i run this to fill table:
> insert into c (a,b) select a,b from c;
So I have:
> select count(*) from c;
+———-+
| count(*) |
+———-+
| 20971520 |
+———-+
1 row in set (0.00 sec)
> show index from c;
+——-+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+——-+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
| c | 0 | PRIMARY | 1 | id | A | 20971520 | NULL | NULL | | BTREE | |
+——-+————+———-+————–+————-+———–+————-+———-+——–+——+————+———+
1 row in set (0.00 sec)
And finally:
> explain select id from c limit 20000000,20;
+—-+————-+——-+——-+—————+———+———+——+———-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——-+—————+———+———+——+———-+————-+
| 1 | SIMPLE | c | index | NULL | PRIMARY | 4 | NULL | 20971520 | Using index |
+—-+————-+——-+——-+—————+———+———+——+———-+————-+
1 row in set (0.00 sec)
> select id from c limit 20000000,20;
+———-+
| id |
+———-+
| 20000001 |
……..
| 20000020 |
+———-+
20 rows in set (4.58 sec)
So even if I use indexes, the paginationing on huge table is very slow
Any idea why?
My MySQL is 5.0.51a-18
PC: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, bogomips: 4776.06, 8GB RAM, and system including mysql id on SSD hard drive
What I am doing wrong?
Thanks
bozek, is it possible that your key_buffer size is set too small and therefore the whole index for your 30 million records is not large enough and therefore not caching the index? Not on my application this is set larger than the total index sizes for ALL tables and not just this one.
As you have somthing like this SELECT * FROM table LIMIT 10
It working with this script SELECT * FROM table LIMIT 0,10
Is there anyway to accomplish the same thing as this query, avoiding a temporary table?
select ID_MEMBER_STARTED, COUNT(*) as hits from smf_topics group by ID_MEMBER_STARTED order by hits desc limit 20;
There is an index on ID_MEMBER_STARTED and the table type is InnoDB.
Thanks dude that is rather helpful!
My query performance is very slow for large database.Table A has 2050702 records in it. ID is primary key and index on time.
SELECT ID, Name, Time from TableA WHERE Time >= ’2009-03-24 12:53:00′ and ID > 2050702
Order By ID DESC LIMIT 100
I would like to know ways to optimize the query above. SO retrieve last 100 records from TAble A matching the criteria in where clause
Add a query on (ID, Time) and re-order the WHERE clause to “ID > 2050702 and Time >= 2009-03-24 12:53:00″.
MySQL can only use one index in a query (basically), so you need an index that contains all the columns you are filtering and ordering by, in the same order.
Sorry, that should be “Add an index on” not “Add a query on”.
I have a query that takes 14 seconds to execute:
SELECT posts.* FROM posts force index (idx_post_date) INNER JOIN follow ON follow.followuserid = posts.userid WHERE follow.userid=’61585′ ORDER BY date DESC LIMIT 0, 20;
The table posts has ~1000000 records and the number of records in the posts table belonging to userid 61585 is 0.
The table follow has 63537 records.
This is how my indexes are set:-
mysql> show index from follow;
+——–+————+———-+————–+————–+———–+————-+———-+——–+——+————+———+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+——–+————+———-+————–+————–+———–+————-+———-+——–+——+————+———+
| follow | 0 | PRIMARY | 1 | userid | A | NULL | NULL | NULL | | BTREE | |
| follow | 0 | PRIMARY | 2 | followuserid | A | 63537 | NULL | NULL | | BTREE | |
+——–+————+———-+————–+————–+———–+————-+———-+——–+——+————+———+
mysql> show index from posts;
+——-+————+——————+————–+————-+———–+————-+———-+——–+——+————+———+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+——-+————+——————+————–+————-+———–+————-+———-+——–+——+————+———+
| posts | 0 | PRIMARY | 1 | id | A | 1047467 | NULL | NULL | | BTREE | |
| posts | 1 | idx_post_date | 1 | date | A | 1047467 | NULL | NULL | | BTREE | |
| posts | 1 | idx_posts_userid | 1 | userid | A | 22771 | NULL | NULL | | BTREE | |
+——-+————+——————+————–+————-+———–+————-+———-+——–+——+————+———+
mysql> explain SELECT posts.* FROM posts force index (idx_post_date) INNER JOIN follow ON follow.followuserid = posts.userid WHERE follow.userid=’61585′ ORDER BY date DESC LIMIT 0, 20;
+—-+————-+——–+——–+—————+—————+———+——————————-+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——–+——–+—————+—————+———+——————————-+——+————-+
| 1 | SIMPLE | posts | index | NULL | idx_post_date | 4 | NULL | 20 | |
| 1 | SIMPLE | follow | eq_ref | PRIMARY | PRIMARY | 8 | const,wwwproject.posts.userid | 1 | Using index |
+—-+————-+——–+——–+—————+—————+———+——————————-+——+————-+
Is there a way to optimize this query i have now been trying for quite some time but could not find a solution any help would be very much appreciated??
Good article. I didn’t realize the join order would affect the ORDER BY clause.
Thanks for the article, but I have an additional feature to throw at you.
Suppose I request to select, say 10, ordered some way. If none of the 10 fit any additional constraints I may have, I would like to fetch the second best 10 set, and on like this until I get the record that fits all my constraints.
How do I ensure the first 10 will not show up?
Let me just add that the additional constraints may eventually be associated with features that are outside the database.
Cheers,
Carlos
I have a query :
In this table category_id, date_created is indexed.
EXPLAIN select newsid from news where category_id=53 and approval=’yes’ order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index PRIMARY,category_id date_created 9 NULL 216 Using where
the above query examines 216 rows to fetch only 10 records.
Whereas the below query examines 10 rows which it should, (here i am not searching by category_id):
EXPLAIN select newsid from news where approval=’yes’ order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 10 Using where
Why does the first query examine 216 rows? Is this fine or does the query need to be optimised futher.
Please advice, thanks in advance.
Reshma,
you seems to have poor index setup to begin with. You have “index scan” while you should have ref if you have (category_id,approval,date_created) for all queries.
If Index is scanned the more rows have to be scanned to provide 10 results the more selective where clause is.
Hello Peter,
Thanks for the quick response.
I have removed the index on category_id. After removing it Explain for the below select query is as follows:
EXPLAIN select newsid from news where category_id=53 and approval=’yes’ order by date_created desc limit 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 10 Using where
This is fine i guess.
Another question is: If there are 56 records for category_id=53 and i want to display 100 records for all given categories in news table, why does below query examine 100 records for category_id=53 when there are only 56 of them?
EXPLAIN select newsid from news where category_id=53 and approval=’yes’ order by date_created desc limit 0,100
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news index NULL date_created 9 NULL 100 Using where
Most of the rows are displayed based on category_id, so I want to know if I am doing it correctly.
If this is confusing I could send the table structures and queries.
In this post you have mentioned about DOS attack, do you mean paging records can cause DOS attack ?
Any help would be highly appreciated.
Hi Peter
We have a tool which is processing 30 million records, this tool is a migration project where we are using perl script to do that
we need to compare two tables where we are doing like this
for using counter of 50,000 records in chunk size
and then we are using
select column1, column2 from table order by table limit index_num(21 million), (500,000)
initially for smaller dataset, things are working fine (26 secs to 50 secs)
when index_num is growing, it is taking almost 30 mins to do that
We want to optimize this one.
As per your recent posts,
we need to have index, for column_1, we are creating indexes ..still it is not optimal.
Hey Peter,
I’ve been reading your articles for some time now, especially since I’ve been running into issues with wrapping my head around this whole MySQL optimization stuff.
Anyways, I get the jist of it, however, I run a real estate website (displays listings, etc, in a search results page), and there are just too many variables to try and cover an index on each scenario… I’d end up with 20 indexes, and I fear that any update/insert/delete queries to that table would blow up the internet.
I’m not looking for anybody to write me anything, I’m just looking to get a better understanding of this:
SELECT
`id`,
… more *necessary* columns …
FROM
`listings`
WHERE
`category` = 1
AND `country` = 1
AND `state` = 45
AND `city` = 35859
AND `visible` = 1
ORDER BY
`price` ASC
LIMIT 0, 10
Now, `category` numbers represent different listing types, ie. 1 = For Sale, 2 = For Rent, 3 = Pending Sale, and so on. 1 (For Sale is by the far the most popular, representing at least 75% of all listings in the table; probably more).
`country`, `state`, `city` are always in the query. `visible` is either a 1 or 0. 1 meaning the listing has not expired, and 0 meaning it has. Probably 85%+ instances of `visible` = 1 in the table.
Now, there are also other WHERE conditions that are possible, like WHERE `price` = 300000, or `bedrooms` = 2, or `bathrooms` = 3, etc. And who knows what the user is going to enter in.
Then, there is the order by on top of that.
What I have been looking for is how to accommodate these kinds of variables when there are so many possible scenarios? When I do an ORDER BY `price` DESC, it can take 4-5 seconds, maybe more… I guess that’s not the end of the world, but our traffic is always growing, and that will end up being unacceptable.
Any thoughts?
- Batman
Hi,
This is a hard one especially if you add ranges as MySQL will not be able to both use range for selection and sort by different column. You may and up with large scans and sorts which can use many resources. Sphinx (http://www.sphinxsearch.com) is often a good alternative in such cases even if you’re not using full text search query
Thanks for this artikel!! Was what I was looking for.
Thanks for the quick response. VEry Nice
Hello Peter,
I am going through your article its found pretty much inserting. I need one help as i am displaying my property query which is displaying properly and ordering the result by following 4 column
p.is_move_up DESC,REPLACE(IFNULL(p.is_feature,”"),”2015-12-31″,”") DESC,
p.listing_order DESC,p.propid DESC
In order site, properties are been displayed in this order: Moveup paid, Paid featured, paid, trial then expired. However, we want to show within each group the properties random and make them rotate. Now,
the properties do not rotate so it is not fair for users to see always the same properties in same order. So, we need the rotation to be within same group not mixing all.
Query :-
SELECT DISTINCT(p.propid),p.category_id,p.people_sleep,p.bedrooms,p.bathrooms,p.children,
p.airport,p.features,p.pets,p.smoke, p.wheelchair,p.prop_title,p.property_type,prop_date,
display_calender,other_url_calender,is_feature
FROM property p
LEFT JOIN category ON (p.category_id=category.id)
WHERE p.active IN(“yes”,”d”) AND p.is_feature >= SYSDATE()
ORDER BY p.is_move_up DESC,REPLACE(IFNULL(p.is_feature,”"),”2015-12-31″,”") DESC,
p.listing_order DESC,p.propid DESC
Can you please help me into this.
Hi Peter,
I have one question regarding your comment
“Sort by column in leading table if you have JOIN with ORDER BY … LIMIT you should try hard to have sorting column(s) to be in the leading table. If ORDER BY is going by field from the table which is not first in the join order index can’t be used.”
I have a similar situation where I can not have sorting columns in leading table; in such case please suggest any work around.
Thanks in anticipation.
Rbk
Thanks alot with your info!