Comments to my previous posts, especially this one by Gokhan inspired me to write a bit about descending indexes and about loose index scan, or what Gokhan calls “better range” support. None of these are actially related to Innodb tables in general – these are features MySQL should get for all storage engines at some point.
Descending indexes – This is something MySQL does not have at this point, but it was not where for so long at large extent because it is less problem than many people think. First – if index is ascending it does not mean it can’t be scanned in reverse order and it will well be. This is how MySQL will optimize indexed ORDER BY col DESC queries for example. Reverse scan could be as fast as forward scan – this is however where storage engines and operating systems come in play. For example certain operation systems might not do backward read-ahead which may slow it down a bit. Or some storage engines, such as MyISAM (for packed indexes) may have reverse scan being much slower than forward scan.
So when do you really need Descending indexes ? Most typical case is when you want to order by two colums in different directions: … ORDER BY price ASC, date DESC LIMIT 10 If you have indexed on (price,date) in ascending order you will not be able to optimize this query well – external sort (“filesort”) will be needed. If you would be able to build index on price ASC, date DESC the same query could retrive data in aready sorted order.
This is however something you can workaround by having something like “reverse_date” column and using it for sort. With MySQL 5.0 you even can use triggers to update it as real date updates so it becomes less ugly. In fact this is for example why you would see “reverse_timestamp” field in Wikipedia table structure.
Loose index scan – Number of years ago when I just started using MySQL I thought it would have any optimization which could come to my mind. For example if I would have (A>0 and B>6) clause and index (A,B) I expected it would start looking at all values where A>0 instantly jumping to onces have B>6 by using index. It is possibe. So I was shocked and upset to find out it did not. And this optimization is still not implemented. This is very important item to remember when you designing your new applications or porting ones from other databases. Designing the indexes for MySQL you should only make sure queries use “=” for all keyparts in the last of index. Only last one is allowed to have “range” clauses, such as >, IN etc. All clauses which follow the range in the index will not use index for their operation.
Let me give one more example KEY (A,B,C) A=5 and B>6 and C=7 In this case index will be used to check A=5 and B>6 cause. C=7 will not be using the index (rows with all C values will be retrieved from the index) and if this is not the index covered query you might rather shorten your key to KEY(A,B) to keep index smaller.
The good news are Loose index scan implementation is finally on a way. In fact it was even implemented in MySQL 5.0, but only for very narrow case of aggregate queries.
In general complete loose index scan implementation is one of my most wanted features for MySQL optimizer.
P.S If you post queries in your comments please also explain which indexes do you have on the table. SHOW CREATE TABLE is the best. Otherwise I can get you wrong.
Hey,
I’m doing the following SQL query on a 150k entry table:
explain select * from ad STRAIGHT JOIN category ON catid=sitecatid order by scraped asc,siteid asc limit 0,20;
How would you suggest doing the keys on scraped and siteid? If I do order by scraped asc, siteid asc it goes very fast, because I have an index on (scraped,siteid), but as of MySQL 5.0.21 there is no descending index possible, so how would you suggest I manage the descending index? I noticed you mentioned the reverse_timestamp hack in wikipedia, but that is apparently no longer in use, which means maybe there is a better way then creating a ‘negative value’ column or something that I could sort ASC, but would effectively be DESC.
Thanks,
Gabe
Gabe,
It is not entirely clear from the query which columns correspond to which of the tables, which affect how you whould optimize query dramatically. It is important sorting is done by fields from first tables. In your case if sorting is done by fields from “ad” table, key (scraped,siteid) is optimal solution for “asc” sort but if you need to have reverse sort you need to have
something like reverse_timestamp hack.
I googled a bit about reverse_timestamp and it looks Wikipedia was using it for a bit different purpose, this is why they do not need it any more. In MySQL before 4.0 WHERE key_part1=const ORDER BY key_part2 DESC simply was not optimized even if you’re sorting by single column. In MySQL 4.0 this problem is fixed and so Wikipedia did not need hack any more. However if you want to sort by two columns in different order you still need this kind of hack.
So in your case order by scraped desc,siteid desc should also be fast, while order by scraped desc,siteid asc should be slow.
Thanks,
What is your suggestion as to the way to create the reverse_id field? just (-1)* the id number? Or is there a better/more practical way you’ve found.
Thanks again for the response,
Gabe
Gabe,
Yes you can use id*-1 if id is signed or max_int-id for unsigned ids
If you really need sorting by multiple fields in different directions this is perhaps best solution.
It looks like this still isn’t implemented currently, which is a shame.
I would be interested to know if you think it is a feature that will just never happen?
Bug/Request below:
http://bugs.mysql.com/bug.php?id=13375
Many thanks 🙂
What is different between the internal implementation of “column!=null” and “column is not null”?
The column data type is bigint.