Working on a customer case today, I ran into an interesting problem: a query joining about 20 tables (thank you, ORM, for joining every table connected by a foreign key just in case) which would take 5 seconds even though it read fewer than 1000 rows and did all the work in memory. The plan the optimizer picked was a very good one, yet you could notice EXPLAIN itself was taking the same 5 seconds, which points to a problem with optimizer performance. Note, though, that if you have subqueries, these might need to be executed during the EXPLAIN phase, making EXPLAIN timing unusable as a check of optimizer performance.
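A quick way to confirm that the optimizer, rather than execution, is where the time goes is to time EXPLAIN itself with the statement profiler. A minimal sketch, with hypothetical tables t1, t2, t3 standing in for the real 20-table ORM query:

    -- Per-statement profiling is available in MySQL 5.5
    SET profiling = 1;

    -- EXPLAIN runs the optimizer but (barring subqueries) reads no rows,
    -- so its duration is essentially pure plan-search time
    EXPLAIN SELECT * FROM t1
        JOIN t2 ON t2.t1_id = t1.id
        JOIN t3 ON t3.t2_id = t2.id;  -- imagine ~20 such joins here

    SHOW PROFILES;  -- compare the EXPLAIN duration against the full query run

If the EXPLAIN entry in SHOW PROFILES accounts for nearly all of the query time, the optimizer is the bottleneck.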

The solution to this problem was SET optimizer_search_depth=0, a rarely used option which, per the manual, will choose the best value automatically. With this change I brought optimization, and hence full query execution time, to under 50ms. Low explicit values such as 3 or 4 provided slightly better performance still, but I decided against using them: I did not want to risk execution plans changing for other queries that join fewer tables.
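For reference, a sketch of checking and changing the setting at the session level (SET GLOBAL also works, but it affects every new connection, so test at session scope first):

    -- 62 is the MySQL 5.5 default, i.e. effectively exhaustive search
    SELECT @@optimizer_search_depth;

    -- 0 means choose automatically, which caps the depth at min(number of tables, 7)
    SET SESSION optimizer_search_depth = 0;

    -- Or pin an explicit low value, accepting the risk of plan changes for other queries:
    -- SET SESSION optimizer_search_depth = 4;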

I wondered: if 0 means automatic selection, why is the default in MySQL 5.5 a value of 62, which can make plan selection very expensive? Investigating further, I found the following explanation from Timour Katchaounov in the MySQL mailing list archives:

I have some recollection that there were a few main reasons for the decision to keep exhaustive search as the default:
– backwards compatibility,
– the hypothesis that most users have joins with few tables,
– it is not clear how far from optimal the plans we get from a greedy search are.

From the same discussion we can learn how automatic selection works: it picks a value of min(number of tables, 7), essentially limiting the search depth to no more than 7, at which point complexity remains reasonable. This makes Timour’s explanation somewhat self-contradictory, though: if we assume most MySQL users do not join many tables (fewer than 7), then using 0 as the default would not affect them at all.
And for people who join more than 7 tables, I think faster execution plan computation would matter more than backward compatibility.
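To put the depth cap in perspective, a rough worked example (the 20! figure is my own arithmetic, not from the discussion): with join-order search being O(N!) in the worst case, a 20-table join exposes up to 20! ≈ 2.4 × 10^18 join orders to an exhaustive search, while 7! = 5040 orders are trivial to enumerate. That is the difference between seconds of optimization time and effectively none.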

In MySQL 5.6, handling of many-table joins is likely to get even better, as improved optimizer heuristics make much higher search depths feasible.

Comments
Timour

Peter,

The above comment was written a long time ago, when I was just hired at MySQL. The assumptions
about what typical MySQL usage looks like may have been true back in 2004, but they are likely not true now
in 2012. So yes, the default should be changed to ‘0’, which means “choose automatically”, which
in turn means “choose depth 7”.

Is this what your blog boils down to?

At the time I implemented the so-called “greedy optimizer”, I did some experiments
to determine a reasonable limit for the search depth (and the total number of tables) that can
be optimized in reasonable time. I reached the conclusion that if we want a simple magic
threshold, the number is “7”. This is also confirmed by looking at the graph of the N!
function: after 7, it grows very fast (7! = 5040, while 8! = 40320). The worst-case
complexity of MySQL’s join optimizer is N!.

The goal of this choice was to provide a conservative limit that guarantees any query
can be optimized quickly no matter how badly the pruning heuristics work. With heuristics,
things can go arbitrarily wrong; that’s why the choice of a higher search depth was left to users
who know what they are doing. We still have to investigate the potential pitfalls of the change
to plan pruning in MySQL 5.6.

Ideally, the server should analyze the query and determine the search depth based on
some syntactic properties and its knowledge of the indexes. The problem is that it is very hard,
in general, to predict the actual complexity of an algorithm.

Ovais Tariq

Peter,

I think even changing the query to use index/join hints could bring down the time the query takes to complete. Did you try STRAIGHT_JOIN and/or index hints?
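For readers unfamiliar with these hints, a hedged illustration (table and index names here are made up):

    -- STRAIGHT_JOIN forces MySQL to join tables in the order they are written,
    -- skipping join-order search entirely
    SELECT STRAIGHT_JOIN t1.*, t2.*
    FROM t1
    JOIN t2 ON t2.t1_id = t1.id;

    -- An index hint narrows the optimizer’s choices for a single table
    SELECT * FROM t1 FORCE INDEX (idx_created) WHERE created_at > '2012-01-01';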

Bartosz Bednarowicz

Ovais,

Syntactic hints surely help push the performance of particular queries, but they cannot be an all-around answer to inefficiencies of the optimizer, which should be doing a good job in the first place. In addition, hints need to be built into the application’s logic, which is simply a nightmare for developers in case of a migration or a major change in data profiles.

Ovais Tariq

Bartosz,

You are right, this is due to an optimizer inefficiency. But until the optimizer is efficient enough, we have to use workarounds. In fact, hints tend to be used a lot because, although the optimizer mostly makes good choices, sometimes it doesn’t.

Alexey Polyakov

When a developer writes a query joining 15+ tables, they should be prepared to optimize it manually. The problem is that this bottleneck is unobvious (or was, back when no profiling was available in mysqld), so even skilled developers may get stuck. Properly documenting it and giving feedback in the EXPLAIN output (and the slow log entry) should solve that.

I don’t think it’s a good idea to make the query optimizer more complex, because its speed is crucial for overall MySQL performance.

Bartek Bednarowicz

Ovais, Alexey – I agree with you completely that once your architecture is entrenched in MySQL, there is nothing more you can do than use optimizer hints, and from this angle everyone, myself included, is happy they are available. The point I was trying to get across, and something I believe Peter was hinting at, is that much more attention should go into the optimizer not needing the hints in the first place.

From the perspective of a small or medium business, which by definition needs to stay dynamic: if you tell the business people that MySQL, just one option among RDBMSes, will run optimally only when an experienced MySQL specialist custom-codes the app with MySQL-specific hints, reducing the app’s portability, with each of those specialists costing additional money and raising question marks over future migration options and costs, those business people will laugh in your face and pick another engine.

Fortunately, as Peter noted, the changes are already happening.

Alexey Polyakov

I think the problem isn’t MySQL-specific in the first place. If you’re using an RDBMS in a way it was not intended for, you have to employ dirty tricks to make it work.
In the case Peter mentioned, the optimizer can be configured to process multi-table joins with acceptable performance (with some limitations; I doubt it would scale as the tables grow). But it’s still much more reliable to use an explicit join order, just to make sure the query keeps working after a software update.

And a bit more off-topic: business folks sometimes are not afraid to hire specialists. 🙂 I once had the exact opposite experience of what you describe: managers hired a dedicated full-time MySQL DBA, and I was highly sceptical about it, joking about it all the way. After some time I realized I was completely wrong (it became obvious that hiring this person paid off well). Business people who are not afraid to hire more expensive, experienced specialists tend to get better results. 🙂

steve88aa

What do you mean by search depth, exactly?