August 23, 2014

Joining many tables in MySQL – optimizer_search_depth

Working on customer case today I ran into interesting problem – query joining about 20 tables (thank you ORM by joining all tables connected with foreign keys just in case) which would take 5 seconds even though in the read less than 1000 rows and doing it completely in memory. The plan optimizer picked was very good one, yet you could notice EXPLAIN itself was taking same 5 seconds, which points to problem with optimizer performance. Note though if you have subqueries these might need to be executed during EXPLAIN phase yet making it unusable to check the optimizer performance.

Solution for this problem was to use set optimizer_search_depth=0, rarely used option which as per manual will chose best value automatically. Making this change I could bring optimization, and full query execution time to less than 50ms. Low values, such as 3,4 provided a bit better performance but I decided against using this as I did not want to risk likehood of execution plans changing for some over queries joining less number of tables.

I was wondering if 0 is automatic selection why do we have value of 62 being default in MySQL 5.5 which can produce very expensive plan selections ? Investigating this further I found the following explanation from Timour Katchaounov in MySQL mailing list archives

I have some recollection that there were few main reasons for the
decision to keep exhaustive search as the default:
- backwards compatibility,
- the hypothesis that most users have joins with few tables,
- it is not clear how far from optimal plans do we get by using a greedy
search.

From the same discussion we can learn how automatic selection works – it picks value of min(number of tables, 7) essentially limiting search depth to no more than 7 at which complexity is reasonable. This makes Timour explanation somewhat conflicting though as if we assume MySQL users do not join lots of tables (less than 7) when using 0 as default value would not impact them.
For people who have more than 7 tables in join I think faster execution plan computation would be more important than backward compatibility.

In MySQL 5.6 things are likely to get even better handling joins of many tables as optimizer heuristics are improved so much higher search depths are feasible now.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Timour says:

    Peter,

    The above comment was written long time ago, when I was just hired at MySQL. The assumptions
    about what are typical MySQL may have been true then in 2004, but they are likely not true now
    in 2012. So yes, the default should be changed to ’0′ which means “choose automatically”, which
    in turn means “choose depth 7″.

    Is this what your blog boils down to?

    At the time when I implemented the so-called “greedy optimizer”, I did some experiments
    to determine what is a reasonable limit for the search depth (and the total number of tables) that can
    be optimized in reasonable time. I reached the conclusion that if we want a simple magic
    threshold, the number is “7″. This is also confirmed by looking at the graph of the N!
    function. Indeed, after 7, this function grows very fast – 7! = 5040, while 8! = 40320. The worst-case
    complexity of MySQL’s join optimizer is N!.

    The goal of this choice was to provide a conservative limit that guarantees that any query
    can be optimized quickly no matter how bad the pruning heuristics works. With heuristics,
    things can go arbitrarily bad, that’s why the choice of higher search depth was left to the user
    who knows what they are doing. We still have to investigate the potential pitfalls with the change
    to plan pruning in MySQL 5.6.

    Ideally, the server should analyze the query and should determine the search depth based on
    some syntactic properties and knowledge of indexes. The problem is that it is very hard to predict
    in general the actual complexity of an algorithm.

  2. Timour,

    The message in the list I mention is dated in 2010 but I understand it refers back to much early time when you start working on this feature. The main point of my blog post is what default value of optimizer_search_depth is impractical as 62! which is potential number of combinations it exposes to is huge and it needs to be lowered down significantly in the cases optimizer spends too much time choosing the plan.

    I’ve looked at playing with optimizer_prune_level variable which is another one available. The default value is already to do pruning and it is still taking way to much time to get the plan in some cases I’ve seen.

  3. Peter,

    I think even changing the query to use index/join hints can bring down the time its taking the query to complete, did you try with STRAIGHT_JOIN and/or index hints.

  4. Bartosz Bednarowicz says:

    Ovais,

    Syntactic hints surely help pushing the performance in case of particular queries but surely cannot be an all-around answer to inefficiencies of the optimizer, that should be doing a good job in the first place. In addition, hints need to be built into the (application’s) logic which is simply a nightmare for developers in case of migration or major change in data profiles.

  5. Bartosz,

    You are right this is due to an optimizer inefficiency. But till the time the optimizer is not efficient enough we would have to use workarounds. In fact hints tend to be used a lot because although mostly optimizer makes good choices but sometimes it doesn’t.

  6. Alexey Polyakov says:

    When a developer writes query with 15+ tables join, they should be prepared to manually optimize it. The problem is that this bottleneck is unobvious (or was – when there was no profiling available in mysqld), so even skillled developers may get stuck. Properly documenting it and giving feedback on “explain” output (and slow log entry) should solve it.

    I don’t think it’s good idea to make query optimizer more complex, because it’s speed is crucial for overall mysql performance.

  7. Bartek Bednarowicz says:

    Ovais, Alexey – I agree with you completely that once your architecture is entrenched within MySQL there is nothing more you can do than to use optimizer hints and from this angle everyone, including myself, is happy they are available. The point I was trying to put through, and something I believe Peter was hinting at, is that much more attention should be put into optimizer not needing the hints in the first place.

    From medium and small business perspective, which needs to remain dynamic by definition, if you go out to the business people and tell them MySQL, only one of the options in RDBMS environment, will run optimally only when an experienced MySQL specialist will custom-code the app with MySQL-specific hints reducing the app portability, each of those costing additional money and putting up question marks on future migration possibilities and costs, those business people will laugh in you face and pick another engine.

    Fortunately, as Peter noted, the changes are already happening.

  8. Alexey Polyakov says:

    I think the problem isn’t MySQL-specific in the first place. If you’re using RDBMS in a way it was not intended for, you have to employ dirty tricks to make it work.
    In the original case Peter mentioned optimizer can be configured to process multi-table joins with acceptable performance (with some limitations – I doubt it would scale if tables grow). But it’s still much more reliable to use explicit join order, just to make sure it works as well after software update.

    And a bit more off-topic – business folks sometimes are not afraid to hire specialists. :) I once had the experience exactly opposite of what you said – managers hired a dedicated full-time MySQL DBA, and I was highly sceptical about it, and joking about it all the way. After some time I realized I was completely wrong (when it became obvious that hiring this person paid well). Business people that are not afraid to hire more expensive and experienced specialists tend to get better results. :)

Speak Your Mind

*