August 28, 2014

How is join_buffer_size allocated?

When examining MySQL configuration, we quite often want to know how various buffer sizes are used. This matters because some buffers (sort_buffer_size for example) are allocated to their full size immediately as soon as they are needed, but others are effectively a “max size” and the corresponding buffers are allocated only as big as needed (key_buffer_size). There are many examples of this. What about join_buffer_size?

I saw a my.cnf with a 128M join_buffer_size the other day and needed to research this quickly before I gave advice. The join buffer is a special case. Unlike many of the buffers that are allocated per-thread (i.e. per-connection), this one is allocated per-join-per-thread, in special cases. A join buffer is allocated to cache rows from each table in a join when the join can’t use an index. This is because we know that the nested loop is effectively going to do a table scan on the inner table — it has to, because there’s no index. If the query joins several tables this way, you’ll get several join buffers allocated, for example, this one will have two:

You can see these un-indexed queries in SHOW STATUS as Select_full_join, and you want zero of them. Anyway, back to the question about how the buffer is allocated. Its meaning is actually “minimum join buffer size.” Here’s the code, in sql/sql_select.cc in 5.1.47 source:

On line 14271 the server decides the size of the cache: it is the greater of the join_buffer_size or the size that’s been determined to be needed. And it’s allocated all at once. So a 128M join_buffer_size is indeed very bad!

And this leads me to an interesting possibility to run the server out of memory:

That should try to allocate 1GB of memory for each join. If you execute this and nothing bad happens, you might be seeing this bug: http://bugs.mysql.com/55002

About Baron Schwartz

Baron is the lead author of High Performance MySQL.
He is a former Percona employee.

Comments

  1. peter says:

    Baron,

    First about allocation – it is checked but it is NOT reported if it failed which can cause someone to set large buffer and get lower performance because it is not used.
    I think MySQL should warn about such failures – print stuff in the error log file or at least have a global variable counter for allocation failures (which should stay 0 for any well configured system)

    A note on crashing the server – you would do a lot better by allocating very large number of smaller buffers, for example 128K or so. If Server failed to allocate 1GB it still can likely allocate few MB needed to run most of concurrent queries successfully. If however you can’t allocate even 128K any more nothing else probably will work.

    Also do not forget about over commit behavior for a lot of systems – they can let you to allocate a lot of memory but may fail when you can actually use it (access it)

  2. dalin says:

    Then what is the purpose of a setting that can only be used to waste memory?

  3. Geoff says:

    Is it correct that the allocated cache size is then the maximum of *one* row of the join data and the size specified in the configuration? As I understand, that would allow it to degrade from a Block-Nested Loop Join to a Nested Loop Join should only one row fit in cache.

  4. Baron,

    Very interesting! A buffer-per-join-per query is news to me. What about deallocation, then? Do all these per-join buffers live for the duration of the connection?

    Thanks

  5. It’s not accurate to say it can ONLY be used for that. I explained in the blog post what the server uses it for.

  6. Shlomi, buffers are usually deallocated at the end in “cleaning up” state. I haven’t checked where this one is.

    Geoff, I am not sure; I did not look that closely at that code to figure out its purpose.

  7. I disagree with “A join buffer is allocated to cache rows from each table in a join when the join can’t use an index.” because even if MySQL decides to scan all index rows and/or uses range for SELECTion it prefers to use the join buffer. Also, there is another variable that determines the max size of join buffer and that is max_join_size (no. of rows to be joined not the buffer size).

  8. Shahryar, you are right. The manual (http://dev.mysql.com/doc/refman/5.0/en/nested-loop-joins.html) has the following quote: “Join buffering can be used when the join is of type ALL or index (in other words, when no possible keys can be used, and a full scan is done, of either the data or index rows, respectively), or range.”

    Thanks for correcting me!

  9. Normalex says:

    size=max(thd->variables.join_buff_size, cache->length);

    and the main question is how does the cache->length is being determined?
    If it just waists the memory what is the point to use join_buff_size then?

    There is obviously a lack of documentation about it.

Speak Your Mind

*