When examining MySQL configuration, we quite often want to know how various buffer sizes are used. This matters because some buffers (sort_buffer_size for example) are allocated to their full size immediately as soon as they are needed, but others are effectively a “max size” and the corresponding buffers are allocated only as big as needed (key_buffer_size). There are many examples of this. What about join_buffer_size?
I saw a my.cnf with a 128M join_buffer_size the other day and needed to research this quickly before I gave advice. The join buffer is a special case. Unlike many of the buffers that are allocated per-thread (i.e. per-connection), this one is allocated per-join-per-thread, in special cases. A join buffer is allocated to cache rows from each table in a join when the join can’t use an index. This is because we know that the nested loop is effectively going to do a table scan on the inner table — it has to, because there’s no index. If the query joins several tables this way, you’ll get several join buffers allocated, for example, this one will have two:
1 2 3 | select * from a_table join b_table on b.col1 = a.col1 join c_table on c.col2 = b.col2 |
You can see these un-indexed queries in SHOW STATUS as Select_full_join, and you want zero of them. Anyway, back to the question about how the buffer is allocated. Its meaning is actually “minimum join buffer size.” Here’s the code, in sql/sql_select.cc in 5.1.47 source:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | 14176 /***************************************************************************** 14177 Fill join cache with packed records 14178 Records are stored in tab->cache.buffer and last record in 14179 last record is stored with pointers to blobs to support very big 14180 records 14181 ******************************************************************************/ 14182 14183 static int 14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count) 14185 { ... snip ... 14268 cache->length=length+blobs*sizeof(char*); 14269 cache->blobs=blobs; 14270 *blob_ptr=0; /* End sequentel */ 14271 size=max(thd->variables.join_buff_size, cache->length); 14272 if (!(cache->buff=(uchar*) my_malloc(size,MYF(0)))) 14273 DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */ 14274 cache->end=cache->buff+size; 14275 reset_cache_write(cache); 14276 DBUG_RETURN(0); 14277 } |
On line 14271 the server decides the size of the cache: it is the greater of the join_buffer_size or the size that’s been determined to be needed. And it’s allocated all at once. So a 128M join_buffer_size is indeed very bad!
And this leads me to an interesting possibility to run the server out of memory:
1 2 3 4 5 | set session join_buffer_size = 1 << 30; # 1GB select * from (select 1 union select 1) as x1 join (select 1 union select 1) as x2 join.... |
That should try to allocate 1GB of memory for each join. If you execute this and nothing bad happens, you might be seeing this bug: http://bugs.mysql.com/55002
Baron,
First about allocation – it is checked but it is NOT reported if it failed which can cause someone to set large buffer and get lower performance because it is not used.
I think MySQL should warn about such failures – print stuff in the error log file or at least have a global variable counter for allocation failures (which should stay 0 for any well configured system)
A note on crashing the server – you would do a lot better by allocating very large number of smaller buffers, for example 128K or so. If Server failed to allocate 1GB it still can likely allocate few MB needed to run most of concurrent queries successfully. If however you can’t allocate even 128K any more nothing else probably will work.
Also do not forget about over commit behavior for a lot of systems – they can let you to allocate a lot of memory but may fail when you can actually use it (access it)
Then what is the purpose of a setting that can only be used to waste memory?
Is it correct that the allocated cache size is then the maximum of *one* row of the join data and the size specified in the configuration? As I understand, that would allow it to degrade from a Block-Nested Loop Join to a Nested Loop Join should only one row fit in cache.
Baron,
Very interesting! A buffer-per-join-per query is news to me. What about deallocation, then? Do all these per-join buffers live for the duration of the connection?
Thanks
It’s not accurate to say it can ONLY be used for that. I explained in the blog post what the server uses it for.
Shlomi, buffers are usually deallocated at the end in “cleaning up” state. I haven’t checked where this one is.
Geoff, I am not sure; I did not look that closely at that code to figure out its purpose.
I disagree with “A join buffer is allocated to cache rows from each table in a join when the join can’t use an index.” because even if MySQL decides to scan all index rows and/or uses range for SELECTion it prefers to use the join buffer. Also, there is another variable that determines the max size of join buffer and that is max_join_size (no. of rows to be joined not the buffer size).
Shahryar, you are right. The manual (http://dev.mysql.com/doc/refman/5.0/en/nested-loop-joins.html) has the following quote: “Join buffering can be used when the join is of type ALL or index (in other words, when no possible keys can be used, and a full scan is done, of either the data or index rows, respectively), or range.”
Thanks for correcting me!
size=max(thd->variables.join_buff_size, cache->length);
and the main question is how does the cache->length is being determined?
If it just waists the memory what is the point to use join_buff_size then?
There is obviously a lack of documentation about it.
Hi,
I check my queries. Because I’ve a “queries without join” and I find out queries with cross join.
I’ve index in all involved columns.
The cross join increase my “Select_full_join”. Why?
Thanks in advance.
Pablo