How is join_buffer_size allocated?

When examining MySQL configuration, we quite often want to know how various buffer sizes are used. This matters because some buffers (sort_buffer_size for example) are allocated to their full size immediately as soon as they are needed, but others are effectively a “max size” and the corresponding buffers are allocated only as big as needed (key_buffer_size). There are many examples of this. What about join_buffer_size?

I saw a my.cnf with a 128M join_buffer_size the other day and needed to research this quickly before I gave advice. The join buffer is a special case. Unlike many of the buffers that are allocated per-thread (i.e. per-connection), this one is allocated per-join-per-thread, in special cases. A join buffer is allocated to cache rows from each table in a join when the join can’t use an index. This is because we know that the nested loop is effectively going to do a table scan on the inner table — it has to, because there’s no index. If the query joins several tables this way, you’ll get several join buffers allocated, for example, this one will have two:

select * from a_table
join b_table on b.col1 = a.col1
join c_table on c.col2 = b.col2

select * from a_table

join b_table on b.col1 = a.col1

join c_table on c.col2 = b.col2

You can see these un-indexed queries in SHOW STATUS as Select_full_join, and you want zero of them. Anyway, back to the question about how the buffer is allocated. Its meaning is actually “minimum join buffer size.” Here’s the code, in sql/sql_select.cc in 5.1.47 source:

14176 /*****************************************************************************
14177   Fill join cache with packed records
14178   Records are stored in tab->cache.buffer and last record in
14179   last record is stored with pointers to blobs to support very big
14180   records
14181 ******************************************************************************/
14182
14183 static int
14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)
14185 {
... snip ...
14268   cache->length=length+blobs*sizeof(char*);
14269   cache->blobs=blobs;
14270   *blob_ptr=0;             /* End sequentel */
14271   size=max(thd->variables.join_buff_size, cache->length);
14272   if (!(cache->buff=(uchar*) my_malloc(size,MYF(0))))
14273     DBUG_RETURN(1);           /* Don't use cache */ /* purecov: inspected */
14274   cache->end=cache->buff+size;
14275   reset_cache_write(cache);
14276   DBUG_RETURN(0);
14277 }

14176 /*****************************************************************************

14177 Fill join cache with packed records

14178 Records are stored in tab->cache.buffer and last record in

14179 last record is stored with pointers to blobs to support very big

14180 records

14181 ******************************************************************************/

14182

14183 static int

14184 join_init_cache(THD *thd,JOIN_TAB *tables,uint table_count)

14185 {

... snip ...

14268 cache->length=length+blobs*sizeof(char*);

14269 cache->blobs=blobs;

14270 *blob_ptr=0; /* End sequentel */

14271 size=max(thd->variables.join_buff_size, cache->length);

14272 if (!(cache->buff=(uchar*) my_malloc(size,MYF(0))))

14273 DBUG_RETURN(1); /* Don't use cache */ /* purecov: inspected */

14274 cache->end=cache->buff+size;

14275 reset_cache_write(cache);

14276 DBUG_RETURN(0);

14277 }

On line 14271 the server decides the size of the cache: it is the greater of the join_buffer_size or the size that’s been determined to be needed. And it’s allocated all at once. So a 128M join_buffer_size is indeed very bad!

And this leads me to an interesting possibility to run the server out of memory:

set session join_buffer_size = 1 << 30; # 1GB
select * from
(select 1 union select 1) as x1
join (select 1 union select 1) as x2
join....

set session join_buffer_size = 1 << 30; # 1GB

select * from

(select 1 union select 1) as x1

join (select 1 union select 1) as x2

join....

That should try to allocate 1GB of memory for each join. If you execute this and nothing bad happens, you might be seeing this bug: http://bugs.mysql.com/55002

10 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Peter Zaitsev

Admin

13 years ago

Baron,

First about allocation – it is checked but it is NOT reported if it failed which can cause someone to set large buffer and get lower performance because it is not used.
I think MySQL should warn about such failures – print stuff in the error log file or at least have a global variable counter for allocation failures (which should stay 0 for any well configured system)

A note on crashing the server – you would do a lot better by allocating very large number of smaller buffers, for example 128K or so. If Server failed to allocate 1GB it still can likely allocate few MB needed to run most of concurrent queries successfully. If however you can’t allocate even 128K any more nothing else probably will work.

Also do not forget about over commit behavior for a lot of systems – they can let you to allocate a lot of memory but may fail when you can actually use it (access it)

dalin

13 years ago

Then what is the purpose of a setting that can only be used to waste memory?

Geoff

13 years ago

Is it correct that the allocated cache size is then the maximum of *one* row of the join data and the size specified in the configuration? As I understand, that would allow it to degrade from a Block-Nested Loop Join to a Nested Loop Join should only one row fit in cache.

Shlomi Noach

13 years ago

Baron,

Very interesting! A buffer-per-join-per query is news to me. What about deallocation, then? Do all these per-join buffers live for the duration of the connection?

Thanks

Baron Schwartz

Author

13 years ago

It’s not accurate to say it can ONLY be used for that. I explained in the blog post what the server uses it for.

Baron Schwartz

Author

13 years ago

Shlomi, buffers are usually deallocated at the end in “cleaning up” state. I haven’t checked where this one is.

Geoff, I am not sure; I did not look that closely at that code to figure out its purpose.

Shahryar Ghazi

13 years ago

I disagree with “A join buffer is allocated to cache rows from each table in a join when the join canâ€™t use an index.” because even if MySQL decides to scan all index rows and/or uses range for SELECTion it prefers to use the join buffer. Also, there is another variable that determines the max size of join buffer and that is max_join_size (no. of rows to be joined not the buffer size).

Baron Schwartz

Author

13 years ago

Shahryar, you are right. The manual (http://dev.mysql.com/doc/refman/5.0/en/nested-loop-joins.html) has the following quote: “Join buffering can be used when the join is of type ALL or index (in other words, when no possible keys can be used, and a full scan is done, of either the data or index rows, respectively), or range.”

Thanks for correcting me!

Normalex

12 years ago

size=max(thd->variables.join_buff_size, cache->length);

and the main question is how does the cache->length is being determined?
If it just waists the memory what is the point to use join_buff_size then?

There is obviously a lack of documentation about it.

Pablo

8 years ago

Hi,

I check my queries. Because I’ve a “queries without join” and I find out queries with cross join.
I’ve index in all involved columns.

The cross join increase my “Select_full_join”. Why?

Thanks in advance.

Pablo

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How is join_buffer_size allocated?

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis Replication and Auto-Failover With Sentinel Service

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

Valkey/Redis: Sets and Sorted Sets

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How is join_buffer_size allocated?

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis Replication and Auto-Failover With Sentinel Service

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

Valkey/Redis: Sets and Sorted Sets

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation