Impact of the sort buffer size in MySQL

The parameter sort_buffer_size is one the MySQL parameters that is far from obvious to adjust. It is a per session buffer that is allocated every time it is needed. The problem with the sort buffer comes from the way Linux allocates memory. Monty Taylor (here) have described the underlying issue in detail, but basically above 256kB the behavior changes and becomes slower. After reading a post from Ronald Bradford (here), I decide to verify and benchmark performance while varying the size of the sort_buffer. It is my understanding that the sort_buffer is used when no index are available to help the sorting so I created a MyISAM table with one char column without an index:

CREATE TABLE `sorttest` (
  `data` char(30) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `sorttest` (

`data` char(30) DEFAULT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1

and I inserted 100k rows with this simple script:

#!/bin/bash

NUMROW=100000
COUNT=0
while [ "$NUMROW" -gt "$COUNT" ]
do
    UUID=`uuidgen`
    mysql test -e "insert into sorttest value ('$UUID');"
    let "COUNT=COUNT+1"
done

#!/bin/bash

NUMROW=100000

COUNT=0

while [ "$NUMROW" -gt "$COUNT" ]

UUID=`uuidgen`

mysql test -e "insert into sorttest value ('$UUID');"

let "COUNT=COUNT+1"

done

I know, I could have used the uuid() function of MySQL. For the benchmark, I used an old PII 350 MHz computer, I think for such CPU bound benchmarks, an old computer is better, if there are small differences, they’ll be easier to observe. I varied the sort_buffer_size by steps of 32 KB and I recorded the time required to perform 12 queries like ‘select * from sorttest order by data limit 78000,1’ with, of course, the query cache disabled. I also verified that during the whole process, the computer never swapped and I pre-warmed the file cache before the benchmark by doing “alter table sorttest engine=myisam;”. The script used for the benchmark is the following:

#!/bin/bash

for i in `seq 1 1000`
do
        START=`date +%s.%N`
        OUT=`mysql -e "set session sort_buffer_size=32*1024*$i;select * from sorttest order by data limit 78000,1;show session status like 'Sort_merge_passes';select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;" test`
        END=`date +%s.%N`
        MERGE=`echo $OUT | cut -d' ' -f6`
        TIME=`echo "$END - $START" | bc`
        echo "$i $MERGE $TIME"
done

#!/bin/bash

for i in `seq 1 1000`

START=`date +%s.%N`

OUT=`mysql -e "set session sort_buffer_size=32*1024*$i;select * from sorttest order by data limit 78000,1;show session status like 'Sort_merge_passes';select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;select * from sorttest order by data limit 78000,1;" test`

END=`date +%s.%N`

MERGE=`echo $OUT | cut -d' ' -f6`

TIME=`echo "$END - $START" | bc`

echo "$i $MERGE $TIME"

done

which in addition to output the total time, output the number of Sort_merge_passes, which will be useful to interpret the results. The figure below shows a graphical representation of the results.

The first we can notice by looking at the graph is that the expected correspondence between the time for the queries and the number of sort merge passes. For the small values of the sort buffer size, below 440KB, there are many sort merge passes and the time for the queries hover around 18s. Above 440K, as the sort merge passes drops to 1, there is a large drop of the time for the queries below 14s. Then, as the sort buffer size is further risen, the performance gain is negative up to the point, around 6.4MB where no more sort merge pass are required and then, the time for the queries loses all dependency over the sort buffer size. I am still trying to figure out why the number of sort merge passes felt to zero at 6.4MB since the total size of the table is less than 3MB. That seems to be a pretty high overhead per row (~37 bytes) but if the structure has a few pointers, that can add up to such amount of bytes pretty quickly.

The important point here, at least for the Linux, glibc and MySQL versions I used and for the test I did, there doesn’t seem to be an observable negative impact of the glibc memory allocation threshold at 256KB. I’ll try to find ways to repeat this little experiment with the other per session buffers just to confirm the findings.

OS: Ubuntu 10.04 LTS, Linux test1 2.6.32-21-generic-pae #32-Ubuntu SMP Fri Apr 16 09:39:35 UTC 2010 i686 GNU/Linux
MySQL: 5.1.41-3ubuntu12.6-log

P.S.: Gnumerics for graphs is so much better than OpenOffice Calc.

11 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Morgan Tocker

13 years ago

I think it would be interesting to run a similar test on data that is at different levels of presorted nature. i.e.

100% unsorted – (data is stored in reverse order)
75% unsorted
50% unsorted
25% unsorted
0% unsorted – (everything is already in the right order)

A CPU should be able to sort millions of rows/second if they are more or less sorted. My understanding of most data is that it arrives in approximately sorted order (order by time, order by check number).

Peter Zaitsev

Admin

13 years ago

Yves,

Thanks. I wanted to do detailed test for long time. I looked at it years ago and had similar result:
http://www.mysqlperformanceblog.com/2007/08/18/how-fast-can-you-sort-data-with-mysql/

My guess though is it is CPU cache related not memory allocation as if we’re sorting 100000 rows mmap() memory allocation should be taking relatively small portion. It is very interesting to compare the graph to one on modern CPU and see if there is any difference.

I’d also really like to see 2 more things – whenever sorting by short columns (say INT) is same and whenever the sorting with storing the data (instead of data pointer) is same.

Baron Schwartz

13 years ago

I think a more common scenario I see is something like this (which I saw last week): sort_buffer_size = 128M, queries are all sorting 10 to 20 rows. What is the performance difference then, between 128M sort_buffer_size and, say, 128k? And what happens when there is high concurrency, so all those malloc() calls are happening at the same time?

I like gnuplot even better than gnumeric, but gnumeric is definitely better than openoffice 😉

Wagner Bianchi

13 years ago

I will appreciate to see the benchmark test proposed by Baron. I have some situations in which one I used to tune sort_buffer_size with 128M value to achieve better values on its status variables. 😉

Yves Trudeau

Author

13 years ago

Thanks all for the propositions, I’ll see what I can do to improve the benchmark.

@Baron Schwartz, yes, I use gnuplot a lot too but a spreadsheet is handsome to manipulate the data.

Ivan

13 years ago

Question – what factors (i.e. sort_buffer_size) affect the performance of Repair By Sorting when using alter table for large inserts?

Related question, assuming large tables, how would you determine the trade-off between using alter tables enable/disable keys vs. just taking the hit for slower inserts? e.g. If you are inserting 1M rows into a table with 300M rows, would you disable keys or not?

In my testing it was originally faster to alter table, but now it seems to have fallen off a cliff as the repair table is taking very long, and IÂ´m not sure what variable to change to fix that (I know it’s probably memory of some kind, but which?).

Thanks for any response.

Apachez

13 years ago

What about if you double inserted rows from 100k to 200k?

Will the drop go from 6.4MB to 12.8MB?

matt sporleder

12 years ago

I just did some similar testing. If you have a known small query, it’s worth doing a SET SESSION sort_buffer_size=$smallval; SELECT foo;

Tim Turner

10 years ago

What process did you use to create the graphs? Is this something within mysql?

Tim

Yves Trudeau

Author

10 years ago

The graph was made with gnumeric.

Adam

10 years ago

regarding: ” I am still trying to figure out why the number of sort merge passes felt to zero at 6.4MB since the total size of the table is less than 3MB. ”
At https://dev.mysql.com/doc/refman/5.0/en/temporary-files.html it states

(length of what is sorted + sizeof(row pointer)) * number of matched rows * 2

Adam

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Impact of the sort buffer size in MySQL

Related

Related Blog Articles

RECOMMENDED ARTICLES

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

Securing Your MySQL Database: Essential Best Practices

MySQL 8.4 First Peek

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Impact of the sort buffer size in MySQL

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

Securing Your MySQL Database: Essential Best Practices

MySQL 8.4 First Peek

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation