Sometimes I see people thinking about buffers as “larger is always better”: if the “large” MySQL sample configuration is designed for 2GB of RAM and they happen to have 16GB, they simply multiply all or most values by 10 and hope it will work well.
Obviously it does not. The least of the problems would be wasting memory – allocating a lot of memory for operations that do not need it. Allocating 10MB to sort 10 rows will not make the sort faster than allocating 1MB; both are more than enough.
However, not only may this waste memory, you may also see some queries actually performing slower – and not because the system starts to swap. Generally you want buffers and other values to be sized “just right”: working with smaller data structures improves cache locality, makes it easier for the OS to manage memory, and can provide quite unexpected improvements.
sort_buffer_size – I recently worked with a case that ran much faster with a 32K sort_buffer_size than with 32M. The problem was memory allocation: sorting was used in a correlated subquery which was executed hundreds of thousands of times, but only a few rows were sorted each time. A 32K block is allocated entirely in user space, while a 32M block requires an mmap() call and a lot of work with page tables. In this extreme case the performance difference was 9 seconds vs 8 minutes – about 50 times. I would not call it typical, but watch out. To some extent this could also be called a MySQL issue – why allocate and free the sort buffer on every execution when it is used by so many queries?
Here is an example if someone wants to play:
```sql
CREATE TABLE `t1` (
  `id` int(10) unsigned NOT NULL default '0',
  `c` char(10) default NULL,
  `d` int(11) default NULL,
  PRIMARY KEY (`id`),
  KEY `d` (`d`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

SELECT sql_no_cache count(*) FROM t1
WHERE c IN (SELECT sql_big_result count(*) FROM t1 WHERE d=50 GROUP BY c);
```
Populate it with 100,000 rows, with 100 rows having d=50 and some 10 distinct c values.
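If it helps, here is one way to do the population – a minimal sketch assuming MySQL 5.0+ stored procedure syntax; `populate_t1` is just a hypothetical helper name:

```sql
DELIMITER //
CREATE PROCEDURE populate_t1()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 100000 DO
    -- c cycles through 10 distinct values ('c0'..'c9');
    -- d=50 for exactly the 100 rows where i is a multiple of 1000,
    -- all other d values fall in 100..999 so they never collide with 50
    INSERT INTO t1 VALUES (i, CONCAT('c', i % 10),
                           IF(i % 1000 = 0, 50, 100 + i % 900));
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;

CALL populate_t1();
```

On 4.1, without stored procedures, the same distribution can be produced with a small client-side script instead.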
read_buffer_size and read_rnd_buffer_size – these are buffers used by MyISAM to perform reads in different scenarios. So should we go ahead and set them to 64M? Not really – this can result in nasty surprises with full-table-scan LIMIT queries, as well as in some other cases. If you run SELECT * FROM LARGETABLE LIMIT 5, MyISAM will allocate the buffer and populate it fully, even though MySQL reads only 5 rows from the buffer and throws away the rest.
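One way around that waste, without giving up large buffers elsewhere, is to keep the read buffer small just in the session that runs such LIMIT-style scans – a sketch (LARGETABLE stands in for any big MyISAM table of yours):

```sql
-- read_buffer_size is a session-settable variable, so this does not
-- affect other connections
SET SESSION read_buffer_size = 128*1024;
SELECT * FROM LARGETABLE LIMIT 5;
```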
These are just some examples – I’m quite sure there are others. For instance, too large a table_cache may slow your opens and closes down, and too large a query_cache may cause long stalls on invalidation.
Configure it smart. Larger is not always better 🙂
Do there exist any graphs or tables with benchmarks comparing different buffer sizes in MySQL 4.x/5.x, like the graphs for the mmap feature of 5.1 (or was it 5.2?) on this blog?
Because I have the impression that the “demo configs” included with MySQL are somewhat outdated (I’m thinking of the huge, large etc. configs included with the MySQL installation for demonstration purposes), while my own tests (not that scientific, but still :P) show that a relation of:
sort_buffer_size = 8M
read_buffer_size = 2M
read_rnd_buffer_size = 32M
join_buffer_size = 8M
seems to be a good performance-wise starting point for the buffers (compared to the default values).
Or is the 32K example in this blog post just a special case, where you perform a count inside a subquery along with a count in the outer query (which is most likely not that common)?
Apachez,
I wish it were as simple as a couple of graphs. In reality, so much here depends on the workload that graphs would only be misleading. There are also so many variables, and some of them are related, so their impact can’t be measured independently – it is truly a giant task. I did, however, do some benchmarks with 4.1 looking at the performance of the DBT2 benchmark under certain workloads. You can find them on our presentations page.
The configs are kind of outdated. It is, however, mostly a naming problem – the definition of “large” changes every couple of years. They should be named something like “myisam-mixed-4GB”, which would not change much over time.
But honestly speaking, the default per-thread buffers for “large” are already decent. I have used the same “sample” config file for the last 3 years. I typically only modify the global buffers (innodb_buffer_pool_size, key_buffer_size, query_cache_size) to adjust them to the size of the box. Most other settings are adjusted based on the workload, and most typically give only marginal improvement.
Do not forget you can also set values locally – if you need a 64MB sort buffer for one of your queries, you can set it just before the query is run.
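For example, a sketch of such per-query tuning (the SELECT here is just a stand-in for whatever query needs the big sort):

```sql
-- raise the sort buffer for this session only, run the expensive sort,
-- then put the session back to the server default
SET SESSION sort_buffer_size = 64*1024*1024;
SELECT * FROM t1 ORDER BY c;
SET SESSION sort_buffer_size = DEFAULT;
```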
Your values, I guess, are close to what I typically use. I just use a smaller read_rnd_buffer_size=8M, and I rarely change join_buffer_size as I try to avoid queries that would use it 🙂
If you have some benchmarks data that would be great if you could share it.
The 32K case was extreme, as were the others. My point was mostly to demonstrate that increasing the values does not always increase performance, and can even affect it negatively.
Thanks for good comment.
The benchmarks in my case were more manual – in SQLyog, and when running some of my applications (where in the end I could see the throughput of the queries).
What would be nice is if there could exist some kind of “recommended” values when using mysql administrator.
I mean a specific page in MySQL Administrator which, based on the output from “SHOW STATUS” etc., could hint to the admin which values “might” be good to use for buffers etc. from a theoretical point of view (there exist a couple of formulas regarding “recommended” settings in the manual).
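Even a couple of counters from SHOW STATUS could feed such hints – a sketch of the kind of check such a tool could automate (the GLOBAL keyword assumes MySQL 5.0+; on 4.x, plain SHOW STATUS):

```sql
-- a steadily growing Sort_merge_passes means sorts are spilling to
-- disk, so sort_buffer_size may be too small for the workload; many
-- Created_tmp_disk_tables hints that tmp_table_size may be too small
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';
```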
I know flupps, for example, has a couple of “recommended” configs depending on the use of the server in question.
Apachez,
I see. If you ever script your benchmarks, please feel free to share them. It is great to have real user workloads.
Automatic configuration advice would be great, and there is actually some work going on in this area (in a different project, however).
Yes, there are a number of recommended configs available out there. I should probably also post some.
Thanks for doing all of these benchmarks and actually sharing the results with everyone. You’ve saved me a lot of work, and I can show this stuff to my sysadmins without having to do a bunch of benchmarking myself all the time to show them which settings I need. For a lot of the benchmarks, though, the tables used are rather simplistic. Results might not be the same for matching against multiple columns with different values in the same query as they are for a table with one column. The only way to know is to look at the relevant MySQL source code or run your own benchmarks on your own tables. These are a great starting point, though – they get the basics out of the way, and that’s a big timesaver for everyone who needs to work with big databases.