I used to think one should never look at max_binlog_size; however, last year I had a couple of interesting cases which showed that sometimes it may be a very important variable to tune properly. I meant to write about it earlier but never really had a chance to do it. I have it now!
One of our customers was complaining that the database would lock up at random times and then go back to normal in just a few seconds. This was MySQL 5.0 running a MyISAM/InnoDB mix, not heavily loaded. We used pt-stalk (at that time it was aspersa stalk) to try to figure out what was happening; however, all we found was a spike in writes with many queries piled up, and looking at the system process list it was quite obvious that the page flush daemon was acting out. The I/O pattern was rather awkward – here is output from pt-diskstats (again, aspersa diskstats at that time):
```
rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt    wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt  busy in_prg
24.0    13.0     0.2     0%    1.4  59.3  1089.2   195.8   104.1    95%  100.1  91.9  100%      1
14.0     8.0     0.1     0%    0.1   5.4    10.0   246.4     1.2    97%    0.0   2.0    8%      0
36.0     9.3     0.2     0%    0.2   4.2     6.0   109.3     0.3    93%    0.0   0.2   15%      0
12.0     8.7     0.1     0%    0.1   4.2     6.0   101.3     0.3    92%    0.0   0.0    5%      0
```
You can see that (a) reads are affected – read response time degraded significantly – and (b) the reason appears to be an enormous amount of writes. For a while I was really puzzled, looking for answers in MySQL status counters, InnoDB status, memory info and what not. It seemed the problem was gone before pt-stalk would kick off, and all I could see were symptoms. Indeed, that was the case. I was not allowed to run pmp or oprofile on that system due to potential performance impact. However, at some point I noticed that the pt-stalk kick-off times matched the binary logs' modification times:
```
-rw-r--r-- 1 root root 91 Nov  2 15:15 2011_11_02_15_15_13-trigger
-rw-r--r-- 1 root root 91 Nov  2 16:17 2011_11_02_16_17_20-trigger
-rw-r--r-- 1 root root 91 Nov  2 17:38 2011_11_02_17_38_22-trigger
-rw-r--r-- 1 root root 91 Nov  3 08:36 2011_11_03_08_36_00-trigger
...
```
```
-rw-rw---- 1 mysql mysql 1073742171 Nov  2 15:15 /var/lib/mysqllogs/db1-bin-log.003229
-rw-rw---- 1 mysql mysql 1073742976 Nov  2 16:17 /var/lib/mysqllogs/db1-bin-log.003230
-rw-rw---- 1 mysql mysql 1073742688 Nov  2 17:38 /var/lib/mysqllogs/db1-bin-log.003231
-rw-rw---- 1 mysql mysql 1073742643 Nov  2 20:06 /var/lib/mysqllogs/db1-bin-log.003232
-rw-rw---- 1 mysql mysql 1073742974 Nov  3 00:10 /var/lib/mysqllogs/db1-bin-log.003233
-rw-rw---- 1 mysql mysql 1073742566 Nov  3 08:35 /var/lib/mysqllogs/db1-bin-log.003234
```
What is happening here?
Apparently, every time the current binary log gets close to max_binlog_size, a new binary log is created and, if expire_logs_days is non-zero, log files older than expire_logs_days are removed. The LOCK_log mutex is held for the duration of the binary log removal, and if your file system is slow at deleting files (both customers' systems ran on ext3, which can be very slow at deleting large files), then write operations in MySQL will suffer, since that very same mutex is used inside the MYSQL_LOG::write() call. Even though it only lasts 1-2s, it still gives users unstable response times – here's how that looks on one of the customer systems, which was doing ~10k qps at the time:
```
# mysqladmin ext -i1 | grep Threads_running
...
| Threads_running | 2   |
| Threads_running | 2   |
| Threads_running | 2   |
| Threads_running | 2   |
| Threads_running | 3   |
| Threads_running | 3   |
| Threads_running | 2   |
| Threads_running | 2   |
| Threads_running | 2   |
| Threads_running | 3   |
| Threads_running | 5   |
| Threads_running | 4   |
| Threads_running | 3   |
| Threads_running | 2   |
| Threads_running | 3   |
| Threads_running | 108 |
| Threads_running | 174 |
| Threads_running | 5   |
| Threads_running | 2   |
| Threads_running | 2   |
| Threads_running | 3   |
| Threads_running | 2   |
```
A small blip, yet over 100 users are backlogged for a couple of seconds – not a great user experience, considering this happens every 10-15 minutes.
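To make the mechanism concrete, here is a simplified Python sketch of the rotate-and-purge behavior described above. This is not MySQL's actual code – the names, the tiny byte threshold, and the seconds-based expiry are all illustrative stand-ins – but it shows the shape of the problem: the purge (the unlink calls) runs inside the same critical section every writer must enter, so one slow deletion stalls all writes.

```python
import os
import threading
import time

# Illustrative stand-ins for max_binlog_size and expire_logs_days
# (expiry is in seconds here so the effect is easy to demonstrate).
MAX_BINLOG_SIZE = 1024
EXPIRE_SECONDS = 0.5

LOCK_log = threading.Lock()  # plays the role of MySQL's LOCK_log mutex

def write_event(logdir, state, data):
    """Append an event to the current binlog; rotate and purge when full."""
    with LOCK_log:
        current = os.path.join(logdir, "bin.%06d" % state["seq"])
        with open(current, "ab") as f:
            f.write(data)
        if os.path.getsize(current) >= MAX_BINLOG_SIZE:
            state["seq"] += 1  # the next write opens a fresh binlog
            cutoff = time.time() - EXPIRE_SECONDS
            for name in sorted(os.listdir(logdir)):
                path = os.path.join(logdir, name)
                if path != current and os.path.getmtime(path) < cutoff:
                    # On ext3 this unlink can take seconds for a 1GB file,
                    # and every other writer is blocked on LOCK_log meanwhile.
                    os.unlink(path)
```

With a smaller MAX_BINLOG_SIZE each unlink simply has less to do, which is exactly why shrinking max_binlog_size helps in the fix below.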
Solution?
Here’s what we did: we reduced the binary log file size from the default 1GB (some systems have it set to 100MB in their default my.cnf) down to 50MB, and we never saw this problem again. Now the files were removed much more often; they were smaller and didn’t take that long to remove.
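In my.cnf terms the change is as simple as this (the expire_logs_days value shown is just a placeholder – the retention period itself stays whatever it already was):

```ini
[mysqld]
# rotate binary logs at 50MB instead of the 1GB default
max_binlog_size = 50M
# retention unchanged; shown here only for context
expire_logs_days = 14
```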
Note that on ext4 and xfs you should not have this problem, as they remove such files much faster.
Aurimas,
Running the purge of binary logs under the lock looks like a very bad idea. Unlike creating a new log file, it is not something that really needs to be serialized. I wonder if it is fixed in a later MySQL version, or is a bug report due?
But which is faster to rotate: a few huge (1G) files or many small (50M) files?
For example, when max_binlog_size = 1G and expire_logs_days = 14 we have 200 files. With 100M it’s 2000 files, and 50M => 4000 files. It could be slow too. It’s a question for a benchmark.
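(As a sanity check on those counts: they are consistent if you assume a fixed ~200GB of binlog data retained over the 14 days, which is what "200 files at 1G" implies; the file count is then just retained bytes divided by max_binlog_size, and the exact figures come out as 200, 2048 and 4096 – rounded here to 200/2000/4000.)

```python
# Assumes ~200GB of retained binlog data over the 14-day window,
# inferred from the "200 files at 1G" figure above.
retained = 200 * 2**30

for label, size in [("1G", 2**30), ("100M", 100 * 2**20), ("50M", 50 * 2**20)]:
    print(label, retained // size)  # -> 1G 200, 100M 2048, 50M 4096
```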
Peter,
It seems this was fixed in MySQL 5.1.43 (as well as 5.5.1) as part of the bug #49536 fix. I didn’t have a chance to verify, but looking at the source code, it should no longer hold LOCK_log for the purge. So it seems I just have too many customers who are still on 5.0 or early 5.1 versions 🙂
Martin, if you care more about how long the whole process takes, as opposed to what its impact is going to be, then, assuming you purge many binary log files at once, it is indeed not clear which would take longer – fewer large files could be faster.
However, in this case I am referring to the purging that happens automatically every time a new binary log is created. Smaller files mean purging will happen more often. Also, if you do happen to purge many binary log files at once, small binary logs are still an advantage because the lock is held separately for each file; ergo, smaller files mean purging has less impact on threads that are writing at the same time.
And another question: do you have /var/lib/mysqllogs/ on a dedicated disk array / filesystem, independent from the disk for MySQL data? It can make a difference.
Martin, I _think_ in the first case it was a separate disk, though the other customer had both data and binlogs on the same disk. I think the only difference there is the speed at which the underlying disk completes the task – the lock is held regardless.