Storing MySQL Binary logs on NFS Volume

There is a lot of discussions whenever running MySQL storing data on NFS is a good idea. There is a lot of things for and against this and this post is not about them.
The fact is number of people run their databases on NetApp and other forms of NFS storage and this post is about one of discoveries in such setup.

There are good reasons to have binary logs on NFS volume – binary logs is exactly the thing you want to survive the server crash – using them you can do point in time recovery from backup.

I was testing high volume replication today using Sysbench:

sysbench --test=oltp --oltp-table-size=10000000 --db-driver=mysql --mysql-user=root --mysql-db=sbsmall --init-rng=1 --max-requests=100000000 --max-time=600 --oltp-test-mode=nontrx --oltp-nontrx-mode=update_nokey --num-threads=8 run

1	sysbench --test=oltp --oltp-table-size=10000000 --db-driver=mysql --mysql-user=root --mysql-db=sbsmall --init-rng=1 --max-requests=100000000 --max-time=600 --oltp-test-mode=nontrx --oltp-nontrx-mode=update_nokey --num-threads=8 run

On this box I got around 12.000 of updates/sec which is not the perfect number, though it mainly was because of contention issues in MySQL 5.0 rather than any NAS issues.
This number was reachable even with binary log stored on NFS volume. This number is for sync_binlog=0 and innodb_flush_log_at_trx_commit=2

I noted however if I enable replication – connect the slave to this box the throughput on the Master drops to about 2800 updates sec…. which is very close to the magic number how many network roundtrips per second I can get over 1Gb link. It was even more interesting when that. If I would pause replication for prolonged period of time and let few GB of binary logs to accumulate the performance on Master will be high even with replication running, but it will slow down as soon as IO thread on the slave is caught up with master.

When I moved the Binary logs to the local storage I got very similar performance but there have been no degradation when replication is enabled.

I have not checked in details why this could be the case but I guess there is something which requires a network roundtrip when the binary log is written at the same time as slave-feeding thread is reading it.

I’d be curious to know if someone else can observe such behavior and if there is an NFS tuning which can be done to avoid it or if we need to fix MySQL

19 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Patrick Casey

13 years ago

Is there file system level locking going on? I don’t have the source in front of my, but its entirely plausible that:

master can write away w/o locking into the binary log
slave reader thread needs to get a lock through to read (so he doesn’t read partial records off the end of the log if he reads while master is writing)

so having a slave in place adds a set of locking overhead and means that sometimes when the master decides to write, he won’t be able to b/c the slave reader has a lock and hence he has to wait, reducing master throughput

My file system knowledge is way outdated, but I *think* NFS uses an external protocol for locking that requires even more roundtrips since I think the original NFS spec didn’t include a locking mechanic? That’d make the locking cost a lot higher on NFS than on, say, ext3.

Peter Zaitsev

Author

13 years ago

Patrick,

I did not look at the code or traced in clearly so I do not know… though I think it is unlikely it is locking on file system level. Threads writing binary log and threads reading binary logs and sending events to the slave are part of the same process and they are synchronized by mutex.

Rob Wultsch

13 years ago

Peter,
Do you consider a Linux server having a MySQL data dir on NFS safe? I have no first hand interaction with such a setup and have read numerous ominous reports. I am curious if this is a recipe for disaster.

Mark Callaghan

13 years ago

Yet another interesting artifact to debug. AFAIK, the thread on the master that pushes data to a slave gets the binlog lock, copies events out, releases the lock and then sends data to the slave. The Google patch has a small change to avoid allocating a buffer for each event copied from the binlog as that is done when the binlog lock is held. That makes a big difference when there are many slaves per master, but you aren’t running in that setup.

This would be a good time to use http://poormansprofiler.org

I prefer to use local storage but the MySQL community has yet to implement a tool to archive binlogs (write them locally, archive them to NFS). That tool is easier to do with recent changes in mysqlbinlog and Harrison will soon write about that.

Ajit

13 years ago

One idea to relieve the pressure on the master: mount the binlogs file-system ro on a different host. Run a mysql server on that host. And use that host as a mast to the slave.

Ajit

Kristian KÃ¶hntopp

13 years ago

I do believe that the approach of using NFS for shipping binlogs is somewhat wrong. It has the advantage of shipping live, or almost live logs off the box, but it creates a severe dependency on the availability of the NFS server – let the NFS hang, and watch your database hang as well (writeable mounts should be hard,intr).

I’d rather see a mechanism that lets me specify a UNIX shell command that is being invoked after a binlog cycle. That shell command would get the name of the old, just retired binlog file as a full pathname as $1 and is being run asynchronously as a forked and disowned process. The shell command typically would be a command that will then scp the old binlog offsite.

Postgres happens to have such a mechanism for their WAL, and MySQL is sorely missing such a feature (which would be very easily implemented, btw).

Mark Callaghan

13 years ago

NFS is more than 20 years old. Why are we still concerned about NFS hangs? Does this still occur in production on modern Linux distros? By “occur” I mean that it occurs frequently enough for it to be an issue. There are lots of things that can go wrong and I won’t end up using anything if I am to avoid all failures. I know of people successfully using NFS on a large scale.

Copying binlogs only after they are done isn’t good enough for me. I want to have as much of the binlog archived elsewhere to recover from the loss of a master. The loss of a master is more frequent for me than an NFS hang. And if NFS is the problem, I can tail the binlog to remote storage using something other than NFS.

Patrick Casey

13 years ago

I’ve had NFS mount points lock up on my production servers before, mounting a RedHat NFS point onto a RedHat server.
Triggering factor is usually, but not always, some sort of network interrupt.
I back up onto NFS mount points, but I don’t run anything realtime critical on them.

From my perspective, putting the binlogs on an NFS server triples my failure domain.

I lose the database if:
1) DB server dies
2) NFS mount point dies
3) NFS server dies

I’d rather risk losing a few binlogs than increase my risk of a service interruption. Naturally, everybody’s use case varies 🙂

Kristian KÃ¶hntopp

13 years ago

Where I work and have been working in the past, NFS fails more often than anything else. Also, the recovery is usually only through a box reboot instead of service restart or even SIGHUP, so this is also the worst possible behavior.

Live binlog shipping exists as well – it is called replication (START SLAVE IO_THREAD, and STOP SLAVE SQL_THREAD, if you are really paranoid; or a time delayed slave, if MySQL were to integrate patches for that into replication after all). But a simple binlog_cycle_command, that would be really easy, and quite useful.

Mark Callaghan

13 years ago

Replication doesn’t archive the master’s binlog. Were that the case, then failing a slave to a new master without losing transactions would be trivial. Today you get to play the game of finding the offset on the new master that corresponds to the offset from the old master. Or you can use global transaction IDs from the Google patch.

Replication is also much more expensive than archiving the binlog. I want to use both but I don’t want to pay for the hardware on a slave when all that I need is a binlog archive solution.

Patrick Casey

13 years ago

Mark, I think you still have an issue of binlog corruption even if you’ve put the binlogs on an NFS share don’t you? If I lose the network interconnect or the master dies or something else “bad” happens, its not guaranteed that I have a consistent binlog on the NFS server. I might have an incomplete record at the end for example if we were in the middle of a write when the failure occurred.

So if your requirement is *absolutely no* transactions may be lost in a failure, then I don’t think remote mounting the binlogs gets you there, although its probably marginally better than just running a slave.

Mark Callaghan

13 years ago

Yes, this allows for loss of transactions. But losing transactions from the last second might be much better than losing them from the last hour. I need sync replication in MySQL to avoid that or use DRBD to make it less likely. MariaDB is working on that with Galera.

Mark Callaghan

13 years ago

I would prefer to use network attached or remote storage as the place where binlog archiving is done rather than where the binlogs are written in the first place.

Default InnoDB/MySQL doesn’t report IO latency — I think Percona Server does and I know the Facebook patch also does. I assume that binlog sync to local storage accelerated by HW RAID write cache is much faster than binlog sync to NFS. Given the lack of group commit in InnoDB that can make a huge difference.

Baron Schwartz

13 years ago

I think that people are looking for different things here:

1) get binlogs off the master to save space
2) get binlogs off the master to keep them safe

In my opinion, the best technology for item 2) is probably DRBD at the moment.

Peter Zaitsev

Author

13 years ago

Copying Binary logs after they rotate is easy enough. If you use NFS/GFS/DRBD volume for storage logs they are immediately available after Master crash which allows you to use that logs to bring Slave up to speed and have no transactions lost if you have sync_binlog=1

Indeed the local storage is a lot faster you often can get over 10000 fsync/sec with local RAID while only some 3000 with NFS other 1GB ethernet.

William Jimenez

13 years ago

What type of file system was on this NFS server? Remember that NFS sits on top of the file system residing on the server you are exporting the share from. While I agree that NFS can be tricky, these aren’t plug and play technologies so the makeup of system stack holistically has a huge impact on the performance.

NFS is one of those technologies that either performs very well or very poorly, depending on how you configure it. I understand that the point of this article is not to be advice as much as a discussion starter, so I am just adding some more color to the picture :-).

ZFS based NFS on Solaris based kernels have had very impressive results, something to look into. You won’t see anything like local or direct attached storage however….

Peter Zaitsev

Author

13 years ago

William,

In this case the point is rather the network roundtrip seems to be needed (at least in Linux client implementation) if file is to be extended. Your last sentence says it all though. In many cases people think moving to high end NFS will get far better performance than local storage.

Peter Zaitsev

Author

13 years ago

William,

William Jimenez

13 years ago

ZFS based NFS on Solaris based kernels have had very impressive results, something to look into. You won’t see anything like local or direct attached storage however….

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Storing MySQL Binary logs on NFS Volume

Related

Related Blog Articles

RECOMMENDED ARTICLES

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Valkey/Redis: Not-So-Good Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Storing MySQL Binary logs on NFS Volume

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

Valkey/Redis: Not-So-Good Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation