I often hear the suggestion to use the `fadvise` system call to keep data out of the OS cache.
We recently made a patch for `tar` that is supposed to create an archive without polluting the OS cache. As with any backup, you do not really expect any benefit from caching.
However, while working on the patch, I noticed that `fadvise` with `FADV_DONTNEED` does not really do what I expected (I used this call because it is often suggested for this purpose). In fact, it does not prevent caching; it only releases data that is already cached.
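One way to observe this behaviour is to count how many pages of a file are resident in the page cache before and after the call. The following is only a rough sketch, assuming Linux with glibc; the function names `resident_pages` and `drop_cached_pages` are made up for illustration, and the result depends on the filesystem (for example, dirty pages and tmpfs pages are not dropped).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: number of pages of the file behind fd that are
 * currently resident in the page cache, or -1 on error. */
long resident_pages(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;

    size_t len = (size_t)st.st_size;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;

    /* Mapping the file does not read it in; mincore() only reports
     * which pages are already in memory. */
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    long count = -1;
    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* lowest bit set means "resident" */
    }
    free(vec);
    munmap(map, len);
    return count;
}

/* Hypothetical helper: ask the kernel to release the cached pages of the
 * whole file (offset 0, length 0 means "to the end of the file"). */
int drop_cached_pages(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Calling `resident_pages()` after reading a file, then `drop_cached_pages()`, then `resident_pages()` again should show the count going down on a typical disk-backed filesystem. Nothing here stops the next read from filling the cache again.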
And if we run `man fadvise`, it says exactly:

    FADV_DONTNEED
        Do not expect access in the near future. Subsequent access of pages in this range
        will succeed, but will result either in reloading of the memory contents from the
        underlying mapped file or zero-fill-in-demand pages for mappings without an underlying file.
So it is totally fair. What we may really want is the `FADV_NOREUSE` call.

    FADV_NOREUSE
        Access data only once.
But there is a surprise: it does not work. And no wonder; here is the Linux kernel source code:

    SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
    {
        ...
        case POSIX_FADV_NOREUSE:
            break;
        ...
    }

which means that the Linux kernel does nothing on an fadvise call with FADV_NOREUSE.
Digging a little more into this topic, I found
http://kerneltrap.org/node/7563, where Linus Torvalds, about 3 years ago, confirms that FADV_NOREUSE is a no-op.
It is rather disheartening that this has not been fixed in so many years.
As for the patch for tar, I ended up with a FADV_DONTNEED call after copying each block. Dirty, but it works: at most about one block's worth of OS cache is in use at any time.
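The idea can be sketched roughly as follows. This is not the code from the tar patch, only a minimal illustration of the same technique: the function name `copy_dropping_cache` and the 64 KB block size are assumptions of mine.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical block size; the real patch may use a different one. */
#define COPY_BLOCK_SIZE (64 * 1024)

/* Copy everything from in_fd to out_fd block by block. After each block
 * is written, advise the kernel that the input pages just read are no
 * longer needed. Returns the number of bytes copied, or -1 on error.
 * Uses a static buffer, so it is not reentrant. */
long long copy_dropping_cache(int in_fd, int out_fd)
{
    static char buf[COPY_BLOCK_SIZE];
    off_t offset = lseek(in_fd, 0, SEEK_CUR);   /* -1 for pipes etc. */
    int seekable = (offset != (off_t)-1);
    long long total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {                       /* handle short writes */
            ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
        if (seekable) {
            /* Purely advisory, so a failure here is ignored. */
            (void)posix_fadvise(in_fd, offset, n, POSIX_FADV_DONTNEED);
            offset += n;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Only the input side is handled here. Pages written to `out_fd` are dirty until they reach the disk, and FADV_DONTNEED does not release dirty pages, so the output file would need to be synced first if its cache footprint matters too.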
You can get the patch from `lp:~percona-dev/perconatools/tar-patch`. It adds the `--no-oscache` parameter, along with our older `--read-rate` patch to throttle read IO.
I recall something “recent” about this being fixed… but not in any kernel that would have made it through enough “enterprise” releases to be found anywhere in production….
(I could of course be wrong too 🙂)
Why did you not use O_DIRECT to bypass the OS cache?
I did something similar in the past where I patched tar to use O_DIRECT. It seemed to help quite a bit at the time.
Could you give us a rough idea of the performance enhancements?
What’s really needed is a way to say:
from here on forward, I don’t want you to cache anything.
Maybe an LD_PRELOAD that intercepts read() so that all pages are immediately fadvised away.
The reason this is important is that we don’t have time to patch EVERY binary in the world.
tar is only one of many IO-heavy programs that need patching.
What about cat, dd, rsync, scp, etc.?
Kevin
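A rough sketch of what such an LD_PRELOAD shim could look like, assuming Linux with glibc. It is an illustration only, not the code of any existing tool: it wraps `read()` alone (not `pread()`, `mmap()` or stdio internals) and pays for an extra `lseek()` and `posix_fadvise()` on every call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Interposed read(): forward to the real read(), then advise the kernel
 * that the pages just read will not be needed again. */
ssize_t read(int fd, void *buf, size_t count)
{
    static ssize_t (*real_read)(int, void *, size_t);

    if (real_read == NULL)
        real_read = (ssize_t (*)(int, void *, size_t))dlsym(RTLD_NEXT, "read");
    if (real_read == NULL) {
        errno = ENOSYS;
        return -1;
    }

    ssize_t n = real_read(fd, buf, count);
    if (n > 0) {
        int saved_errno = errno;              /* keep the caller's errno intact */
        off_t end = lseek(fd, 0, SEEK_CUR);   /* fails on pipes and sockets */
        if (end != (off_t)-1 && end >= n)
            (void)posix_fadvise(fd, end - n, n, POSIX_FADV_DONTNEED);
        errno = saved_errno;
    }
    return n;
}
```

Built as a shared object, for example with `gcc -shared -fPIC -o nocache.so nocache.c -ldl` (the file names are placeholders), it could be tried as `LD_PRELOAD=./nocache.so cat bigfile > /dev/null`.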
I just ran across this:
http://www.enricozini.org/blog/pdo/
An LD_PRELOAD tool that appears to do exactly what you’re looking for.
Also http://code.google.com/p/pagecache-mangagement/
Could one not use the O_DIRECT I/O mode, which should bypass the cache?
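For reference, reading a file with O_DIRECT looks roughly like the sketch below, assuming Linux. The function name `read_file_direct`, the 4096-byte alignment and the 256 KB buffer are assumptions for illustration; the required alignment depends on the device and filesystem, and some filesystems reject O_DIRECT at `open()` time.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* O_DIRECT is a Linux/GNU extension */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values; check the requirements of the actual device. */
#define DIRECT_ALIGN    4096
#define DIRECT_BUF_SIZE (256 * 1024)

/* Read a whole file with O_DIRECT, bypassing the page cache.
 * Returns the number of bytes read, or -1 on any error (including
 * filesystems that do not support O_DIRECT). */
long long read_file_direct(const char *path)
{
    void *buf = NULL;
    /* O_DIRECT needs an aligned buffer, size and file offset. */
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_BUF_SIZE) != 0)
        return -1;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, DIRECT_BUF_SIZE)) > 0)
        total += n;   /* a real tool would process buf here */

    close(fd);
    free(buf);
    return n < 0 ? -1 : total;
}
```

The alignment rules are the main reason this is harder to retrofit into an existing program than a few `posix_fadvise()` calls: every buffer handed to `read()` has to be allocated with the right alignment.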