[mirror-admin] Inode caching [was: Re: ERROR: chroot failed for fedora-web]

Chris Schanzle schanzle at nist.gov
Wed Jan 14 12:17:09 EST 2009


On 01/13/2009 08:48 PM, Matt Domsch wrote:
> a) various kernel tunables to keep more NFS inodes and directory trees
>    in cache on the server.

Funny this has come up, as in the last couple of weeks I've been 
investigating this issue and noticed a huge benefit on my local mirror / 
file server from NOT caching file data, ironic as that sounds, but 
caching inode data instead.  After doing some tweaking, I can only 
imagine the seeking a public rsync server would see without a 
preference for caching inode data.  And when I connect to a mirror and 
watch rsync (with -vP options) take seconds to count files by the 
hundreds, yet transfer them at MB/sec speeds, I know its inode cache is 
not working optimally, or the server needs more RAM.

The Linux kernel tunable is /proc/sys/vm/vfs_cache_pressure.  It's a 
balance knob, centered at 100, at which the kernel reclaims inode/dentry 
cache and file-data cache equally; lower values make it prefer to keep 
inode data cached.  E.g., near the low extreme:

  echo 10 > /proc/sys/vm/vfs_cache_pressure

or add to /etc/sysctl.conf for persistence across reboots:

  vm.vfs_cache_pressure = 10
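
Equivalently, the sysctl utility can read and set the knob at runtime 
(writing requires root):

```shell
# Read the current value:
sysctl vm.vfs_cache_pressure

# Set it at runtime; equivalent to the echo above:
sysctl -w vm.vfs_cache_pressure=10
```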

Monitor inode/dentry cache usage via the Slab line in /proc/meminfo.  
Cached file data is listed as "Cached".  For more info, see 
/usr/share/doc/kernel-doc-*/Documentation/filesystems/proc.txt
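
For a quick look, something like this works (slabtop comes with procps; 
the per-cache names vary by filesystem, e.g. ext3_inode_cache or 
nfs_inode_cache):

```shell
# Totals: slab (holds inode/dentry caches) vs. page cache (file data)
grep -E '^(Slab|Cached):' /proc/meminfo

# Per-cache breakdown; look for the dentry and *_inode_cache rows
slabtop --once | head -n 20
```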

Caution: like all things in life, too much of a good thing is bad.  
Twisting the knob too low (e.g., all the way to zero) and loading up 
the inode cache via "du -sh" on a 3.4-million-file filesystem brought 
my 4GB x86_64 server to a near freeze: character echo through ssh took 
about two minutes, even with the "du" stopped, and the OOM killer was 
kicking in (dmesg | grep "Out of memory"), knocking out httpd 
processes.  As soon as I got vfs_cache_pressure back up to 10, the 
system was normal again.  At a vfs_cache_pressure value of 10, my Slab 
worked up to 3.6/3.7GB and was managed without adverse side effects.

It's worth noting a 4GB system cannot cache 3.4 million inodes in RAM.
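
A rough back-of-envelope check, assuming on the order of 1 KB of slab 
per cached inode plus its dentry (the exact object sizes vary by 
filesystem):

```shell
# 3.4 million inodes at ~1 KB apiece, in whole GB (integer division):
echo $(( 3400000 * 1024 / 1073741824 ))   # → 3
```

Roughly 3GB of slab for the cache alone, before the kernel, daemons, 
and any file-data cache get their share, which lines up with the 
3.6/3.7GB Slab figure above.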

I also suspect increasing the device readahead buffer with 'blockdev 
--setra NN DEVICE' would be beneficial, to reduce seeking back for more 
file data while streaming it out the network.  But readahead fills the 
page cache, so it essentially competes with the inode cache that 
vfs_cache_pressure is protecting; I'm not sure what instruments can be 
used to determine the right balance.
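
For reference, querying and setting readahead looks like this (the 
device name below is hypothetical; --getra and --setra work in units of 
512-byte sectors):

```shell
# Show the current readahead for a (hypothetical) device:
blockdev --getra /dev/sda

# Raise it to 4 MB, i.e. 8192 * 512-byte sectors (root required):
blockdev --setra 8192 /dev/sda
```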

Please share your experience of tuning parameters in our mirroring 
application.

-Chris
