[mirror-admin] Inode caching [was: Re: ERROR: chroot failed for fedora-web]

Carlos Carvalho carlos at fisica.ufpr.br
Thu Jan 15 20:44:43 EST 2009


Chris Schanzle (schanzle at nist.gov) wrote on 14 January 2009 12:17:
 >On 01/13/2009 08:48 PM, Matt Domsch wrote:
 >> a) various kernel tunables to keep more NFS inodes and directory trees
 >>    in cache on the server.
 >
 >Funny this has come up, as in the last couple weeks I've been 
 >investigating this issue and noticed a huge benefit on my local mirror / 
 >file server to NOT cache file data, as ironic as that sounds, but to 
 >cache inode data.  After doing some tweaking, I can only imagine the 
 >seeking that a public rsync server would get without preferring to cache 
 >inode data.  And when I connect to a mirror and see rsync (with -vP 
 >options) taking seconds to count by the hundreds, yet transferring 
 >files at > MB/sec speeds, I know its inode cache is not working 
 >optimally, or the server needs more RAM.

Exactly.

 >The Linux kernel tunable is /proc/sys/vm/vfs_cache_pressure.  It's a 
 >balance knob, centered at 100, to equally balance caching of inode 
 >versus file data, with lower numbers preferring to cache inode data.  
 >E.g., near the extreme:
 >
 >  echo 10 > /proc/sys/vm/vfs_cache_pressure
 >
 >or add to /etc/sysctl.conf for persistence across reboots:
 >
 >  vm.vfs_cache_pressure = 10

We set it to 3 on the mirror. I also set it to 2 on a login/nfs
server. Works well even though the workloads are very different.
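For anyone wanting to check before tuning, a quick sketch using plain
procfs reads (no root needed to read; the write shown in the comments
needs root, and 3 is just the value we happen to use here):

```shell
# Current balance knob (kernel default is 100; lower values prefer
# keeping inode/dentry data cached over file data)
cat /proc/sys/vm/vfs_cache_pressure

# Rough view of the dentry cache: the first two fields are the total
# number of dentries and the number of unused (freeable) dentries
cat /proc/sys/fs/dentry-state

# To lower the knob at runtime (root):
#   echo 3 > /proc/sys/vm/vfs_cache_pressure
```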

 >I also suspect increasing the device readahead buffer with 'blockdev 
 >--setra NN DEVICE' would be beneficial as well, to reduce seeking back 
 >to read more file data while streaming data out the network.  But this 
 >caching essentially conflicts with vfs_cache_pressure; I'm not sure what 
 >instruments can be used to determine the right balance.

It's only useful sometimes. If you have enough RAM or a light enough
load, the filesystem cache will already have things in RAM after some
time, so setting the parameter isn't necessary. OTOH, under heavy load
it's a disaster: very likely you'll have to discard the extra
read-ahead before you can send it to the client, because of memory
pressure from the other requests. So you end up reading it
again... oops...

There can be a range where you discard old cache to serve new requests
but the read-ahead survives long enough for the request to finish; in
that case it's better to read more in advance. However, it's quite
dependent on the load, so it's hard to keep the setting optimized all
the time.
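For concreteness, the readahead knob is per block device. A minimal
sketch for inspecting what's currently set, reading sysfs (no root;
the device names depend on your machine, and the blockdev lines in the
comments need root; sdX is a placeholder):

```shell
# Per-device readahead as exposed in sysfs, in KiB
for f in /sys/block/*/queue/read_ahead_kb; do
  [ -e "$f" ] || continue            # skip if no block devices match
  printf '%s: %s KiB\n' "$f" "$(cat "$f")"
done

# blockdev works in 512-byte sectors instead of KiB; to inspect and
# then double a typical 128 KiB (256-sector) readahead (root):
#   blockdev --getra /dev/sdX
#   blockdev --setra 512 /dev/sdX
```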

--

