[mirror-admin] Having to throttle back rsync on download servers

Chris Schanzle schanzle at nist.gov
Wed Mar 5 13:16:11 EST 2014


On 02/26/2014 11:11 AM, Stephen John Smoogen wrote:
> My attempts at shared iscsi read-only storage were a while ago but ended up with some amazingly corrupted data.

I realized later that shared read-only block devices can work only if *all* users of the filesystem are read-only.  If you had a tight window of upstream modifications, in theory it would be possible to have the read-only clients unmount the filesystem (flushing caches), have the server perform updates and remount read-only, then have the clients remount.  But that is obviously too disruptive for what should be a continuously operational download server.
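For the record, a rough sketch of that cycle (the device, mount point, and rsync module below are hypothetical):

    # on each read-only client: unmount, dropping cached metadata
    umount /mirror
    # on the storage server: allow writes, push updates, freeze again
    mount -o remount,rw /mirror
    rsync -a upstream::module/ /mirror/
    mount -o remount,ro /mirror
    # on each client: remount the now-consistent filesystem
    mount -o ro /dev/sdx /mirror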


> That is what I am seeing. Basically we were allowing 25 rsyncs per host, which was working pretty well, but recently we have 3 rsyncs which get data and 22 which are slowly working through lstat data.
...
> After that it is more about trying to get NFS client to cache more of the metadata if possible.

Have you tried cranking acregmin/acregmax way up to cache file attributes much longer?  I'm thinking a drastic change from the default one minute (or less) to at least 15-30 minutes, perhaps an hour or more.  Keeping acdirmin/acdirmax at their low defaults will let rsync find new directory entries, which rsync clients use to find new files.  The caveat: cache attributes for too long and rsyncd may (unverified) serve stale stat() info for files that an update push modified in place rather than created, such as repodata/repomd.xml.
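Something like this on the download servers (values illustrative and untested; server and export names hypothetical):

    # cache file attributes for 30-60 min; leave directory attributes
    # at the stock 30/60 sec so new directory entries show up quickly
    mount -t nfs -o ro,acregmin=1800,acregmax=3600,acdirmin=30,acdirmax=60 \
        nfs-server:/export/pub /srv/pub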

Since the kernel stores cached NFS inode data in the slab, make sure you have enough RAM and tune vm.vfs_cache_pressure down to keep every possible NFS entry cached, lest entries get pushed out before they expire or because the system prefers caching file data over precious inode data.
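As a sketch (the value is a starting point to experiment with, not a recommendation):

    # default is 100; lower values make the kernel hold on to
    # inode/dentry slab entries at the expense of page cache
    sysctl -w vm.vfs_cache_pressure=50
    echo 'vm.vfs_cache_pressure = 50' >> /etc/sysctl.conf
    # watch how much the NFS inode slab is actually holding
    grep nfs_inode_cache /proc/slabinfo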


If increasing attribute caching can't work, the only other solution is local storage.  While you claim this isn't suitable due to high disk failure rates, I can only guess you either got a bad batch of drives or pummeled them into an early grave by having too little RAM and/or not tuning vfs_cache_pressure to reduce seeking for inode lookups.  You could continue to use the NFS storage as your 'master' back-end storage on each download server and use rsync to keep the local storage current.  Perhaps with a little crafty intelligence, if the local storage falls over, you could revert to offering the NFS storage to rsyncd.
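Roughly (paths hypothetical), something like this out of cron:

    # refresh local disk from the NFS master; --delete propagates removals,
    # -H preserves the hardlinks mirror trees tend to rely on
    rsync -aH --delete /mnt/nfs-master/pub/ /srv/local/pub/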

I had pie-in-the-sky ideas of suggesting a few Samsung 1TB SSDs for each server at ~$500 each.  Even a 4-pack is not outrageous to suggest: configure them as RAID10, giving the needed storage capacity and redundancy.  Since this application is read-heavy (mount with noatime,nodiratime), they should live a long life and give outstanding I/O performance.
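Back-of-the-envelope, with hypothetical device names:

    # four 1TB SSDs -> ~2TB usable, mirrored and striped
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.xfs /dev/md0
    mount -o noatime,nodiratime /dev/md0 /srv/local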


> Thanks for the tips you had and I hope that clarifies why and where we are currently.

Thank you much!
