[mirror-admin] Having to throttle back rsync on download servers

Stephen John Smoogen smooge at gmail.com
Wed Feb 26 11:11:23 EST 2014


On 25 February 2014 23:31, Chris Schanzle <schanzle at nist.gov> wrote:

> On 02/25/2014 07:14 PM, Stephen John Smoogen wrote:
>
>>
>> We are experiencing a problem with the download servers peaking out nfs
>> traffic with our NetApp. We are trying to lower this and looking at some
>> mirrors that are rsyncing from the server over a 100 times a day.
>>
>
> Thanks for investigating and trying to make it better!
>
>
> I'd like to share some data, ask a few questions, offer suggestions, and
> consider some ideas.
>
>
> I notice it often takes a long time for rsync 'receiving file list' even
> when no files are transferred. The norm this month is 15 to 25 minutes,
> with numerous occasions 40 mins, an occasional >60 mins, and a few ~120
> mins.   That implies metadata lookups are the bottleneck.  E.g., I often
> see rsync --stats like:
>
>     sent 1.52K bytes  received 29.40M bytes  14.32K bytes/sec
>
> File data often transfers @ 2 to 4 MB/sec, so neither our nor your network
> connection is dramatically overloaded.  Latency (ping) is consistently
> ~85ms.
>
>
That is what I am seeing. We allow 25 rsyncs per host, which was working
pretty well, but lately at any given time only 3 of those rsyncs are
actually transferring data while the other 22 are slowly working through
lstat() metadata.
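For anyone setting up something similar, a connection cap like the "25 rsyncs per host" above would typically live in rsyncd.conf. This is only an illustrative sketch with made-up paths and numbers, not the actual Fedora configuration:

```
# Hypothetical /etc/rsyncd.conf fragment -- values are illustrative.
[fedora]
    path = /srv/pub/fedora
    read only = yes
    # rsyncd's "max connections" limits a module's total concurrent
    # sessions, not connections per client host; a per-host cap like
    # "25 rsyncs per host" needs an external mechanism such as an
    # iptables connlimit rule on port 873.
    max connections = 200
    lock file = /var/run/rsyncd.lock
```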


>
> I'm surprised that the download servers use NFS, as attribute caching of
> inode data would be so critical to rsync metadata lookup performance.  With
> NFS you just toss it away after a specified time passes.  Yet to keep
> reasonably consistent with updates, you can't tune the NFS clients
> (download servers) attribute caching too high.
>
> Why aren't read-only block level devices (e.g., ISCSI) or GFS2 used on the
> download servers?
>
>
We don't use iSCSI because shared iSCSI, even read-only, has gone badly
wrong for us in the past. GFS2 isn't used because this setup predates it,
and we try not to make mirror admins' lives harder just because we want to
play with new toys. Both of those may have changed, but every time I read
up on them the dates on the horror stories don't get any further in the
past.

Our storage has been various NetApps over the years. We went that route
because we need a lot of SAS disks, and we don't have the staff to swap out
the 1-2 disks that fail each week at that scale. It also lets us share the
data across our various other internal and external servers with
SnapMirror. When we tried hosting the mirrors on local disks we did better
on some metrics, but we ran into problems keeping up with failed disks and
keeping the content in sync (box x out of n would always be missing package
y while the others had it, and that would be the one server every DNS
lookup picked).

My attempts at shared iSCSI read-only storage were a while ago, but they
ended up with some amazingly corrupted data.

Our layout is 5 servers in one colocation and 3 servers in another
colocation. I think we may have some internal mirrors as well for engineers
and Red Hat TAMs. These are all kept in sync using NetApp SnapMirror, which
cuts down our inter-colocation bandwidth compared to when we were doing
rsync between them; that was swamping network links at times.

You are correct: metadata is the largest part of the work the rsync
services do. We are on more expensive SAS drives because when we were on
SATA drives the CPU load and latency were more than quadruple what we get
on an equivalent number of SAS drives. One problem we may be having is that
a couple of months ago we moved to a new NetApp that is shared storage,
which may mean it can't keep our metadata in its cache as well. Red Hat
Storage is looking at whether we can be moved to a more dedicated 'head' to
see if that will fix it.

After that, it is mostly a matter of trying to get the NFS client to cache
more of the metadata, if possible.
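On the NFS-client side, the attribute-cache lifetimes are mount options. A sketch, where the export path and timeout values are assumptions, not our real config:

```
# Hypothetical /etc/fstab line. acregmin/acregmax and acdirmin/acdirmax
# (in seconds) bound how long the client may trust cached file and
# directory attributes before revalidating with the server. Larger
# values mean fewer GETATTR round trips for rsync's metadata walk,
# at the cost of slower visibility of upstream changes.
netapp:/vol/pub  /srv/pub  nfs  ro,acregmin=60,acregmax=600,acdirmin=60,acdirmax=600  0 0
```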

Thanks for the tips, and I hope that clarifies why things are set up the
way they currently are.



> Locally, my mirror uses a local filesystem with vfs_cache_pressure tuned
> down to ensure all inodes stay in RAM. This dramatically reduces the random
> I/O it requires to look up inode data.  E.g., it takes ~1.5 sec to "du -sh"
> our fedora repo (715G, ~700k objects, some things are excluded).
>
> Experience has taught me the *last* thing a file server should cache is
> files: it should cache metadata.  File data can stream off fairly
> efficiently with sequential I/O.
>
> I dream of a time when it will be commonplace to store filesystem metadata
> on a separate, reliable, fast random I/O (today that means mirrored SSD)
> device.
>
> It would be nifty to read about FedoraProject's config and the "lessons
> learned" (especially the mistakes).  It would be very helpful for others
> doing similar work!
>
> Regards,
> Chris
>
> --
>



-- 
Stephen J Smoogen.

