<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On 25 February 2014 23:31, Chris Schanzle <span dir="ltr">&lt;<a href="mailto:schanzle@nist.gov" target="_blank">schanzle@nist.gov</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On 02/25/2014 07:14 PM, Stephen John Smoogen wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

We are experiencing a problem with the download servers maxing out NFS traffic to our NetApp. We are trying to lower this and are looking at some mirrors that are rsyncing from the server more than 100 times a day.<br>

</blockquote>

<br></div>

Thanks for investigating and trying to make it better!<br>

<br>

<br>

I&#39;d like to share some data, ask a few questions, offer suggestions, and consider some ideas.<br>

<br>

<br>

I notice rsync often spends a long time in &#39;receiving file list&#39; even when no files are transferred. The norm this month is 15 to 25 minutes, with numerous runs at 40 mins, an occasional run at &gt;60 mins, and a few at ~120 mins.   That implies metadata lookups are the bottleneck.  E.g., I often see rsync --stats like:<br>


<br>

    sent 1.52K bytes  received 29.40M bytes  14.32K bytes/sec<br>

<br>

File data often transfers @ 2 to 4 MB/sec, so neither our network connection nor yours is dramatically overloaded.  Latency (ping) is consistently ~85ms.<br>
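<br>
As a rough check on the --stats line above, the elapsed time can be backed out of the reported rate. This is only a sketch: it assumes the K and M suffixes are decimal (1000-based), that the printed figures are rounded, and that rsync derives the rate from total bytes sent plus received divided by elapsed time. The function name is made up for illustration.<br>
<pre>
```python
# Figures copied from the rsync --stats line quoted above.
# Assumption: K and M are decimal (1000-based) units, and the values are rounded.
SENT_BYTES = 1.52e3
RECEIVED_BYTES = 29.40e6
RATE_BYTES_PER_SEC = 14.32e3


def elapsed_seconds(sent, received, rate):
    """Invert rate = (sent + received) / elapsed to estimate the run time."""
    return (sent + received) / rate


seconds = elapsed_seconds(SENT_BYTES, RECEIVED_BYTES, RATE_BYTES_PER_SEC)
print(f"about {seconds / 60:.0f} minutes to receive {RECEIVED_BYTES / 1e6:.1f} MB")
```
</pre>
Under those assumptions the run took roughly 34 minutes to move under 30 MB, far below the 2 to 4 MB/sec seen for file data, which fits the idea that the time goes into metadata lookups rather than the network.<br>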

<br></blockquote><div><br></div><div>That is what I am seeing. We were allowing 25 rsyncs per host, which was working pretty well, but recently we have 3 rsyncs that are getting data and 22 that are slowly working through lstat data.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

I&#39;m surprised that the download servers use NFS, since caching of inode attributes is so critical to rsync metadata lookup performance.  With NFS, cached attributes are simply discarded after a specified time passes.  Yet to stay reasonably consistent with updates, you can&#39;t tune the NFS clients&#39; (download servers&#39;) attribute caching too high.<br>
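<br>
To make the attribute-cache trade-off concrete, here is a minimal sketch that works out the effective timeouts for a given NFS mount option string. The option names (acregmin, acregmax, acdirmin, acdirmax, actimeo, noac) and the default values are the ones documented in nfs(5); the option strings in the example are hypothetical and are not the download servers&#39; actual mounts.<br>
<pre>
```python
# Default attribute-cache timeouts in seconds, as documented in nfs(5).
DEFAULTS = {"acregmin": 3, "acregmax": 60, "acdirmin": 30, "acdirmax": 60}


def attr_cache_settings(mount_options):
    """Return the effective attribute-cache timeouts for an NFS option string."""
    settings = dict(DEFAULTS)
    for option in mount_options.split(","):
        key, _, value = option.strip().partition("=")
        if key == "actimeo":
            # actimeo sets all four timeouts to the same value.
            settings = {name: int(value) for name in settings}
        elif key == "noac":
            # noac turns attribute caching off entirely.
            settings = {name: 0 for name in settings}
        elif key in settings:
            settings[key] = int(value)
    return settings


# Hypothetical option strings for illustration only.
print(attr_cache_settings("ro,hard"))
print(attr_cache_settings("ro,actimeo=600"))
print(attr_cache_settings("ro,acdirmin=300,acdirmax=900"))
```
</pre>
Raising these values keeps attributes cached longer, so fewer lookups reach the filer, at the cost of clients seeing updates later.<br>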


<br>

Why aren&#39;t read-only block-level devices (e.g., iSCSI) or GFS2 used on the download servers?<br>

<br></blockquote><div><br></div><div>We don&#39;t use iSCSI because shared iSCSI, even read-only, could go really haywire in the past. GFS2 isn&#39;t used because this setup predates our use of it, and we have stuck to a &#39;don&#39;t make mirrors&#39; lives harder because we want to play with new toys&#39; approach. Both of those may have changed, but every time I read up on it the dates on the horror stories don&#39;t get any further in the past.</div>

<div><br></div><div>Our storage has been various NetApps over the years. We did this because we need a lot of SAS disks and we don&#39;t have the ability to change out the 1-2 disks a week that fail out of that many disks. It also allowed us to share the data with the various other internal and external servers we have using SnapMirror. When we tried running the mirrors on local disks, some stats improved, but we ran into problems keeping up with failed disks and keeping things in sync (box x out of n would always fail to get package y when the others did, and that would be the one server every DNS lookup picks).</div>

<div><br></div><div>My attempts at shared iSCSI read-only storage were a while ago but ended up with some amazingly corrupted data.</div><div><br></div><div>Our layout is 5 servers in one colocation and 3 servers in another colocation.  I think we may also have some internal mirrors for engineers and Red Hat TAMs.  These are all kept in sync using NetApp SnapMirror, which cuts down our inter-colocation bandwidth from what it was when we were doing rsync between them; that was swamping network links at times.</div>

<div><br></div><div>You are correct, metadata is the largest amount of work from the rsync services. We are on more expensive SAS drives because when we were on SATA drives the CPU load and latency were more than quadruple what we get on an equivalent number of SAS drives. One of the problems we may be having is that a couple of months ago we moved to a new NetApp that is shared storage, which can mean it can&#39;t keep metadata in its cache as well. Red Hat Storage is looking at ways we can be moved to a more dedicated &#39;head&#39; to see if that will fix that. </div>

<div><br></div><div>After that it is more about trying to get the NFS client to cache more of the metadata if possible.</div><div><br></div><div>Thanks for the tips, and I hope that clarifies why and where we are currently.</div>

<div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Locally, my mirror uses a local filesystem with vfs_cache_pressure tuned down to ensure all inodes stay in RAM. This dramatically reduces the random I/O it requires to look up inode data.  E.g., it takes ~1.5 sec to &quot;du -sh&quot; our fedora repo (715G, ~700k objects, some things are excluded).<br>
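<br>
For anyone wanting to try the same comparison on their own mirror, the sketch below reads the current vfs_cache_pressure value and times an lstat of every object under a directory, much like the &quot;du -sh&quot; measurement above. It assumes Linux (the /proc path does not exist elsewhere); the function names are made up for illustration, and no particular tuning value is implied. The kernel default is 100, and lower values make the kernel more inclined to keep dentries and inodes cached.<br>
<pre>
```python
import os
import time

# Linux-only location of the tunable; absent on other systems.
VFS_CACHE_PRESSURE = "/proc/sys/vm/vfs_cache_pressure"


def read_vfs_cache_pressure(path=VFS_CACHE_PRESSURE):
    """Return the current vfs_cache_pressure value, or None if unavailable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def timed_lstat_walk(root):
    """lstat every entry under root; return (object_count, elapsed_seconds)."""
    count = 0
    start = time.monotonic()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            count += 1
    return count, time.monotonic() - start


if __name__ == "__main__":
    print("vfs_cache_pressure:", read_vfs_cache_pressure())
    objects, seconds = timed_lstat_walk(".")
    print(f"{objects} objects in {seconds:.2f} s")
```
</pre>
Running the walk twice in a row gives a rough feel for the difference between cold and cached metadata on a given filesystem.<br>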


<br>

Experience has taught me the *last* thing a file server should cache is files: it should cache metadata.  File data can stream off fairly efficiently with sequential I/O.<br>

<br>

I dream of a time when it will be commonplace to store filesystem metadata on a separate, reliable, fast random I/O (today that means mirrored SSD) device.<br>

<br>

It would be nifty to read about FedoraProject&#39;s config and the &quot;lessons learned&quot; (especially the mistakes).  It would be very helpful for others doing similar work!<br>

<br>

Regards,<br>

Chris<br>

<br>

--<br>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Stephen J Smoogen.<br><br></div>

</div></div>