[mirror-admin] rsync filtering to reduce master mirror load

dale at fedoraproject.org dale at fedoraproject.org
Sat May 7 14:59:03 EDT 2011


On Fri, 10 Apr 2009, J.H. wrote:
> Matt Domsch wrote:
>> One of the things that's bothered me for a while is that each mirror
>> syncs itself to its upstream mirror (either a master, Tier 0, or Tier
>> 1).  But in general, content on the master mirrors changes only a
>> few times a day (generally one rawhide push, one updates/ push, one
>> pub/epel/updates/testing push).  Most of the content doesn't change.
>> And running rsync to discover that nothing has changed is expensive on
>> the upstream server - millions of stat() calls.
>> 
>> I call these the "null rsyncs".
>> 
>> In the next version of MirrorManager to roll out (hopefully today if I
>> finish working the bugs out), the MM database now keeps track of the
>> "last changed time" of each directory.  Using this, it can generate an
>> rsync FILTER RULES file (rsync --exclude-from=<somefile>), which rsync
>> then uses to avoid the full directory tree traversal, limiting it
>> to only those directory paths that have changed.
>> 
>> For example, this script:
>> 
>> #!/bin/sh
>> now=$(date -u +%s)
>> yesterday=$((now - (24 * 60 * 60)))
>> wget -O - \
>>   "http://localhost/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora" \
>>   2>/dev/null
>> 
>> 
>> returns an rsync filter rules file that looks like:
>
>
> So this solves the problem for effectively the 'tier 0' or 'tier 1' mirrors, 
> and the few people who are still syncing directly from Fedora.  I would love, 
> and I'm sure I'm not alone in this, the ability (maybe through report_mirror) 
> for a tier [01] mirror, when it completes a sync, to report (or have discovered) 
> where it is in its update schedule.  This would then allow tier 
> [n+1] mirrors to add a small change to your url above to something like:
>
> https://admin.fedoraproject.org/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora&upstream=<upstream base url like mirrors.kernel.org>
>
> And the tier[n+1] mirrors then have the ability to gain an rsync list custom 
> to where they are syncing from.  I would be more than happy to mod my rsync 
> script, post it back here, in some form that could take advantage of this 
> should something get modified.
>
> Something like this would really help the larger mirrors, cut rsync times 
> down and likely help keep people better in sync.
>
> Just my $0.02
>
> - John 'Warthog9' Hawley
> Chief Kernel.org Administrator
>
> --

Hi, I was wondering if any progress was ever made on John's suggestion
to allow querying for changes on arbitrary upstream mirrors.

I also wonder how this has worked out. Are many mirrors using this
feature, and has it been helpful to the upstreams? It still seems like a
good idea.

I'm looking into starting another mirror, and decided to clean up old
code I used for a mirror before and place it on GitHub. It currently
uses the rsyncFilter above, which isn't really safe for a tier 2 mirror.

  https://github.com/dlbewley/mirror-fedora
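
For anyone following along, here is a rough sketch of how a sync script
might fetch the filter and hand it to rsync. It is not the code in the
repository above. FILTER_BASE reuses the localhost URL from Matt's example;
UPSTREAM and DEST are made-up placeholders, and it assumes the returned file
is in a form that rsync's --exclude-from accepts. It falls back to a full
sync when no filter can be fetched, and it does nothing about the tier 2
concern.

```shell
#!/bin/sh
# Sketch only: fetch a MirrorManager rsync filter and pass it to rsync.
# UPSTREAM and DEST are hypothetical placeholders; adjust for your mirror.

FILTER_BASE="http://localhost/mirrormanager/rsyncFilter"
UPSTREAM="rsync://upstream.example.org/fedora/"   # hypothetical upstream module
DEST="/srv/mirror/fedora/"                        # hypothetical local path

# Print the filter URL for changes since the given epoch time ($1).
filter_url() {
    printf '%s?categories=Fedora%%20Linux&since=%s&stripprefix=pub/fedora\n' \
        "$FILTER_BASE" "$1"
}

# Print the epoch time $1 hours before now.
hours_ago() {
    now=$(date -u +%s)
    echo $((now - ($1 * 60 * 60)))
}

# Sync using the filter when one can be fetched, else do a full sync.
sync_with_filter() {
    filter=$(mktemp) || return 1
    if wget -q -O "$filter" "$(filter_url "$(hours_ago 24)")" \
        && [ -s "$filter" ]; then
        rsync -a --exclude-from="$filter" "$UPSTREAM" "$DEST"
    else
        # No usable filter: fall back to an unfiltered sync.
        rsync -a "$UPSTREAM" "$DEST"
    fi
    status=$?
    rm -f "$filter"
    return $status
}

# Only sync when explicitly asked, so the file can be sourced safely.
if [ "${RUN_SYNC:-0}" = 1 ]; then
    sync_with_filter
fi
```

Run it as "RUN_SYNC=1 sh sync.sh" (the file name is arbitrary). The 24 hour
window matches the example script above; a mirror that syncs less often
would need a wider one.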

--

