[mirror-admin] rsync filtering to reduce master mirror load
dale at fedoraproject.org
Sat May 7 14:59:03 EDT 2011
On Fri, 10 Apr 2009, J.H. wrote:
> Matt Domsch wrote:
>> One of the things that's bothered me for a while is that each mirror
>> syncs itself to its upstream mirror (either a master, Tier 0, or Tier
>> 1). But in general, content on the master mirrors changes only a
>> few times a day (generally one rawhide push, one updates/ push, one
>> pub/epel/updates/testing push). Most of the content doesn't change.
>> And running rsync to discover that nothing has changed is expensive on
>> the upstream server - millions of stat() calls.
>>
>> I call these the "null rsyncs".
>>
>> In the next version of MirrorManager to roll out (hopefully today if I
>> finish working the bugs out), the MM database now keeps track of the
>> "last changed time" of each directory. Using this, it can generate an
>> rsync FILTER RULES file (rsync --exclude-from=<somefile>), which rsync
>> then uses to avoid the full directory tree traversal, limiting it
>> to only those directory paths that have changed.
>>
>> For example, this script:
>>
>> #!/bin/sh
>> now=$(date -u +%s)
>> yesterday=$((now - (24 * 60 * 60)))
>> wget -O - \
>> "http://localhost/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora" \
>> 2>/dev/null
>>
>>
>> returns an rsync filter rules file that looks like:
>
>
> So this solves the problem for effectively the 'tier 0' or 'tier 1' mirrors,
> and the few people who are still syncing directly from Fedora. I would love,
> and I'm sure I'm not alone in this, the ability (maybe through report_mirror)
> for a tier [01] mirror, when it completes a sync, to report (or have
> discovered) where it is in its update schedule. This would then allow tier
> [n+1] mirrors to add a small change to your URL above, to something like:
>
> https://admin.fedoraproject.org/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora&upstream=<upstream base url like mirrors.kernel.org>
>
> And the tier[n+1] mirrors then have the ability to gain an rsync list custom
> to where they are syncing from. I would be more than happy to mod my rsync
> script, post it back here, in some form that could take advantage of this
> should something get modified.
>
> Something like this would really help the larger mirrors, cut rsync times
> down and likely help keep people better in sync.
>
> Just my $0.02
>
> - John 'Warthog9' Hawley
> Chief Kernel.org Administrator
>
> --
Hi, I was wondering whether any progress was ever made on John's suggestion
to allow querying for the changes on an arbitrary upstream mirror.

I also wonder how this has worked out. Are many mirrors using this
feature, and has it been helpful to the upstreams? It still seems like a good idea.

I'm looking into starting another mirror, and decided to clean up the old
code I used for a previous mirror and put it on GitHub. It currently
uses the rsyncFilter above, which isn't really safe for a tier 2 mirror,
since the filter reflects changes on the master rather than on the
tier 2 mirror's own upstream.
https://github.com/dlbewley/mirror-fedora
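
For reference, a minimal sketch of how a sync script might fetch the
filter and hand it to rsync. This is not the code in the repository
above. The function names, the MM_BASE variable, and the choice to skip
the sync when the download fails are assumptions made for illustration;
only the rsyncFilter URL and its categories, since, and stripprefix
parameters come from this thread. It also assumes the returned file is
in rsync filter-rule syntax, so that it can be loaded with a "merge"
rule.

```shell
#!/bin/sh
# Sketch only: helper names and MM_BASE are hypothetical.
MM_BASE="${MM_BASE:-https://admin.fedoraproject.org/mirrormanager}"

# Print the rsyncFilter URL covering the last $1 seconds.
# $2 (optional) overrides "now" as a Unix timestamp.
filter_url() {
    window=$1
    now=${2:-$(date -u +%s)}
    since=$((now - window))
    printf '%s/rsyncFilter?categories=Fedora%%20Linux&since=%s&stripprefix=pub/fedora\n' \
        "$MM_BASE" "$since"
}

# Fetch the filter rules, then rsync $1 to $2 using them.
# $3 (optional) is the look-back window in seconds (default: 24 hours).
sync_with_filter() {
    upstream=$1
    dest=$2
    window=${3:-86400}
    rules=$(mktemp) || return 1
    if wget -q -O "$rules" "$(filter_url "$window")" && [ -s "$rules" ]; then
        # Assumes the file holds rsync filter rules.
        rsync -a --filter="merge $rules" "$upstream" "$dest"
        rc=$?
    else
        # Assumption: do nothing rather than fall back to a full sync.
        echo "could not fetch filter rules; not syncing" >&2
        rc=1
    fi
    rm -f "$rules"
    return $rc
}
```

A call might look like
"sync_with_filter rsync://upstream.example.org/fedora/ /srv/pub/fedora/ 172800",
where the host and paths are placeholders. Widening the window only
papers over the tier 2 problem: the script has no way to learn how far
behind the master its upstream is, which is the gap John's upstream=
idea would close.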
--