[mirror-admin] rsync filtering to reduce master mirror load
J.H.
warthog9 at kernel.org
Fri Apr 10 14:17:27 EDT 2009
Matt Domsch wrote:
> One of the things that's bothered me for a while is that each mirror
> syncs itself to its upstream mirror (either a master, Tier 0, or Tier
> 1). But in general, content on the master mirrors changes only a
> few times a day (generally one rawhide push, one updates/ push, one
> pub/epel/updates/testing push). Most of the content doesn't change.
> And running rsync to discover that nothing has changed is expensive on
> the upstream server - millions of stat() calls.
>
> I call these the "null rsyncs".
>
> In the next version of MirrorManager to roll out (hopefully today if I
> finish working the bugs out), the MM database now keeps track of the
> "last changed time" of each directory. Using this, it can generate an
> rsync FILTER RULES file (rsync --exclude-from=<somefile>), which rsync
> then uses to reduce the full directory tree traversal, and limits it
> only to those directory paths that have changed.
>
> For example, this script:
>
> #!/bin/sh
> now=$(date -u +%s)
> yesterday=$((now - (24 * 60 * 60)))
> wget -O - \
> "http://localhost/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora" \
> 2>/dev/null
>
>
> returns an rsync filter rules file that looks like:
So this effectively solves the problem for the 'tier 0' or 'tier 1'
mirrors, and for the few people who are still syncing directly from
Fedora. I would love (and I'm sure I'm not alone in this) a way, maybe
through report_mirror, for a tier [01] mirror that completes a sync to
report, or have discovered, where it is in its update schedule. This
would then allow tier [n+1] mirrors to add a small change to your URL
above, something like:
https://admin.fedoraproject.org/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora&upstream=<upstream base url like mirrors.kernel.org>
The tier [n+1] mirrors would then be able to get an rsync filter list
specific to the mirror they are syncing from. If something like this
gets added, I would be more than happy to modify my rsync script to
take advantage of it and post it back here.
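For illustration only, here is a rough sketch of how a tier [n+1] sync
script might use such a URL. The "upstream" query parameter is just the
proposal above and does not exist in MirrorManager today. The function
names, the rsync source, and the local path are made up for the example.
It also assumes the returned file holds plain "+ "/"- " rules that
rsync's --exclude-from can read.

```shell
#!/bin/sh
# Sketch only: "upstream" is a proposed, not an existing, parameter.
MM_BASE="https://admin.fedoraproject.org/mirrormanager/rsyncFilter"

# Build the rsyncFilter URL.
# $1 = "since" time in seconds since the epoch
# $2 = upstream base host (hypothetical parameter value)
filter_url() {
    printf '%s?categories=Fedora%%20Linux&since=%s&stripprefix=pub/fedora&upstream=%s\n' \
        "$MM_BASE" "$1" "$2"
}

# Fetch the filter rules for the last 24 hours and pass them to rsync.
# $1 = upstream host, $2 = rsync source, $3 = local destination
sync_changed() {
    now=$(date -u +%s)
    yesterday=$((now - (24 * 60 * 60)))
    rules=$(mktemp) || return 1
    if wget -q -O "$rules" "$(filter_url "$yesterday" "$1")"; then
        # Assumes the file contains "+ "/"- " prefixed patterns.
        rsync -a --exclude-from="$rules" "$2" "$3"
        status=$?
    else
        status=1
    fi
    rm -f "$rules"
    return "$status"
}

# Example call (placeholder source and destination, not run here):
# sync_changed mirrors.kernel.org rsync://mirrors.kernel.org/fedora/ /srv/mirror/fedora/
```

If the filter fetch fails, the sketch skips the sync and returns an
error. A real script would more likely fall back to a full rsync.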
Something like this would really help the larger mirrors, cut rsync
times down, and likely help keep everyone better in sync.
Just my $0.02
- John 'Warthog9' Hawley
Chief Kernel.org Administrator