[mirror-admin] rsync filtering to reduce master mirror load

J.H. warthog9 at kernel.org
Fri Apr 10 14:17:27 EDT 2009


Matt Domsch wrote:
> One of the things that's bothered me for a while is that each mirror
> syncs itself to its upstream mirror (either a master, Tier 0, or Tier
> 1).  But in general, content on the master mirrors changes only a
> few times a day (generally one rawhide push, one updates/ push, one
> pub/epel/updates/testing push).  Most of the content doesn't change.
> And running rsync to discover that nothing has changed is expensive on
> the upstream server - millions of stat() calls.
> 
> I call these the "null rsyncs".
> 
> In the next version of MirrorManager to roll out (hopefully today if I
> finish working the bugs out), the MM database now keeps track of the
> "last changed time" of each directory.  Using this, it can generate an
> rsync FILTER RULES file (rsync --exclude-from=<somefile>), which rsync
> then uses to prune its directory tree traversal, limiting it to only
> those directory paths that have changed.
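A minimal sketch of how such a filter file could be built from per-directory change times. This is not MirrorManager's code or its exact output format. The function name `gen_filter` and the input format (one `MTIME PATH` pair per line, MTIME in epoch seconds, paths without whitespace) are hypothetical. For each directory changed at or after SINCE, it emits a `+` rule for every ancestor directory, so rsync still descends into it, plus a rule for that directory's immediate entries, and ends with `- *` to exclude everything else. PREFIX is stripped from each path, in the spirit of the `stripprefix` parameter.

```shell
#!/bin/sh
# gen_filter SINCE PREFIX: read hypothetical "MTIME PATH" lines on stdin
# (MTIME in epoch seconds, PATH without whitespace or trailing slash) and
# print rsync filter rules limiting a transfer to directories changed at
# or after SINCE, with PREFIX stripped from each path.
gen_filter() {
    awk -v since="$1" -v prefix="$2" '
    $1 + 0 >= since + 0 {
        path = $2
        if (prefix != "") {
            if (path == prefix)
                path = ""                 # the tree root itself changed
            else if (index(path, prefix "/") == 1)
                path = substr(path, length(prefix) + 2)
            else
                next                      # outside the stripped tree
        }
        # Include every ancestor so rsync still descends into the directory.
        n = split(path, parts, "/")
        dir = ""
        for (i = 1; i <= n; i++) {
            dir = dir "/" parts[i]
            emit("+ " dir "/")
        }
        # Include the immediate entries of the changed directory.
        emit("+ " dir "/*")
    }
    function emit(rule) {
        # Print each rule once, in first-seen order.
        if (!(rule in seen)) {
            seen[rule] = 1
            print rule
        }
    }
    END { print "- *" }                   # exclude everything not matched above
    '
}
```

rsync applies filter rules first-match. A leading `/` anchors a pattern at the transfer root, a trailing `/` matches only directories, and `*` stops at slashes, so unchanged files in ancestor directories still fall through to the final `- *`. A mirror could then run something like `gen_filter "$yesterday" pub/fedora < dirtimes.txt > filter.rules` (with `dirtimes.txt` a hypothetical dump of directory change times) and pass `--exclude-from=filter.rules` to rsync.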
> 
> For example, this script:
> 
> #!/bin/sh
> now=$(date -u +%s)
> yesterday=$((now - (24 * 60 * 60)))
> wget -O - \
>   "http://localhost/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora" \
>   2>/dev/null
> 
> 
> returns an rsync filter rules file that looks like:


So this solves the problem for what are effectively the 'tier 0' and
'tier 1' mirrors, and for the few people who are still syncing directly
from Fedora. I would love (and I'm sure I'm not alone in this) the
ability, maybe through report_mirror, for a tier [01] mirror that
completes a sync to report, or otherwise make discoverable, where it is
in its update schedule. This would then allow tier [n+1] mirrors to make
a small change to your URL above, to something like:

https://admin.fedoraproject.org/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora&upstream=<upstream base url like mirrors.kernel.org>

The tier [n+1] mirrors would then be able to get an rsync filter list
customized to the upstream they are syncing from. Should something like
this get implemented, I would be more than happy to modify my rsync
script to take advantage of it and post it back here.

Something like this would really help the larger mirrors: it would cut
rsync times and likely help keep everyone better in sync.
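A minimal sketch of what such a modified tier [n+1] sync script might look like. The function names `filter_url` and `sync_changed` and the variable `MM_URL` are hypothetical. The `upstream` query parameter is only the proposal from this thread and is not something the rsyncFilter URL is known to accept. The sketch also assumes the upstream is given as a bare hostname, so it needs no URL encoding.

```shell
#!/bin/sh
# Sketch of a tier [n+1] sync driven by a MirrorManager rsyncFilter file.
# ASSUMPTION: the "upstream" query parameter is only the proposal from this
# thread; it is not a known rsyncFilter parameter.

MM_URL="https://admin.fedoraproject.org/mirrormanager/rsyncFilter"

# filter_url SINCE [UPSTREAM]: print the filter URL for changes since SINCE
# (epoch seconds), optionally tagged with a bare upstream hostname.
filter_url() {
    url="$MM_URL?categories=Fedora%20Linux&since=$1&stripprefix=pub/fedora"
    if [ -n "${2:-}" ]; then
        url="$url&upstream=$2"
    fi
    printf '%s\n' "$url"
}

# sync_changed SRC DEST [UPSTREAM]: fetch rules covering the last 24 hours,
# then rsync only the paths they include. SRC should presumably be the
# upstream copy of the pub/fedora tree (with a trailing slash), because
# stripprefix makes the rule paths relative to it.
sync_changed() {
    src=$1
    dest=$2
    upstream=${3:-}
    now=$(date -u +%s)
    yesterday=$((now - 24 * 60 * 60))
    rules=$(mktemp) || return 1
    if wget -q -O "$rules" "$(filter_url "$yesterday" "$upstream")"; then
        # Paths excluded by the rules are also protected from --delete.
        rsync -a --delete --exclude-from="$rules" "$src" "$dest"
    else
        # Could not fetch the rules: fall back to an unfiltered sync.
        rsync -a --delete "$src" "$dest"
    fi
    status=$?
    rm -f "$rules"
    return "$status"
}
```

For example, `sync_changed "$SRC" /srv/mirror/fedora/ mirrors.kernel.org`, where `$SRC` and the destination path are placeholders for the upstream's rsync URL and the local tree. Because excluded files are also skipped by `--delete`, an occasional unfiltered run would arguably remain a useful safety net.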

Just my $0.02

- John 'Warthog9' Hawley
Chief Kernel.org Administrator
