[mirror-admin] rsync filtering to reduce master mirror load
Matt Domsch
Matt_Domsch at dell.com
Tue Apr 7 09:14:18 EDT 2009
One of the things that's bothered me for a while is that each mirror
syncs itself to its upstream mirror (either a master, Tier 0, or Tier
1). But in general, content on the master mirrors changes only a
few times a day (generally one rawhide push, one updates/ push, one
pub/epel/updates/testing push). Most of the content doesn't change.
And running rsync to discover that nothing has changed is expensive on
the upstream server - millions of stat() calls.
I call these the "null rsyncs".
In the next version of MirrorManager to roll out (hopefully today if I
finish working the bugs out), the MM database now keeps track of the
"last changed time" of each directory. Using this, it can generate an
rsync filter rules file (rsync --exclude-from=<somefile>), which rsync
then uses to skip the full directory tree traversal and visit only
those directory paths that have changed.
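To make the mechanism concrete, here is a minimal sketch of how
per-directory change times could be turned into rsync filter rules. It is
not MirrorManager's code or its actual output format: the gen_filter
function, the "<epoch> <path>" input format (paths without whitespace),
and the choice to include everything beneath a changed directory are all
assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch, not MirrorManager's implementation.
# Input (stdin): one "<last-changed-epoch> <relative/dir/path>" per line.
# Output: rsync filter rules that admit only directories changed since $1.
gen_filter() {
    awk -v since="$1" '
        ($1 + 0) >= (since + 0) {
            # rsync must be allowed into every parent before it can
            # reach the changed directory, so include each one once.
            n = split($2, parts, "/")
            path = ""
            for (i = 1; i < n; i++) {
                path = path "/" parts[i]
                if (!(path in seen)) {
                    seen[path] = 1
                    print "+ " path "/"
                }
            }
            # Assumption: sync the changed directory and all its contents.
            print "+ /" $2 "/***"
        }
        # Exclude everything that was not explicitly included above.
        END { print "- *" }
    '
}

# Example: only the first directory changed after epoch 75.
printf '%s\n' '100 updates/10/i386' '50 releases/10' | gen_filter 75
```

Rules in this form could be saved to a file and handed to rsync with
--exclude-from=<somefile>. rsync applies the first matching rule, so the
trailing "- *" keeps it out of every directory that was not listed as
changed.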
For example, this script:
#!/bin/sh
# Ask MirrorManager for a filter covering directories changed in the last 24 hours.
now=$(date -u +%s)
yesterday=$((now - (24 * 60 * 60)))
wget -O - \
  "http://localhost/mirrormanager/rsyncFilter?categories=Fedora%20Linux&since=$yesterday&stripprefix=pub/fedora" \
  2>/dev/null
returns an rsync filter rules file that looks like:
--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux