[mirror-admin] master server sync stats and recommendations
Matt Domsch
Matt_Domsch at dell.com
Tue Apr 21 13:37:25 EDT 2009
I have been examining the rsyncd log files from the master download
servers (download*.fedora.redhat.com). Data measured between
2009-04-12 04:03:01 and 2009-04-17 12:00:08 on hosts
download[1235] (no logs for download4 are available for this run).
Total: 4.4TB via 15550 successful transfers in 5 days 8 hours by 113
hosts. 104 hosts were downloading Fedora content, the others the
historical Red Hat content.
Last year we implemented mirror tiering, whereby a few Tier 1 mirrors
sync from download*.fedora.redhat.com directly, and other mirrors sync
from those Tier 1 mirrors. We list 11 Tier 1 mirrors [1]. I would
like to see the number of downstream mirrors hitting the masters
decrease by half or more, and for those mirrors to hit a Tier 1 mirror
instead.
** If you are not a Tier 1 mirror, please change your rsync scripts to
sync from a Tier 1 mirror.
There were additionally 5.4 million (yes million!) unsuccessful
transfers during this same time. I don't have the IP addresses of
these, but I'd sure like to know what's going on there.
In terms of transfer sizes, we get the following frequency breakdown
(size listed is the upper limit; if a transfer was 400 bytes, it
appears in the 1KB frequency bucket).
Size    Freq   %age
1KB     1248   8.03
10KB       6   0.04
100KB    190   1.22
1MB     4623  29.73
10MB    7164  46.07
100MB   1396   8.98
1GB      443   2.85
10GB     475   3.05
>10GB      5   0.03
Nearly 40% of the successful transfers are "null rsyncs", meaning
1MB or less of data was transferred (essentially no content
changes). It's possible that some of the 10MB bucket should be
included in the "null rsync" list too. If all of it were, then about
85% of the time, mirrors are performing "null rsyncs".
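
For anyone who wants to reproduce the breakdown from their own rsyncd
logs, here is a minimal sketch of the bucketing and of the "null rsync"
percentages. It assumes decimal units (1KB = 1000 bytes) and that you
already have a list of per-transfer byte counts; the function names are
illustrative, not part of any existing tool.

```python
import bisect
from collections import Counter

# Upper limit of each bucket in bytes; anything larger lands in ">10GB".
# Decimal units (1KB = 1000 bytes) are an assumption.
LIMITS = [10**3, 10**4, 10**5, 10**6, 10**7, 10**8, 10**9, 10**10]
LABELS = ["1KB", "10KB", "100KB", "1MB", "10MB", "100MB", "1GB", "10GB", ">10GB"]


def bucket_label(size_bytes):
    """Return the smallest bucket whose upper limit is >= size_bytes."""
    return LABELS[bisect.bisect_left(LIMITS, size_bytes)]


def histogram(sizes):
    """Count transfers per bucket, in table order."""
    counts = Counter(bucket_label(s) for s in sizes)
    return [(label, counts.get(label, 0)) for label in LABELS]


def null_rsync_share(freqs, threshold_label):
    """Percentage of transfers in buckets up to and including threshold_label."""
    total = sum(n for _, n in freqs)
    upto = LABELS.index(threshold_label) + 1
    return 100.0 * sum(n for _, n in freqs[:upto]) / total


# Frequencies copied from the table above.
TABLE = list(zip(LABELS, [1248, 6, 190, 4623, 7164, 1396, 443, 475, 5]))

if __name__ == "__main__":
    print(round(null_rsync_share(TABLE, "1MB"), 2))   # buckets up to 1MB
    print(round(null_rsync_share(TABLE, "10MB"), 2))  # buckets up to 10MB
```

With the table's numbers, the first figure is the "nearly 40%" and the
second is the upper bound you get if the whole 10MB bucket is counted.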
** If we can reduce the number of "null rsyncs", that will drop the
number of connections and master directory walks and related stat()s
considerably.
How often do/should mirrors sync? We're trying to get smarter about
this. Content changes on the master mirrors far less frequently than mirrors
are syncing.
- fedora-enchilada/linux/development changes at most once a day,
around 5am Eastern Standard Time US.
- fedora-enchilada/linux/updates changes once or maybe twice a day,
but not on a fixed schedule (humans involved in the posting). This
can be synced 4-6 times a day if necessary.
- fedora-enchilada/linux/releases changes about monthly. This content
is staged and then bitflipped around 10am Eastern Standard Time US
on release day.
- fedora-epel changes about monthly. Sync this at most once a day.
** I recommend if you are syncing all of fedora-enchilada, that you do
so at most 4-6 times a day (every 4-6 hours), and we'll work to
reduce even this number.
** I recommend if you are syncing fedora-epel, you sync at most once a day.
** Tier 1 mirrors can use the new rsyncFilter API, or sync the
'fullfilelist' file, to see if anything has changed. I'm still
working on enabling the rsyncFilter API for Tier 2 mirrors.
** For Tier 2 mirrors, if your mirror is syncing more often than every
4 hours, you are syncing too often. I believe we can live with a
4-hour propagation delay.
** If we can signal to mirrors when new content is available, to
trigger them to sync, we can eliminate most of the "null rsyncs".
Tiering makes this more difficult, but still possible. Advice (and
code!) welcome.
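
Until such a signal exists, the interval recommendations above can be
enforced in a sync wrapper. This is a minimal sketch, assuming your
script records the time of its last successful sync per module; the
MIN_INTERVAL table and should_sync() are illustrative names, and the
intervals are the ones recommended in this message (4 hours for
fedora-enchilada, once a day for fedora-epel).

```python
from datetime import datetime, timedelta

# Minimum time between syncs per rsync module, taken from the
# recommendations in this message. Adjust if the guidance changes.
MIN_INTERVAL = {
    "fedora-enchilada": timedelta(hours=4),
    "fedora-epel": timedelta(hours=24),
}


def should_sync(module, last_sync, now):
    """Return True if enough time has passed since the last successful sync.

    last_sync is a datetime, or None if the module has never been synced.
    """
    if last_sync is None:
        return True
    return now - last_sync >= MIN_INTERVAL[module]


if __name__ == "__main__":
    last = datetime(2009, 4, 21, 8, 0, 0)
    print(should_sync("fedora-enchilada", last, datetime(2009, 4, 21, 9, 0, 0)))
    print(should_sync("fedora-enchilada", last, datetime(2009, 4, 21, 12, 0, 0)))
```

A cron job can then fire as often as it likes; the wrapper just exits
early when should_sync() says it is too soon.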
I'll work with John on kernel.org's sync script, which appears to
download (not sync) the fullfilelist every 15 minutes and compare it to
what's on the mirror. That inflates their sync count considerably,
even though each transfer is only 2-4MB. If others are similarly
downloading fullfilelist rather than syncing it, please adjust as well.
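
One way to sync the file instead of downloading it is sketched below.
This is an assumption-laden illustration, not kernel.org's script: the
source URL is whatever path fullfilelist has on your upstream (not
spelled out here), and the helper names and state file are
hypothetical. rsync's -t option preserves modification times, so an
unchanged file is skipped by rsync's quick check rather than
re-transferred; the digest comparison then decides whether a full sync
is worth starting.

```python
import hashlib
from pathlib import Path


def fullfilelist_rsync_argv(source_url, dest):
    """Build an rsync command that syncs the single fullfilelist file.

    source_url is a placeholder for the file's location on your upstream.
    -t keeps mtimes so an unchanged file is not transferred again.
    """
    return ["rsync", "-t", source_url, str(dest)]


def file_digest(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def filelist_changed(filelist, state_file):
    """True if filelist differs from the digest recorded after the last full sync."""
    state = Path(state_file)
    old = state.read_text().strip() if state.exists() else None
    return file_digest(filelist) != old


def record_synced(filelist, state_file):
    """Call only after a full sync succeeds, so a failed sync is retried."""
    Path(state_file).write_text(file_digest(filelist) + "\n")
```

The intended flow is: run the rsync command, call filelist_changed(),
start the full sync only if it returns True, and call record_synced()
once that sync has finished cleanly.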
Here are the mirrors I think are syncing too often. The #Syncs is the
number of successful syncs they had over the 5.3 days measured above.
I would expect ~30 syncs. Most of these _should_ sync to one of the
Tier 1 mirrors instead of the masters. Please adjust accordingly.
#Syncs  Hostname                      rsync-module
  1872  alviss.et.tudelft.nl          fedora-linux-development
  1508  alviss.et.tudelft.nl          fedora-linux-updates
(I've already sent a note to the admin for alviss).
   511  zeus1.kernel.org              fedora-enchilada
   509  zeus2.kernel.org              fedora-enchilada
   479  zeus3.kernel.org              fedora-enchilada
   475  zeus4.kernel.org              fedora-enchilada
   301  sunsite.ms.mff.cuni.cz        fedora-enchilada
   256  mirror01.widexs.nl            fedora-linux-updates
   256  mirror01.widexs.nl            fedora-linux-releases
   238  mandril.creatis.insa-lyon.fr  fedora-enchilada
   135  babbage.hrz.tu-chemnitz.de    fedora-enchilada
   128  zeus4.kernel.org              fedora-epel
   128  zeus3.kernel.org              fedora-epel
   128  zeus2.kernel.org              fedora-epel
   128  opal.cat.pdx.edu              fedora-epel
   128  mirror01.widexs.nl            fedora-epel
   127  zeus1.kernel.org              fedora-epel
   124  paja.nic.funet.fi             fedora-enchilada
   123  web1.lanscape.net             fedora-epel
   116  opal.cat.pdx.edu              fedora-enchilada
   110  spheniscus.uninett.no         fedora-linux-development
   102  riksun.riken.go.jp            fedora-linux-updates
   101  chernabog.cc.vt.edu           fedora-enchilada
    93  spheniscus.uninett.no         fedora-enchilada
    91  mirror01.widexs.nl            fedora-enchilada
    85  ftp.jaist.ac.jp               fedora-enchilada
    73  files01.es6.egwn.net          fedora-linux-development
    72  ftp-fhg.bi.fraunhofer.de      fedora-linux-updates
    68  mirror.hiwaay.net             fedora-enchilada
    63  gd.tuwien.ac.at               fedora-linux-updates
    62  files01.es6.egwn.net          fedora-linux-updates
    59  ftp.udl.es                    fedora-enchilada
    56  nephtys.lip6.fr               fedora-linux-updates
    44  nat-pool-rdu.redhat.com       fedora-linux-development
    43  mandril.creatis.insa-lyon.fr  fedora-epel
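
For reference, the "~30 syncs" expectation can be checked against the
measurement window. This is a minimal sketch, assuming one sync every
4 hours as the ceiling (the interval recommended earlier); the function
names are illustrative and the sample rows are copied from the list
above.

```python
from datetime import datetime

# Measurement window quoted at the top of this message.
START = datetime(2009, 4, 12, 4, 3, 1)
END = datetime(2009, 4, 17, 12, 0, 8)


def expected_syncs(start, end, min_interval_hours):
    """Number of syncs in the window at one sync per min_interval_hours."""
    hours = (end - start).total_seconds() / 3600
    return hours / min_interval_hours


def over_budget(rows, budget):
    """Rows (syncs, hostname, module) whose sync count exceeds the budget."""
    return [row for row in rows if row[0] > budget]


# A few rows from the list above.
SAMPLE = [
    (1872, "alviss.et.tudelft.nl", "fedora-linux-development"),
    (511, "zeus1.kernel.org", "fedora-enchilada"),
    (44, "nat-pool-rdu.redhat.com", "fedora-linux-development"),
]

if __name__ == "__main__":
    budget = expected_syncs(START, END, 4)
    print(round(budget))
    for syncs, host, module in over_budget(SAMPLE, budget):
        print(syncs, host, module)
```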
[1] http://fedoraproject.org/wiki/Infrastructure/Mirroring/Tiering
Thanks,
Matt
Fedora Mirror Wrangler
--
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux