[mirror-admin] master server sync stats and recommendations

Matt Domsch Matt_Domsch at dell.com
Tue Apr 21 13:37:25 EDT 2009


I have been examining the rsyncd log files from the master download
servers (download*.fedora.redhat.com). Data measured between
2009-04-12 04:03:01 and 2009-04-17 12:00:08 on hosts
download[1235] (no logs for download4 are available for this run).

Total: 4.4TB via 15550 successful transfers in 5 days 8 hours by 113
hosts.  104 hosts were downloading Fedora content, the others the
historical Red Hat content.

Last year we implemented mirror tiering, whereby a few Tier 1 mirrors
sync from download*.fedora.redhat.com directly, and other mirrors sync
from those Tier 1 mirrors. We list 11 Tier 1 mirrors [1].  I would
like to see the number of downstream mirrors hitting the masters
decrease by half or more, and for those mirrors to hit a Tier 1 mirror
instead.

** If you are not a Tier 1 mirror, please change your rsync scripts to
   sync from a Tier 1 mirror.

There were additionally 5.4 million (yes million!) unsuccessful
transfers during this same time.  I don't have the IP addresses of
these, but I'd sure like to know what's going on there.

In terms of transfer sizes, we get the following frequency breakdown
(size listed is the upper limit; if a transfer was 400 bytes, it
appears in the 1KB frequency bucket).

Size	Freq	%age
1KB	1248	8.03
10KB	6	0.04
100KB	190	1.22
1MB	4623	29.73
10MB	7164	46.07
100MB	1396	8.98
1GB	443	2.85
10GB	475	3.05
>10GB	5	0.03

Nearly 40% of the successful transfers are "null rsyncs" - meaning
less than 1MB of data was transferred (essentially no content
changes). It's possible that some of the 10MB bucket should be
included in the "null rsync" list too. If the whole bucket counts,
then about 85% of successful transfers are "null rsyncs".
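
To make the bucketing and the percentages above easy to re-check, here
is a minimal Python sketch.  The counts are copied from the table; the
names LIMITS, bucket_label and share are made up for illustration, and
treating 1KB as 1024 bytes is an assumption (the log analysis may have
used 1000).

```python
import bisect

# Upper limit of each bucket in bytes (assumption: 1KB = 1024 bytes).
LIMITS = [1024, 10 * 1024, 100 * 1024, 1024 ** 2,
          10 * 1024 ** 2, 100 * 1024 ** 2, 1024 ** 3, 10 * 1024 ** 3]
LABELS = ["1KB", "10KB", "100KB", "1MB", "10MB", "100MB", "1GB", "10GB", ">10GB"]

# Successful-transfer counts per bucket, copied from the table above.
FREQ = {"1KB": 1248, "10KB": 6, "100KB": 190, "1MB": 4623, "10MB": 7164,
        "100MB": 1396, "1GB": 443, "10GB": 475, ">10GB": 5}


def bucket_label(size_bytes):
    """Return the bucket whose upper limit is the first one >= size_bytes."""
    return LABELS[bisect.bisect_left(LIMITS, size_bytes)]


def share(labels):
    """Percentage of all successful transfers that fall in the given buckets."""
    total = sum(FREQ.values())
    return 100.0 * sum(FREQ[label] for label in labels) / total


if __name__ == "__main__":
    print(bucket_label(400))                                    # a 400-byte transfer
    print(round(share(["1KB", "10KB", "100KB", "1MB"]), 2))     # transfers up to 1MB
    print(round(share(["1KB", "10KB", "100KB", "1MB", "10MB"]), 2))
```

The last two lines give the share of transfers up to 1MB, and the share
if the entire 10MB bucket is counted as well.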

** If we can reduce the number of "null rsyncs", that will drop the
   number of connections and master directory walks and related stat()s
   considerably.


How often do/should mirrors sync?  We're trying to get smarter about
this.  Content changes on the master mirrors far less frequently than mirrors
are syncing.

- fedora-enchilada/linux/development changes at most once a day,
  around 5am Eastern Standard Time US.
- fedora-enchilada/linux/updates changes once or maybe twice a day,
  but not on a fixed schedule (humans involved in the posting).  This
  can be synced 4-6 times a day if necessary.
- fedora-enchilada/linux/releases changes about monthly.  This content
  is staged and then bitflipped around 10am Eastern Standard Time US
  on release day.
- fedora-epel changes about monthly.  Sync this at most once a day.

** I recommend if you are syncing all of fedora-enchilada, that you do
   so at most 4-6 times a day (every 4-6 hours), and we'll work to
   reduce even this number.

** I recommend if you are syncing fedora-epel, you sync at most once a day.

** Tier 1 mirrors can use the new rsyncFilter API, or sync the
   'fullfilelist' file, to see if anything has changed.  I'm still
   working on enabling the rsyncFilter API for Tier 2 mirrors.
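
As a rough illustration of the 'fullfilelist' check, a mirror script
could compare a digest of the freshly synced file against the digest
recorded at the last full sync, and skip the full run when they match.
This is only a sketch, not the rsyncFilter API: the function names and
the state file are hypothetical, and it assumes your script has already
fetched fullfilelist to a local path.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in chunks so a multi-MB list is not held in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_sync(filelist_path, state_path):
    """True if fullfilelist differs from the one recorded at the last full sync."""
    state = Path(state_path)
    if not state.exists():
        return True  # no record yet, so do a full sync
    return state.read_text().strip() != file_digest(filelist_path)


def record_synced(filelist_path, state_path):
    """Call only after the full sync succeeded, so a failed run is retried."""
    Path(state_path).write_text(file_digest(filelist_path) + "\n")
```

The digest is recorded separately from the check on purpose: if the
full sync fails partway, the next run still sees a mismatch and tries
again.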

** For Tier 2 mirrors, if your mirror is syncing more often than every
   4 hours, you are syncing too often.  I believe we can live with a
   4-hour propagation delay.

** If we can signal to mirrors when new content is available, to
   trigger them to sync, we can eliminate most of the "null rsyncs".
   Tiering makes this more difficult, but still possible.  Advice (and
   code!) welcome.

I'll work with John on kernel.org's sync script, which appears to
download (not sync) the fullfilelist every 15 minutes and compare it to
what's on the mirror.  That inflates their sync number considerably,
even though each transfer is only 2-4MB.  If others are similarly
downloading fullfilelist and not syncing it, please adjust also.

Here are the mirrors I think are syncing too often.  The #Syncs is the
number of successful syncs they had over the 5.3 days measured above.
I would expect ~30 syncs.  Most of these _should_ sync to one of the
Tier 1 mirrors instead of the masters.  Please adjust accordingly.

#Syncs  Hostname  rsync-module
   1872 alviss.et.tudelft.nl fedora-linux-development
   1508 alviss.et.tudelft.nl fedora-linux-updates
       (I've already sent a note to the admin for alviss).
    511 zeus1.kernel.org fedora-enchilada
    509 zeus2.kernel.org fedora-enchilada
    479 zeus3.kernel.org fedora-enchilada
    475 zeus4.kernel.org fedora-enchilada
    301 sunsite.ms.mff.cuni.cz fedora-enchilada
    256 mirror01.widexs.nl fedora-linux-updates
    256 mirror01.widexs.nl fedora-linux-releases
    238 mandril.creatis.insa-lyon.fr fedora-enchilada
    135 babbage.hrz.tu-chemnitz.de fedora-enchilada
    128 zeus4.kernel.org fedora-epel
    128 zeus3.kernel.org fedora-epel
    128 zeus2.kernel.org fedora-epel
    128 opal.cat.pdx.edu fedora-epel
    128 mirror01.widexs.nl fedora-epel
    127 zeus1.kernel.org fedora-epel
    124 paja.nic.funet.fi fedora-enchilada
    123 web1.lanscape.net fedora-epel
    116 opal.cat.pdx.edu fedora-enchilada
    110 spheniscus.uninett.no fedora-linux-development
    102 riksun.riken.go.jp fedora-linux-updates
    101 chernabog.cc.vt.edu fedora-enchilada
     93 spheniscus.uninett.no fedora-enchilada
     91 mirror01.widexs.nl fedora-enchilada
     85 ftp.jaist.ac.jp fedora-enchilada
     73 files01.es6.egwn.net fedora-linux-development
     72 ftp-fhg.bi.fraunhofer.de fedora-linux-updates
     68 mirror.hiwaay.net fedora-enchilada
     63 gd.tuwien.ac.at fedora-linux-updates
     62 files01.es6.egwn.net fedora-linux-updates
     59 ftp.udl.es fedora-enchilada
     56 nephtys.lip6.fr fedora-linux-updates
     44 nat-pool-rdu.redhat.com fedora-linux-development
     43 mandril.creatis.insa-lyon.fr fedora-epel
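
For anyone checking their own numbers against this list, the arithmetic
can be sketched in a few lines of Python.  It uses the 5 days 8 hours
window stated above; the function names are made up for illustration,
and the implied interval assumes syncs were evenly spaced, which the
logs do not guarantee.

```python
# Measurement window from the message: 5 days 8 hours.
WINDOW_HOURS = 5 * 24 + 8


def expected_syncs(interval_hours):
    """Number of syncs in the window for a mirror syncing every interval_hours."""
    return WINDOW_HOURS / interval_hours


def implied_interval_minutes(sync_count):
    """Average minutes between syncs for a mirror with sync_count successful syncs."""
    return WINDOW_HOURS * 60 / sync_count


if __name__ == "__main__":
    # Recommended range: every 4 to 6 hours.
    print(expected_syncs(4), round(expected_syncs(6), 1))
    # A few counts taken from the list above.
    for count in (1872, 511, 256, 128):
        print(count, round(implied_interval_minutes(count), 1))
```

Syncing every 4 hours gives 32 syncs in the window.  A count of 128
works out to one sync an hour, and 511 to roughly one every 15 minutes.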


[1] http://fedoraproject.org/wiki/Infrastructure/Mirroring/Tiering

Thanks,
Matt
Fedora Mirror Wrangler

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
