[mirror-admin] master server sync stats and recommendations

Chris Adams cmadams at hiwaay.net
Tue Apr 21 19:31:57 EDT 2009


Just a few comments (not disagreeing with the idea that things need
changing, just some observations):

Once upon a time, Matt Domsch <Matt_Domsch at dell.com> said:
> - fedora-enchilada/linux/development changes at most once a day,
>   around 5am Eastern Standard Time US.

The timing on this is highly erratic.  I often see changes during the
day.  With rawhide changing every day (and taking time to stage), I
found that if I didn't sync repeatedly, I had many days with an
inconsistent tree that was unusable (which makes testing rawhide hard).
Looking back at my last week of logs, I see my mirror getting rawhide
updates at:

   15 Apr 09:00
   16 Apr 09:00
   17 Apr 13:00
   19 Apr 13:00
   20 Apr 13:00
   21 Apr 13:00

Times are CDT, and my syncs were running at 1-23/6 (which I've reduced).
I was also syncing too much because of an old workaround in my script
for some bug or another (it ran the same rsync back-to-back in some
cases, and then a script bug meant it did so every time).  That's fixed
now.

I used to sync rawhide, updates, and releases separately, but I thought
it was recommended to sync enchilada as a whole (to keep hardlinks).
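
As an illustration of the whole-tree approach, here is a minimal sketch of
such a sync command.  rsync's -H option preserves hard links only among
files transferred in the same run, which is why splitting the tree into
separate syncs can lose them.  The local destination path, the use of
--delete-after, and the sync_cmd helper are assumptions for illustration,
not anyone's actual configuration:

```shell
#!/bin/sh
# Sketch only: destination path and option choice are assumptions.
MASTER=download3.fedora.redhat.com   # upstream host mentioned in this thread
MODULE=fedora-enchilada              # module name quoted from Matt's message
LOCAL=/data/mirror/pub/fedora        # assumed local destination

# Print the rsync command for a whole-tree sync (hypothetical helper).
# -a  archive mode, -H  preserve hard links within this transfer,
# --delete-after  remove vanished files once the transfer has finished.
sync_cmd() {
    printf 'rsync -aH --delete-after rsync://%s/%s/ %s/\n' "$1" "$2" "$3"
}

# Show the command rather than running it.
sync_cmd "$MASTER" "$MODULE" "$LOCAL"
```

Printing the command first makes it easy to check the paths before letting
it loose with a delete option.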

> - fedora-epel changes about monthly.  Sync this at most once a day.

I'm pretty sure EPEL changes more than monthly, since I've seen multiple
sets of changes in a single day in the last week (unless I just happened
to sync while content was being updated).  It also doesn't appear to be
at a
consistent time of day.

> I'll work with John on kernel.org's sync script, which appears to
> download (not sync) the fullfilelist every 15 minutes, and compares to
> what's on the mirror.

At least for download3.fedora.redhat.com, I can do an HTTP HEAD request
and get the correct Last-Modified time, so that would help here (I don't
think all the tier 1 mirrors run HTTP on the same host as the rsync
server though).

A simple shell bit to compare your timestamp against a master and exit
if they are the same:

########################################################################
MASTER=download3.fedora.redhat.com
LOCAL=/data/mirror
FILE=/pub/fedora/fullfilelist

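# HEAD request; match the header case-insensitively and strip the trailing CR
mstamp=$(curl -sI "http://$MASTER$FILE" | grep -i '^Last-Modified:' | cut -d' ' -f2- | tr -d '\r')
# local mtime rendered in the same HTTP date format (needs GNU stat and date)
lstamp=$(LC_ALL=C date -u '+%a, %d %b %Y %T GMT' -d "@$(stat -c '%Y' "$LOCAL$FILE")")
# skip the sync only when the master answered and the stamps match
[ -n "$mstamp" ] && [ "$mstamp" = "$lstamp" ] && exit 0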
########################################################################
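
The comparison above can also be split into small functions so it is
testable without touching the network.  This is a sketch, assuming GNU
date; the names http_date and needs_sync are made up for illustration, and
the fixed timestamp in the example stands in for a live HEAD request:

```shell
#!/bin/sh
# Format an epoch timestamp the way HTTP prints Last-Modified (GNU date).
http_date() {
    LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %T GMT'
}

# Return 0 (sync needed) when the remote stamp ($1) is empty, e.g. the
# HEAD request failed, or differs from the local stamp ($2).
needs_sync() {
    [ -z "$1" ] || [ "$1" != "$2" ]
}

# Example with fixed values instead of a live request:
if needs_sync "Tue, 21 Apr 2009 18:00:00 GMT" "$(http_date 0)"; then
    echo "stamps differ: run rsync"
else
    echo "up to date: skip"
fi
```

Treating an empty remote stamp as "sync needed" errs on the side of running
rsync when the master cannot be reached over HTTP.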


My main problems with tiering are (mostly just personal annoyances about
it):

- As you noted, I have no way to know when a tier 1 mirror is synced (so
  I'm still going to have to sync repeatedly through the day).

- Personally, I also mirror CPAN, and twice I ended up syncing from a
  mirror where somebody had lost interest or something, so the mirror
  ended up unmaintained (and mine out of sync).  I eventually switched
  to the master (having to look for a new mirror to sync from every so
  often is just a PITA).

- The fact that I need to go manually coordinate with a tier 1 mirror
  (or two) is annoying.  We have MirrorManager; shouldn't there be a way
  for tier 1 mirrors to use data from there for configuring ACLs?  I
  should be able to replace download3.fedora.redhat.com with foo.bar.com
  (same rsync modules and paths) and keep going.  If I have a problem
  with a tier 1, I should be able to easily switch to another tier 1.
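
On the client side, that kind of switch could look roughly like the sketch
below.  It assumes every listed host exports the same module and already
accepts connections from the mirror, which today still needs the manual
coordination described above.  pick_upstream and probe_rsync are
hypothetical helpers, and foo.bar.com is the placeholder name from the last
point:

```shell
#!/bin/sh
# Print the first host for which the probe command succeeds.
# Usage: pick_upstream PROBE host...
pick_upstream() {
    probe=$1
    shift
    for host in "$@"; do
        if "$probe" "$host"; then
            printf '%s\n' "$host"
            return 0
        fi
    done
    return 1
}

# One possible probe: quietly list the module root (assumed module name).
probe_rsync() {
    rsync --timeout=30 "rsync://$1/fedora-enchilada/" > /dev/null 2>&1
}

# Usage (not run here):
#   UPSTREAM=$(pick_upstream probe_rsync download3.fedora.redhat.com foo.bar.com) || exit 1
#   rsync -aH --delete-after "rsync://$UPSTREAM/fedora-enchilada/" /data/mirror/pub/fedora/
```

Because the probe is passed in by name, the host ordering can be checked
with a stub before pointing it at real servers.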


-- 
Chris Adams <cmadams at hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.
