[mirror-admin] master server sync stats and recommendations

Matt Domsch Matt_Domsch at dell.com
Tue Apr 21 20:45:15 EDT 2009


On Tue, Apr 21, 2009 at 06:31:57PM -0500, Chris Adams wrote:
> Just a few comments (not disagreeing with the idea that things need
> changing, just some observations):
> 
> Once upon a time, Matt Domsch <Matt_Domsch at dell.com> said:
> > - fedora-enchilada/linux/development changes at most once a day,
> >   around 5am Eastern Standard Time US.
> 
> The timing on this is highly erratic.  I often see changes during the
> day.  With rawhide changing every day (and taking time to stage), I
> found that if I don't sync repeatedly, I had many days with an
> inconsistent tree that was unusable (which makes testing rawhide hard).
> Looking back at my last week of logs, I see my mirror getting rawhide
> updates at:
> 
>    15 Apr 09:00
>    16 Apr 09:00
>    17 Apr 13:00
>    19 Apr 13:00
>    20 Apr 13:00
>    21 Apr 13:00

Thanks for checking.  This is odd.  I wonder if it is because we're in
freeze for Fedora 11, and rawhide is being manually composed.  The
actual times (UTC) for the past 5 days have indeed fluctuated quite a
bit.  Here are the times for rawhide i386, in reverse chronological
order.

2009-04-21 18:29:47
2009-04-20 10:45:52
2009-04-19 11:35:13
2009-04-17 12:36:56
2009-04-16 08:42:26

I'll have to ask release-engineering.


> Times are CDT, and my syncs were running at 1-23/6 (which I've reduced).
> I also was syncing too much due to an old hack in my script around some
> bug or another (I ran the same rsync back-to-back in some cases, and
> then a script bug meant I did it every time).  That's fixed now.
> 
> I used to sync rawhide, updates, and releases separately, but I thought
> it was recommended to sync enchilada as a whole (to keep hardlinks).
> 
> > - fedora-epel changes about monthly.  Sync this at most once a day.
> 
> I'm pretty sure EPEL changes more than monthly, since I've seen multiple
> sets of changes in a single day in the last week (unless I just hit
> while content was being updated).  It also doesn't appear to be at a
> consistent time of day.

Again, good catch.  epel-X changes about monthly.  epel-testing
changes daily.

OK, this means I really need to figure out how to get notifications
sent on content changes.  It clearly happens more frequently and on
different schedules than I had believed.


> > I'll work with John on kernel.org's sync script, which appears to
> > download (not sync) the fullfilelist every 15 minutes, and compares to
> > what's on the mirror.
> 
> At least for download3.fedora.redhat.com, I can do an HTTP HEAD request
> and get the correct Last-Modified time, so that would help here (I don't
> think all the tier 1 mirrors run HTTP on the same host as the rsync
> server though).
> 
> A simple shell bit to compare your timestamp against a master and exit
> if they are the same:
> 
> ########################################################################
> MASTER=download3.fedora.redhat.com
> LOCAL=/data/mirror
> FILE=/pub/fedora/fullfilelist
> 
> mstamp=$(curl -I http://$MASTER$FILE 2> /dev/null | grep '^Last-Modified:' | cut -d' ' -f2- | tr -d '\r')
> lstamp=$(LANG=C date -u '+%a, %d %b %Y %T GMT' -d @$(stat -c '%Y' $LOCAL$FILE))
> [ "$mstamp" = "$lstamp" ] && exit 0
> ########################################################################


Thanks for this!
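
For what it's worth, the same comparison could be wrapped as a
function so a cron job only starts rsync when the master has changed.
This is only a sketch: the names local_stamp and needs_sync are made
up for illustration, it assumes GNU date and stat, and the master's
timestamp is passed in as an argument (fetched with a curl line like
yours) so the function itself never touches the network.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; assumes GNU date and stat.

# Print a local file's mtime in the format of an HTTP Last-Modified header.
local_stamp() {
    LANG=C LC_ALL=C date -u '+%a, %d %b %Y %T GMT' -d "@$(stat -c '%Y' "$1")"
}

# Return 0 (sync needed) if the local file is missing or its stamp differs
# from the master's stamp; return 1 if the two stamps match.
# An empty master stamp (e.g. the HEAD request failed) never matches,
# so the sync goes ahead in that case, as in the original snippet.
needs_sync() {
    mstamp=$1
    lfile=$2
    [ -f "$lfile" ] || return 0
    [ "$mstamp" != "$(local_stamp "$lfile")" ]
}

# Example use (not run here), with MASTER, LOCAL and FILE set as above:
#   mstamp=$(curl -s -I "http://$MASTER$FILE" | grep '^Last-Modified:' \
#            | cut -d' ' -f2- | tr -d '\r')
#   needs_sync "$mstamp" "$LOCAL$FILE" || exit 0
```

Passing the stamp in rather than fetching it inside the function also
makes it easy to try out against a scratch file before pointing it at
a real mirror.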

> My main problems with tiering are (mostly just personal annoyances about
> it):
> 
> - As you noted, I have no way to know when a tier 1 mirror is synced (so
>   I'm still going to have to sync repeatedly through the day).
> 
> - Personally, I also mirror CPAN, and I twice ended up syncing from a
>   mirror where somebody had lost interest or something, so the mirror
>   ended unmaintained (and mine out of sync).  I ended up switching to
>   the master (if I'm having to look for a new mirror to sync from
>   periodically it is just a PITA).

Would it help if I made a set of DNS entries such as:
mirror-tier1-us-1.fedoraproject.org
mirror-tier1-us-2.fedoraproject.org
mirror-tier1-de-1.fedoraproject.org
...

and caused them to be CNAMEs to the actual Tier 1 mirrors?  Only, I
suppose, if they all use the same rsync module definitions.
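
To make the idea concrete, here is a rough sketch of how a downstream
script might walk such a list and use the first Tier 1 that answers.
The hostnames are the hypothetical ones above (they don't exist yet),
first_reachable and probe_rsync are made-up names, and it assumes the
fedora-enchilada module is defined identically on every Tier 1, which
is exactly the caveat.

```shell
#!/bin/sh
# Sketch only: hostnames and function names are hypothetical.

# Print the first host in the list for which the probe command succeeds.
# Usage: first_reachable PROBE_CMD host1 host2 ...
first_reachable() {
    probe=$1
    shift
    for host in "$@"; do
        if "$probe" "$host"; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Example probe: ask the host's rsync daemon for its module list,
# giving up if the connection takes longer than 10 seconds.
probe_rsync() {
    rsync --contimeout=10 "rsync://$1/" > /dev/null 2>&1
}

# Example use (not run here); the destination path is an assumption:
#   src=$(first_reachable probe_rsync \
#         mirror-tier1-us-1.fedoraproject.org \
#         mirror-tier1-us-2.fedoraproject.org) || exit 1
#   rsync -aH "rsync://$src/fedora-enchilada/" /data/mirror/pub/fedora/
```

The probe is a separate function so that a mirror needing a
username/password or an ACL check could swap in its own test without
changing the loop.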

> - The fact that I need to go manually coordinate with a tier 1 mirror
>   (or two) is annoying.  We have MirrorManager; shouldn't there be a way
>   for tier 1 mirrors to use data from there for configuring ACLs?  I
>   should be able to change download3.fedora.redhat.com with foo.bar.com
>   (same rsync modules and paths) and keep going.  If I have a problem
>   with a tier 1, I should be able to easily switch to another tier 1.

I made the rsync_acl file exactly so downstream mirrors could pull it
in.  But to be fair, it's not hard to get your own IP added to that
list, which is why most Tier 1 mirrors keep their own ACL or
username/password pair.  Some coordination would still be necessary.

The fact that the master mirrors don't use the MM rsync_acl is in part
due to the fact that it's so easy to put yourself on the list.

I'm open to suggestions on how to improve this, with the caveat that
we still want to keep the pre-bitflipped content away from
non-public mirrors.

Thanks,
Matt

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
