[mirror-admin] master server sync stats and recommendations
Chris Adams
cmadams at hiwaay.net
Tue Apr 21 19:31:57 EDT 2009
Just a few comments (not disagreeing with the idea that things need
changing, just some observations):
Once upon a time, Matt Domsch <Matt_Domsch at dell.com> said:
> - fedora-enchilada/linux/development changes at most once a day,
> around 5am Eastern Standard Time US.
The timing on this is highly erratic. I often see changes during the
day. With rawhide changing every day (and taking time to stage), I
found that if I didn't sync repeatedly, I had many days with an
inconsistent, unusable tree (which makes testing rawhide hard).
Looking back at my last week of logs, I see my mirror getting rawhide
updates at:
15 Apr 09:00
16 Apr 09:00
17 Apr 13:00
19 Apr 13:00
20 Apr 13:00
21 Apr 13:00
Times are CDT, and my syncs were running at 1-23/6 (which I've reduced).
I was also syncing too much because of an old hack in my script to work
around some bug or another: I ran the same rsync back-to-back in some
cases, and then a script bug meant I did it every time. That's fixed now.
I used to sync rawhide, updates, and releases separately, but I thought
it was recommended to sync enchilada as a whole (to keep hardlinks).
> - fedora-epel changes about monthly. Sync this at most once a day.
I'm pretty sure EPEL changes more often than monthly, since I've seen
multiple sets of changes in a single day in the last week (unless I just
happened to sync while content was being updated). The changes also
don't appear to land at a consistent time of day.
> I'll work with John on kernel.org's sync script, which appears to
> download (not sync) the fullfilelist every 15 minutes, and compares to
> what's on the mirror.
At least for download3.fedora.redhat.com, I can do an HTTP HEAD request
and get the correct Last-Modified time, so that would help here (I don't
think all the tier 1 mirrors run HTTP on the same host as the rsync
server though).
A simple shell bit to compare your timestamp against a master and exit
if they are the same:
########################################################################
MASTER=download3.fedora.redhat.com
LOCAL=/data/mirror
FILE=/pub/fedora/fullfilelist
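# Last-Modified header from the master (strip the trailing CR)
mstamp=$(curl -sI "http://$MASTER$FILE" | grep -i '^Last-Modified:' | cut -d' ' -f2- | tr -d '\r')
# Local mtime in the same HTTP date format (needs GNU date and stat)
lstamp=$(LC_ALL=C date -u '+%a, %d %b %Y %T GMT' -d "@$(stat -c '%Y' "$LOCAL$FILE")")
# Nothing to do if the stamps match
[ "$mstamp" = "$lstamp" ] && exit 0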
########################################################################
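A rough sketch of how that comparison could gate the actual sync. The
function name sync_if_changed is made up for illustration, and the
command passed to it stands in for whatever rsync invocation a mirror
already uses. Unlike the snippet above, this treats an empty master
stamp (a failed HEAD request) as "changed", on the assumption that an
extra sync is better than a silently skipped one:

```shell
#!/bin/sh
# Sketch only: run a sync command only when the master's stamp differs.
# Usage: sync_if_changed MASTER_STAMP LOCAL_STAMP COMMAND [ARGS...]
sync_if_changed() {
    mstamp=$1
    lstamp=$2
    shift 2
    # An empty master stamp means the HEAD request gave us nothing;
    # fall through and sync rather than skip.
    if [ -n "$mstamp" ] && [ "$mstamp" = "$lstamp" ]; then
        echo "unchanged, skipping"
        return 0
    fi
    # Run the caller's sync command (hypothetically, an rsync call).
    "$@"
}

# Example with a placeholder command instead of a real rsync:
sync_if_changed "Tue, 21 Apr 2009 18:00:00 GMT" \
    "Mon, 20 Apr 2009 18:00:00 GMT" echo "would run rsync here"
```

In a real script the two stamps would come from the curl and stat
lines above, and the echo would be replaced by the rsync command.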
My main problems with tiering (mostly just personal annoyances):
- As you noted, I have no way to know when a tier 1 mirror is synced (so
I'm still going to have to sync repeatedly through the day).
- Personally, I also mirror CPAN, and twice I ended up syncing from a
mirror whose admin had lost interest or something, so that mirror
went unmaintained (and mine went out of sync). I ended up switching
to the master (having to look for a new mirror to sync from
periodically is just a PITA).
- The fact that I need to go manually coordinate with a tier 1 mirror
(or two) is annoying. We have MirrorManager; shouldn't there be a way
for tier 1 mirrors to use data from there for configuring ACLs? I
should be able to replace download3.fedora.redhat.com with foo.bar.com
(same rsync modules and paths) and keep going. If I have a problem
with a tier 1, I should be able to easily switch to another tier 1.
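A minimal sketch of what that kind of swap could look like on the
downstream side, assuming every listed host already allows the mirror
in its ACLs and exposes the same rsync modules and paths. UPSTREAMS,
do_sync, and sync_from_first_working are hypothetical names, and
do_sync is only a placeholder where a real script would call rsync:

```shell
#!/bin/sh
# Sketch only: try each upstream in order until one sync succeeds.
# foo.bar.com is the stand-in host name from the text above.
UPSTREAMS="download3.fedora.redhat.com foo.bar.com"

do_sync() {
    # Placeholder: a real script would run rsync against "$1" here
    # and return its exit status.
    echo "syncing from $1"
}

sync_from_first_working() {
    for host in $UPSTREAMS; do
        if do_sync "$host"; then
            return 0
        fi
        echo "sync from $host failed, trying next" >&2
    done
    # Every upstream failed.
    return 1
}

sync_from_first_working
```

With the module names and paths identical across tier 1 mirrors,
switching upstreams is then just a matter of reordering the list.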
--
Chris Adams <cmadams at hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.