[mirror-admin] fullfilelist (was Re: Please use --delay-updates)

Matt_Domsch at Dell.com
Tue Apr 20 12:16:59 EDT 2010


Is anyone using the plain format of the current fullfilelist file?  

fullfilelist
linux
linux/extras
linux/extras/README
linux/core
linux/core/development
linux/core/updates
linux/core/updates/README
linux/core/1
linux/core/1/README
linux/core/test
linux/core/2
linux/core/2/README
linux/core/3
linux/core/3/README
linux/core/4

If not, then this change should be painless.
If so, then we'll either need to get people to change their parser, or have both files (ugh).

--
Matt Domsch
Technology Strategist
Dell | Office of the CTO


-----Original Message-----
From: mirror-list-d-bounces at redhat.com [mailto:mirror-list-d-bounces at redhat.com] On Behalf Of Carlos Carvalho
Sent: Tuesday, April 20, 2010 10:45 AM
To: A private discussion group for official mirrors of ftp.redhat.com
Subject: fullfilelist (was Re: Please use --delay-updates)

Mike McGrath (mmcgrath at redhat.com) wrote on 20 April 2010 09:48:
 >On Fri, 16 Apr 2010, Carlos Carvalho wrote:
 >
 >> Chuck Anderson (cra at wpi.edu) wrote on 16 April 2010 08:41:
 >>  >Each time you run rsync against your upstream mirror, it scans the
 >>  >entire filesystem to build a filelist.  This could take anywhere from
 >>  >5 to 20 minutes or more
 >>
 >> More... :-(
 >>
 >>  >and has been a factor in overloading the master mirrors in the past.
 >>
 >> I'd say nowadays too... The table below shows the time we take just to
 >> get the file list from sync.fedoraproject, for the last days. We
 >> mirror everything starting from release 11. It shows clearly that the
 >> machine suffers significantly from disk scanning. The file list is
 >> only about 22MB. Times are in UTC-3.
 >>
 >> If fullfilelist was done properly we could completely avoid this
 >> scanning...
 >>
 >
 >Can you expand more on this, how can we do fullfilelist properly?

By including the timestamp and size (and type) of each object.

The current version only gives the names. Downstream mirrors can use it
to see what has been removed and created but cannot know what has been
modified. They're thus forced to request a full disk scan. If you put
the necessary info in fullfilelist, mirrors can rsync it, see
*everything* that must be updated, and directly request only what's
necessary with rsync --files-from. This way no disk scanning would be
necessary upstream.

The format I propose is the one generated by rsync itself:

% cd /path/to/repository
% rsync -r . > /path/to/fullfilelist

If you want fullfilelist to include itself, it's of course necessary to
adjust it afterwards, but that's easy. Note also that "self-inclusion"
is not necessary because mirrors would always pull it.
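
As a rough sketch of what a downstream mirror could do with such a list:
compare the previously fetched fullfilelist with the new one and work out
which paths to request. This assumes each line looks like the usual rsync
listing (permissions, size, date, time, path); the function names
parse_listing and plan_update are made up for illustration, and symlink
targets are ignored for simplicity.

```python
import re

# Assumed layout of one line of `rsync -r .` list output, e.g.:
#   -rw-r--r--        1,234 2010/04/20 12:16:59 linux/core/1/README
# Some rsync versions print the size without thousands separators.
LINE_RE = re.compile(
    r"^(?P<perms>\S{10,11})\s+(?P<size>[\d,]+)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<path>.+)$"
)


def parse_listing(text):
    """Map path -> (type, size, mtime string); unparseable lines are skipped."""
    entries = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        path = m.group("path")
        if m.group("perms").startswith("l"):
            # Symlinks are listed as "name -> target"; keep only the name.
            path = path.split(" -> ", 1)[0]
        size = int(m.group("size").replace(",", ""))
        mtime = m.group("date") + " " + m.group("time")
        entries[path] = (m.group("perms")[0], size, mtime)
    return entries


def plan_update(old, new):
    """Return (paths to fetch, paths to delete) given two parsed listings."""
    # Fetch anything new or whose type, size or timestamp changed.
    # Directories are skipped; they get created when their files arrive.
    to_fetch = sorted(
        p for p, meta in new.items() if meta[0] != "d" and old.get(p) != meta
    )
    to_delete = sorted(p for p in old if p not in new)
    return to_fetch, to_delete
```

The to_fetch paths, written one per line to a file, are what would be
handed to rsync --files-from; deletions would be done locally.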

It's possible to maintain this list without scanning the repo; it can
be done by the procedure that updates the master. However, even if it's
done by scanning, its cost will be offset by the scans that the
mirrors will no longer inflict on the master. Even if it's only
fedora.c3sl.ufpr.br that avoids it :-)
