[mirror-admin] fullfilelist (was Re: Please use --delay-updates)

Carlos Carvalho carlos at fisica.ufpr.br
Tue Apr 20 11:45:07 EDT 2010


Mike McGrath (mmcgrath at redhat.com) wrote on 20 April 2010 09:48:
 >On Fri, 16 Apr 2010, Carlos Carvalho wrote:
 >
 >> Chuck Anderson (cra at wpi.edu) wrote on 16 April 2010 08:41:
 >>  >Each time you run rsync against your upstream mirror, it scans the
 >>  >entire filesystem to build a filelist.  This could take anywhere from
 >>  >5 to 20 minutes or more
 >>
 >> More... :-(
 >>
 >>  >and has been a factor in overloading the master mirrors in the past.
 >>
 >> I'd say nowadays too... The table below shows the time we take just to
 >> get the file list from sync.fedoraproject, for the last days. We
 >> mirror everything starting from release 11. It shows clearly that the
 >> machine suffers significantly from disk scanning. The file list is
 >> only about 22MB. Times are in UTC-3.
 >>
 >> If fullfilelist was done properly we could completely avoid this
 >> scanning...
 >>
 >
 >Can you expand more on this, how can we do fullfilelist properly?

By including the timestamp and size (and type of object) of every entry.

The current version only gives the names. Downstream mirrors can use it
to see what has been removed and created, but they cannot tell what has
been modified, so they're forced to request a full disk scan. If you put
the necessary info in fullfilelist, mirrors can rsync it, see
*everything* that must be updated, and directly request only what's
necessary with rsync --files-from. This way no disk scanning would be
necessary upstream.
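
To make the mirror side concrete, here is a rough sketch of how two
copies of such a list could be compared. It assumes every line looks
like rsync's own listing output (mode, size, date, time, then the path).
changed_paths and the file names in the comments are made-up names for
illustration, not part of any existing mirror script.

```shell
# changed_paths OLD NEW
# Print every path in NEW that is absent from OLD or whose mode, size or
# timestamp differs. Assumes "mode size date time path" lines.
changed_paths() {
    awk '
        # Drop the first four fields and return the rest of the line,
        # so that paths containing spaces survive intact.
        function path_of(line,   i) {
            for (i = 1; i <= 4; i++) sub(/^ *[^ ]+ +/, "", line)
            return line
        }
        { key = $1 " " $2 " " $3 " " $4; p = path_of($0) }
        # First file (OLD): remember the metadata seen for each path.
        FILENAME == ARGV[1] { seen[p] = key; next }
        # Directories are skipped in this sketch; only their contents matter.
        $1 ~ /^d/ { next }
        # New or modified entry. For symlinks, print the name without
        # the " -> target" suffix.
        seen[p] != key {
            if ($1 ~ /^l/) sub(/ -> .*$/, "", p)
            print p
        }
    ' "$1" "$2"
}

# Hypothetical usage on a mirror (UPSTREAM and MODULE are placeholders):
#   changed_paths fullfilelist.old fullfilelist > todo
#   rsync -a --files-from=todo rsync://UPSTREAM/MODULE/ /local/mirror/
```

Removed paths are not handled here; they can already be found by
comparing the names alone, as with the current fullfilelist.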

The format I propose is the one generated by rsync itself:

% cd /path/to/repository
% rsync -r . > /path/to/fullfilelist

If you want fullfilelist to include itself, it's of course necessary to
adjust its entry afterwards, but that's easy. Note also that
"self-inclusion" is not necessary, because mirrors would always pull
the list anyway.

It's possible to maintain this list without scanning the repo; it can
be done by the procedure that updates the master. However, even if it's
done by scanning, its cost will be offset by the scans that the
mirrors will no longer inflict on the master. Even if it's only
fedora.c3sl.ufpr.br that avoids it :-)
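
As an illustration of the no-scan variant, the sketch below folds a
small "delta" listing into the existing list. It assumes the update
procedure can emit listing lines, in the same rsync format, for just the
paths it touched; update_listing and the delta file are hypothetical
names. Deletions are left out to keep it short, and new entries are
simply appended rather than sorted into place.

```shell
# update_listing OLD DELTA
# Print OLD with every entry that also appears in DELTA replaced by the
# DELTA line, followed by DELTA entries that were not in OLD at all.
update_listing() {
    awk '
        # Path of a listing line: drop mode, size, date and time, and for
        # symlinks also drop the " -> target" suffix.
        function path_of(line,   i, is_link) {
            is_link = (line ~ /^l/)
            for (i = 1; i <= 4; i++) sub(/^ *[^ ]+ +/, "", line)
            if (is_link) sub(/ -> .*$/, "", line)
            return line
        }
        # DELTA is read first: remember its lines in order of appearance.
        FILENAME == ARGV[1] {
            p = path_of($0)
            if (!(p in fresh)) order[++n] = p
            fresh[p] = $0
            next
        }
        # OLD: pass lines through, swapping in the fresher DELTA line.
        {
            p = path_of($0)
            if (p in fresh) { print fresh[p]; delete fresh[p] }
            else print
        }
        # Whatever is left in DELTA is new; append it.
        END { for (i = 1; i <= n; i++) if (order[i] in fresh) print fresh[order[i]] }
    ' "$2" "$1"
}
```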
