[mirror-admin] fullfilelist (was Re: Please use --delay-updates)
Carlos Carvalho
carlos at fisica.ufpr.br
Tue Apr 20 18:50:44 EDT 2010

Mike McGrath (mmcgrath at redhat.com) wrote on 20 April 2010 14:52:
>On Tue, 20 Apr 2010, Carlos Carvalho wrote:
>
>> Mike McGrath (mmcgrath at redhat.com) wrote on 20 April 2010 09:48:
>> >On Fri, 16 Apr 2010, Carlos Carvalho wrote:
>> >
>> >> Chuck Anderson (cra at wpi.edu) wrote on 16 April 2010 08:41:
>> >> >Each time you run rsync against your upstream mirror, it scans the
>> >> >entire filesystem to build a filelist. This could take anywhere from
>> >> >5 to 20 minutes or more
>> >>
>> >> More... :-(
>> >>
>> >> >and has been a factor in overloading the master mirrors in the past.
>> >>
>> >> I'd say nowadays too... The table below shows the time we take just to
>> >> get the file list from sync.fedoraproject, for the last days. We
>> >> mirror everything starting from release 11. It shows clearly that the
>> >> machine suffers significantly from disk scanning. The file list is
>> >> only about 22MB. Times are in UTC-3.
>> >>
>> >> If fullfilelist was done properly we could completely avoid this
>> >> scanning...
>> >>
>> >
>> >Can you expand more on this, how can we do fullfilelist properly?
>>
>> Including timestamp and size (and type of object).
>>
>> The current version only gives the names. Downstream mirrors can use it
>> to see what has been removed and created but cannot know what has been
>> modified. They're thus forced to request a full disk scan. If you put
>> the necessary info in fullfilelist, mirrors can rsync it, see
>> *everything* that must be updated and directly request only what's
>> necessary with rsync --files-from. This way no disk scanning would be
>> necessary upstream.
>>
>> The format I propose is the one generated by rsync itself:
>>
>> % cd /path/to/repository
>> % rsync -r . > /path/to/fullfilelist
>>
>> If you want fullfilelist to include itself it's of course necessary to
>> adjust it afterwards but that's easy. Note also that "self-inclusion"
>> is not necessary because mirrors would pull it always.
>>
>> It's possible to maintain this list without scanning the repo; it can
>> be done by the procedure that updates the master. However even if it's
>> done by scanning, its cost will be compensated by the scans that the
>> mirrors will not inflict on the master. Even if it's only
>> fedora.c3sl.ufpr.br that avoids it :-)
>>
>
>I take it once you pull that fullfilelist down, you'll do a diff against
>the fullfilelist you currently have to generate a final list or is there a
>step in there I'm not following?

That's the general idea, yes.
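
A minimal sketch of that diff step, assuming each line of fullfilelist
has the columns a recursive rsync listing prints (permissions, size,
date, time, path). The helper names parse_listing and changes are
hypothetical, and the exact column layout should be checked against
the rsync version in use:

```python
def parse_listing(text):
    """Map path -> (type, size, mtime) for each line of an rsync-style listing."""
    entries = {}
    for line in text.splitlines():
        # Assumed columns: permissions, size, date, time, path.
        # maxsplit=4 keeps paths that contain spaces in one piece.
        parts = line.split(None, 4)
        if len(parts) != 5:
            continue  # skip blank or unrecognized lines
        perms, size, date, time, path = parts
        # Newer rsync versions may print sizes with thousands separators.
        entries[path] = (perms[0], size.replace(",", ""), date + " " + time)
    return entries


def changes(old_text, new_text):
    """Compare two listings; return (paths to fetch, paths removed upstream)."""
    old = parse_listing(old_text)
    new = parse_listing(new_text)
    # Fetch anything new or whose type, size or timestamp differs.
    # Directories are skipped; they are created when their files arrive.
    fetch = sorted(
        path for path, meta in new.items()
        if meta[0] != "d" and old.get(path) != meta
    )
    removed = sorted(path for path in old if path not in new)
    return fetch, removed
```

The fetch list, written one path per line, is what would be handed to
rsync --files-from; the removed list can be deleted locally without
asking upstream for anything.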