[mirror-admin] Generating fullfilelist

Chris Schanzle schanzle at nist.gov
Tue Mar 29 19:27:26 EDT 2016


On 03/29/2016 05:15 PM, Carlos wrote:
> Stephen John Smoogen (smooge at gmail.com) wrote on Tue, Mar 29, 2016 at 03:57:09PM BRT:
>> currently fullfilelist gets generated in several areas via a find *
>> command and has the format of:
>>
>>
>> fedora
>> fedora/linux
>>
>> if we use rsync, the file format would be:
>>
>> ==> good-fullfilelist <==
>> drwxr-xr-x        4096 2015/08/20 01:15:38 .
>> -rw-r--r--     1273297 2016/03/28 22:46:34 DIRECTORY_SIZES.txt
>> drwxr-xr-x        4096 2016/02/25 01:54:43 alt
>> drwxrwxr-x        4096 2016/01/29 20:58:53 alt/anaconda
>> -rw-r--r--   486539264 2016/01/29 21:05:07 alt/anaconda/rawhide-docker-2-boot.iso
>> -rw-r--r--   488636416 2016/01/21 22:41:23 alt/anaconda/rawhide-docker-boot.iso
>> -rw-rw-r--          90 2016/01/21 22:48:33 alt/anaconda/rawhide-docker-boot.iso.sha256
>> drwxrwsr-x        4096 2015/10/02 18:25:17 alt/atomic
>> drwxrwsr-x        4096 2016/01/27 17:46:45 alt/atomic/stable
>> drwxrwsr-x        4096 2016/03/08 19:23:15 alt/atomic/stable/Cloud-Images
>>
>> I can then cut -c44- good-fullfilelist > fullfilelist to get output similar
>> to that of 'find *'. However, it won't be exactly the same, and I want to
>> get an idea of whom this will break.
> The paths will of course be the same, only the order can change.
>
> However, cut -c44- may be dangerous because rsync will break the alignment if
> the file size doesn't fit in the allocated space. If this happens fullfilelist
> will be corrupted and make a mess for mirrors.
>
> Some distros don't allow whitespace in filenames; if this is the case for Fedora
> you can use a simple awk '{print $NF}'. If names are allowed to have whitespace
> a more complicated awk script is necessary. I can dig out what I use if
> necessary.
>

Perhaps sed?  E.g.,

echo '-rw-r--r--   488636416   2016/01/21 22:41:23 alt foo bar' | sed -r 's/^(\S*\s*){4}//'
alt foo bar

Though leading spaces in filenames will get removed, because the trailing \s* of the fourth group swallows them.  Specifically targeting the time field and the single space after it, this might work:

echo '-rw-r--r--   488636416   2016/01/21 22:41:23   alt foo bar' | sed -r 's/^(\S*\s*){3}[0-9:]* //'
  alt foo bar
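
The same idea can be spelled out more strictly.  What follows is only a sketch, assuming the listing always carries a "YYYY/MM/DD HH:MM:SS" timestamp followed by exactly one space before the path.  The function name strip_listing_fields is made up for illustration, and allowing commas in the size column is an assumption to cover rsync versions that print digit-grouped sizes.

```shell
# Hypothetical helper: strip the permissions, size, date and time columns
# from rsync-style listing lines on stdin, leaving the path untouched.
strip_listing_fields() {
    # Anchor on the "YYYY/MM/DD HH:MM:SS " timestamp instead of a fixed
    # column, so an oversized file size or spaces in a name do not matter.
    # [0-9,]+ is an assumption: it also accepts sizes such as 1,273,297.
    sed -E 's#^[^ ]+ +[0-9,]+ +[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ##'
}

# One normal line, and one with a very wide size and a name that has
# leading and embedded spaces.
printf '%s\n' \
    '-rw-r--r--   488636416 2016/01/21 22:41:23 alt/anaconda/rawhide-docker-boot.iso' \
    '-rw-r--r-- 99999999999999 2016/01/21 22:41:23   alt foo bar' |
    strip_listing_fields
```

Lines that do not match the pattern pass through unchanged, so a quick grep of the result for a leading permission string would show whether anything slipped by.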

Consider running the result through sort, perhaps LANG=C sort, so the order does not depend on the locale.
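
Put together, the stripping and sorting could look roughly like this.  Again only a sketch: make_fullfilelist is a hypothetical name, the sed pattern assumes the "YYYY/MM/DD HH:MM:SS" layout shown in the listing, and LC_ALL=C is used in place of LANG=C because LC_ALL takes precedence over any LC_COLLATE or LANG already set in the environment.

```shell
# Hypothetical wrapper: read an rsync-style listing on stdin and write a
# byte-wise sorted list of paths on stdout.
make_fullfilelist() {
    # Drop permissions, size, date and time, then sort in the C locale so
    # every host produces the same order.
    sed -E 's#^[^ ]+ +[0-9,]+ +[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ##' |
        LC_ALL=C sort
}

# Three lines taken from the sample listing, deliberately out of order.
printf '%s\n' \
    'drwxr-xr-x        4096 2016/02/25 01:54:43 alt' \
    '-rw-r--r--     1273297 2016/03/28 22:46:34 DIRECTORY_SIZES.txt' \
    'drwxr-xr-x        4096 2015/08/20 01:15:38 .' |
    make_fullfilelist
```

In the C locale uppercase sorts before lowercase, so DIRECTORY_SIZES.txt lands ahead of alt, and "." comes first.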
