[mirror-admin] ERROR: chroot failed for fedora-web

Matt Domsch Matt_Domsch at dell.com
Thu Jan 15 21:06:42 EST 2009


On Thu, Jan 15, 2009 at 11:28:07PM -0200, Carlos Carvalho wrote:
> Jesse Keating (jkeating at redhat.com) wrote on 15 January 2009 16:34:
>  >On Tue, 2009-01-13 at 23:54 -0200, Carlos Carvalho wrote:
>  >>  >We're generating something that rsync can consume directly, rather
>  >>  >than building up some other infrastructure around it.
>  >> 
>  >> What do you mean by "rsync can consume"? Using the filelist generated
>  >> by rsync has the info
>  >
>  >erm, I mean exactly what I say. It appears that rsync --files-from
>  >can only consume a file that has paths, either null or newline
>  >terminated. No other information.
> 
> Correct.
> 
>  >So again, I'm not sure what value you're getting from using rsync vs
>  >find to generate the filelist.
>  
> What are we mirrors supposed to do with it? Do you want us to just
> feed fullfilelist to --files-from? That's useless because the disk
> scavenging will continue to happen on both ends. Should we diff from
> the previous version and just give the differences to --files-from?
> This doesn't work because files with the same name may have changed.
> Since they won't be given to rsync we'll end up with a broken mirror.
> 
> A plain list of file names only increases the size of the
> archive and doesn't help mirroring.
> 
> You can use find or even ls (with appropriate options) to generate the
> list. The important point is that it *must* have enough info for us to
> decide if we have to give the name to rsync or not so that we only ask
> you to check files that really should be checked, thus avoiding disk
> scanning on both ends. There are several ways to do it; one of them is
> to check for the n-tuple (permissions,size,timestamp), which is what
> rsync does btw. Another one is to use checksums.
> 
> Once we have enough info in that file our scripts can determine what
> must be given in --files-from. As I said in another post, we already
> do it, even using an awful ls-lR list (for other distros). The script
> is not a one-liner though.

If people are willing to run a (gasp: python!) script on their servers
(hey, you already run report_mirror...), it would be pretty trivial to
write an app to help with this.

Master: os.walk() the tree and generate a list of dirs and files, each
with its (permissions,size,timestamp) tuple.  Hey look - report_mirror
does this already...  Write this out as a pickle.
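
A minimal sketch of the master side, to make the idea concrete.  The
function names and the dict layout are my own assumptions, not what
report_mirror actually emits.  Directory sizes are recorded as 0
because the size of a directory entry varies between filesystems and
would otherwise show up as a spurious difference.

```python
import os
import pickle
import stat


def build_manifest(root):
    """Map each path under root (relative to root) to a
    (permissions, size, mtime) tuple.  Hypothetical layout."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            # lstat describes a symlink itself instead of following it
            st = os.lstat(full)
            size = 0 if stat.S_ISDIR(st.st_mode) else st.st_size
            rel = os.path.relpath(full, root)
            manifest[rel] = (stat.S_IMODE(st.st_mode), size, int(st.st_mtime))
    return manifest


def write_manifest(manifest, out_path):
    """Pickle the manifest to a temporary name, then rename it into
    place so a mirror never fetches a half-written file."""
    tmp = out_path + ".tmp"
    with open(tmp, "wb") as fh:
        pickle.dump(manifest, fh)
    os.replace(tmp, out_path)
```

The master would run this after each push and publish the pickle at a
well-known path inside the tree.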

Downstream Mirrors:  rsync the pickle.  If it hasn't changed, exit.
If it has changed, read the pickle, do your own os.walk(), drop both
lists into python set() objects, and diff them.  Regenerate the file
list from that diff and feed it to rsync --files-from.
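
The downstream diff could look roughly like this.  It is a sketch
only: it assumes both manifests are dicts mapping a relative path to a
(permissions,size,timestamp) tuple, and all function names are made up
for illustration.

```python
import pickle


def load_manifest(path):
    """Read a pickled manifest (dict: relative path -> tuple)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def paths_to_sync(upstream, local):
    """Paths that are new upstream or whose tuple differs locally."""
    changed = set(upstream.items()) - set(local.items())
    return sorted(path for path, _ in changed)


def paths_to_delete(upstream, local):
    """Paths present locally that no longer exist upstream."""
    return sorted(set(local) - set(upstream))


def write_files_from(paths, out_path):
    """Write one path per line, the format rsync --files-from reads."""
    with open(out_path, "w") as fh:
        for path in paths:
            fh.write(path + "\n")
```

The output of write_files_from() is what gets handed to rsync
--files-from; deletions have to be handled by the script itself, since
rsync is never told about paths that are missing from the list.  One
caveat: unpickling can execute arbitrary code, so a mirror should only
load a pickle from an upstream it trusts.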

This should reduce the load on the upstream mirrors considerably,
since they only have to check the paths they are asked about instead
of scanning the whole tree, at the cost of needing to run a script on
the downstream mirrors, the results of which are fed to rsync.


-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
