[mirror-admin] push mirroring plans

Fri Jan 16 13:15:41 EST 2009

Chuck Anderson (cra at WPI.EDU) wrote on 16 January 2009 11:42:
 >On Fri, Jan 16, 2009 at 11:33:57AM -0500, Chris Schanzle wrote:
 >> On 12/16/2008 11:17 PM, Matt_Domsch at Dell.com wrote:
 >>> In a traditional "pull mirror" system like we have, each mirror site
 >>> schedules an rsync job to happen at various times, and maybe there's new
 >>> content, maybe there's not.  Polling, over nearly 1TB of data.  Not so
 >>> nice.
 >>>   
 >>
 >> I'm not convinced, if we're all effectively caching inode data
 >> (particularly the master rsync server), that any of this optimization
 >> effort is really needed.

Yes, but we are not caching [enough] inode data, and won't be, because
we also have to serve files.

 >> Just because I don't think it has been discussed, has it been considered  
 >> that the master use rsync's --only-write-batch=FILE option?  Then the  
 >> mirrors download that one big "diff" that gets applied via  
 >> --read-batch=FILE.  Efficient for both ends.

This isn't useful, because a batch can only be applied to mirrors that
are in exactly the same state as the master was before the update.
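To make that precondition concrete, here is a toy Python model. It is a sketch, not rsync's real batch format. A tree is a dict mapping a path to a content stamp, and make_batch/apply_batch are hypothetical helpers. The batch records only the changes from the master's old tree to its new one, so it reproduces the new tree only on a mirror that starts from that same old tree. The toy version refuses outright when the base differs.

```python
# Toy model (hypothetical): a "tree" is a dict of path -> content stamp.
# This is NOT rsync's batch file format; it only illustrates the precondition.

def make_batch(old: dict, new: dict) -> dict:
    """Record the changes that turn `old` into `new`, plus the base they assume."""
    changes = {p: c for p, c in new.items() if old.get(p) != c}
    deletions = [p for p in old if p not in new]
    return {"base": dict(old), "changes": changes, "deletions": deletions}


def apply_batch(tree: dict, batch: dict) -> dict:
    """Apply a batch, refusing if the mirror is not in the batch's base state."""
    if tree != batch["base"]:
        raise ValueError("mirror differs from the master's pre-update state")
    result = dict(tree)
    result.update(batch["changes"])   # new or modified files
    for p in batch["deletions"]:      # files removed on the master
        result.pop(p, None)
    return result


master_v1 = {"a.rpm": "1", "b.rpm": "1"}
master_v2 = {"a.rpm": "2", "c.rpm": "1"}
batch = make_batch(master_v1, master_v2)

# A mirror that was exactly at v1 ends up identical to v2.
print(apply_batch(dict(master_v1), batch) == master_v2)
```

A mirror that had fallen behind (say, still at {"a.rpm": "0"}) cannot use this batch at all and would need a full rsync instead.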

 >I did mention this option at FUDCon.  My problem with relying on 
 >incremental filelists or incremental batches is that if a mirror gets 
 >behind for some reason, it will need to run a full rsync to get up to 
 >date.  If there isn't any way to automate that, the system is going to 
 >fail with broken mirrors.

 >I'd much prefer the fullfilelist method where the client's rsync can
 >do its job and automatically figure out what changed, rather than
 >trying to re-implement the "what changed" part with ad-hoc scripts.

Exactly. Incremental lists are not robust. Worse, they are not even
necessary.
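The full-filelist approach can be sketched with the same kind of toy Python model. sync_from_full_list is a hypothetical helper, and the dict values stand in for whatever per-file metadata (size, mtime, checksum) a real list would carry. Fetching is simulated by copying the value. Because the mirror compares its own tree against the master's complete current list, it converges no matter how far behind it was.

```python
# Toy model (hypothetical): compare a mirror against the master's full,
# current file list instead of replaying incremental changes.

def sync_from_full_list(mirror: dict, master_list: dict):
    """Return (updated_tree, files_to_fetch, files_to_delete)."""
    fetch = sorted(p for p, c in master_list.items() if mirror.get(p) != c)
    delete = sorted(p for p in mirror if p not in master_list)
    updated = dict(mirror)
    for p in fetch:                   # simulate downloading changed/new files
        updated[p] = master_list[p]
    for p in delete:                  # drop files the master no longer has
        del updated[p]
    return updated, fetch, delete


master_now = {"a.rpm": "2", "c.rpm": "1"}

# Works for a mirror one step behind and for one several steps behind.
for mirror in ({"a.rpm": "1", "b.rpm": "1"}, {"a.rpm": "0"}):
    updated, fetch, delete = sync_from_full_list(mirror, master_now)
    print(updated == master_now, fetch, delete)
```

No history of "what changed" is needed; the difference is recomputed from the two current states, which is what lets the client's rsync do this job automatically.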

--