[mirror-admin] push mirroring plans

Chuck Anderson cra at WPI.EDU
Fri Jan 16 11:42:03 EST 2009


On Fri, Jan 16, 2009 at 11:33:57AM -0500, Chris Schanzle wrote:
> On 12/16/2008 11:17 PM, Matt_Domsch at Dell.com wrote:
>> In a traditional "pull mirror" system like we have, each mirror site
>> schedules an rsync job to happen at various times, and maybe there's new
>> content, maybe there's not.  Polling, over nearly 1TB of data.  Not so
>> nice.
>>   
>
> I'm not convinced, if we're all effectively caching inode data  
> (particularly the master rsync server), if any of this optimization  
> effort is really needed.
>
> It's not the 1TB we're concerned about, it's the N million files rsync  
> ends up lstat()-ing.  Or is it the bandwidth to send these scans?  Or is  
> it the rsync memory footprint we're using on the master?  [What problem  
> needs solving?]
>
>
> Just because I don't think it has been discussed, has it been considered  
> that the master use rsync's --only-write-batch=FILE option?  Then the  
> mirrors download that one big "diff" that gets applied via  
> --read-batch=FILE.  Efficient for both ends.

I did mention this option at FUDCon.  My problem with relying on 
incremental filelists or incremental batches is that if a mirror gets 
behind for some reason, it will need to run a full rsync to get up to 
date.  If there isn't any way to automate that, the system is going to 
fail with broken mirrors.
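
To make the failure mode concrete, here is a rough sketch of the check a
mirror would need to run before every update. It assumes batches are
numbered consecutively and that the master publishes the set it still
retains; neither convention exists today, and plan_sync is a
hypothetical name.

```python
def plan_sync(last_applied, available):
    """Decide how a mirror should catch up.

    last_applied: number of the last batch this mirror applied (assumed scheme).
    available:    batch numbers the master still retains (assumed scheme).
    Returns ("batches", [numbers to apply in order]) or ("full", []).
    """
    if not available:
        # Nothing published, so the mirror cannot prove it is current.
        return ("full", [])
    latest = max(available)
    if last_applied >= latest:
        return ("batches", [])  # already up to date
    needed = list(range(last_applied + 1, latest + 1))
    if all(n in available for n in needed):
        return ("batches", needed)
    # At least one needed batch has expired on the master: the chain is
    # broken and only a full rsync can bring the mirror back in sync.
    return ("full", [])


print(plan_sync(5, {4, 5, 6, 7}))  # mirror slightly behind
print(plan_sync(2, {4, 5, 6, 7}))  # mirror behind the retention window
```

Every mirror would have to carry this logic, or something like it, and
get it right; a mirror that skips the check silently drifts.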

> While figuring out a versioning mechanism (timestamp file with contents  
> = latest batch filename with a date/time?), it could be a very efficient  
> way to transport updates for the "big" mirrors with insignificant  
> polling effort.  One additional cost is increasing disk usage on the  
> master, mitigated by retaining change batches for N days (where N is  
> probably most effective < 7 days).  And if something gets out of whack,  
> the old rsync method is still available, and the new batch method is 
> opt-in.

The batch method or incremental filelist method only contains what 
changed since "last time".  I'd much prefer the fullfilelist method 
where the client's rsync can do its job and automatically figure out 
what changed, rather than trying to re-implement the "what changed" 
part with ad-hoc scripts.
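
As an illustration of why the full list is self-correcting, the
comparison amounts to something like the sketch below. This is not
rsync's actual code, only the idea behind its default size and mtime
quick check; diff_filelist and the dict layout are made up for the
example.

```python
def diff_filelist(master, local):
    """Compare a full master file list against the local tree.

    Both arguments map path -> (size, mtime); this layout is an assumption
    for illustration. Returns (to_fetch, to_delete), each a sorted list.
    """
    # Fetch anything missing locally or whose size/mtime differs.
    to_fetch = sorted(p for p, meta in master.items() if local.get(p) != meta)
    # Delete anything the master no longer carries.
    to_delete = sorted(p for p in local if p not in master)
    return to_fetch, to_delete


master = {"a": (1, 10), "b": (2, 20), "c": (3, 30)}
local = {"a": (1, 10), "b": (2, 19), "d": (4, 40)}
print(diff_filelist(master, local))
```

The result depends only on the current state of both sides, not on how
many updates the mirror missed, so a mirror that was down for a month
recovers the same way as one that was down for an hour.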
