[mirror-admin] push mirroring plans

J.H. warthog19 at eaglescrag.net
Fri Jan 16 11:40:14 EST 2009


Chris Schanzle wrote:
> On 12/16/2008 11:17 PM, Matt_Domsch at Dell.com wrote:
>> In a traditional "pull mirror" system like we have, each mirror site
>> schedules an rsync job to happen at various times, and maybe there's new
>> content, maybe there's not.  Polling, over nearly 1TB of data.  Not so
>> nice.
>>   
> 
> I'm not convinced, if we're all effectively caching inode data 
> (particularly the master rsync server), if any of this optimization 
> effort is really needed.
> 
> It's not the 1TB we're concerned about, it's the N million files rsync 
> ends up lstat()-ing.  Or is it the bandwidth to send these scans?  Or is 
> it the rsync memory footprint we're using on the master?  [What problem 
> needs solving?]

The basic problem is that doing the lstat() on both the client and the 
server end is *painful*, particularly because it pulls things into your 
disk cache that no other process is actually using, and all those 
extraneous read operations force higher I/O on your disks, etc. 
  Effectively: all of the above.
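
To put a rough number on that, here is a minimal sketch (my own 
illustration, not how rsync is implemented) that walks a tree and 
lstat()s every entry.  The function name scan_cost is made up for this 
example.  The point is that the work scales with the number of entries, 
not with the bytes stored:

```python
import os
import stat


def scan_cost(root):
    """Walk root, lstat() every entry, and return (entries, data_bytes).

    Hypothetical helper for illustration only: it approximates the
    metadata work of a full-tree scan, one lstat() per file or directory.
    """
    entries = 0
    data_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))  # one metadata read per entry
            entries += 1
            if stat.S_ISREG(st.st_mode):
                data_bytes += st.st_size  # count data size for regular files only
    return entries, data_bytes
```

Run against a mirror tree, the first number is the one that hurts: every 
one of those entries gets touched on every poll, whether or not anything 
changed.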

> 
> 
> Just because I don't think it has been discussed, has it been considered 
> that the master use rsync's --only-write-batch=FILE option?  Then the 
> mirrors download that one big "diff" that gets applied via 
> --read-batch=FILE.  Efficient for both ends.
> 
> While figuring out a versioning mechanism (timestamp file with contents 
> = latest batch filename with a date/time?), it could be a very efficient 
> way to transport updates for the "big" mirrors with insignificant 
> polling effort.  One additional cost is increasing disk usage on the 
> master, mitigated by retaining change batches for N days (where N is 
> probably most effective < 7 days).  And if something gets out of whack, 
> the old rsync method is still available, and the new batch method is 
> opt-in.
> 
> Best regards,
> Chris
> 
> -- 
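
For what it's worth, the versioning idea above could be sketched roughly 
like this.  Everything here is an assumption for illustration: the batch 
naming scheme (names that sort chronologically, e.g. 
batch-20090116T1140), the function names, and the idea that the mirror 
remembers the last batch it applied.  Only the --read-batch=FILE option 
itself comes from rsync.  As I understand the rsync man page, a batch 
only applies cleanly to a tree identical to the one it was made against, 
so batches have to be applied in order with no gaps:

```python
def batches_to_apply(available, last_applied):
    """Return the batches newer than last_applied, oldest first.

    available:    batch names still retained on the master; assumed to
                  sort chronologically by name (hypothetical convention).
    last_applied: the last batch this mirror applied, or None.

    Returns None when the chain is broken (never synced, or the mirror's
    last batch has already expired on the master), meaning the mirror
    should fall back to a normal full rsync.
    """
    ordered = sorted(available)
    if last_applied is None or last_applied not in ordered:
        return None
    return ordered[ordered.index(last_applied) + 1:]


def read_batch_command(batch_path, dest):
    """Build (but do not run) the rsync command that applies one batch."""
    return ["rsync", "--read-batch=" + batch_path, dest]
```

An empty list means the mirror is already current, so the only polling 
cost is fetching the small file that lists the batches.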
