[mirror-admin] push mirroring plans

Chris Schanzle schanzle at nist.gov
Fri Jan 16 11:33:57 EST 2009


On 12/16/2008 11:17 PM, Matt_Domsch at Dell.com wrote:
> In a traditional "pull mirror" system like we have, each mirror site
> schedules an rsync job to happen at various times, and maybe there's new
> content, maybe there's not.  Polling, over nearly 1TB of data.  Not so
> nice.
>   

I'm not convinced that any of this optimization effort is really needed 
if we're all effectively caching inode data (particularly the master 
rsync server).

It's not the 1TB we're concerned about, it's the N million files rsync 
ends up lstat()-ing.  Or is it the bandwidth to send these scans?  Or is 
it the rsync memory footprint we're using on the master?  [What problem 
needs solving?]


I don't think this has been discussed: has anyone considered having the 
master use rsync's --only-write-batch=FILE option?  The mirrors would 
then download that one big "diff" and apply it via --read-batch=FILE.  
Efficient for both ends.
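
As a rough sketch of the two rsync invocations involved: the helper names 
and all paths below are hypothetical, and it assumes the master keeps a 
second tree holding the state as of the previous batch, since rsync expects 
a batch to be applied to a tree identical to the destination it was created 
against.  The functions only build the command lines; nothing is run.

```python
from typing import List


def write_batch_cmd(current: str, previous: str, batch_file: str) -> List[str]:
    """Master side: record the changes that turn `previous` into `current`.

    --only-write-batch writes the batch file without modifying `previous`.
    """
    return [
        "rsync", "-a", "--delete",
        f"--only-write-batch={batch_file}",
        current.rstrip("/") + "/",
        previous.rstrip("/") + "/",
    ]


def read_batch_cmd(tree: str, batch_file: str) -> List[str]:
    """Mirror side: apply a downloaded batch to the local tree.

    The tree must match the state the batch was created against.
    """
    return [
        "rsync", "-a", "--delete",
        f"--read-batch={batch_file}",
        tree.rstrip("/") + "/",
    ]
```

Either list could be handed to subprocess.run(cmd, check=True).  After 
publishing a batch, the master would also apply it to its own "previous" 
tree so that the next batch starts from the right baseline.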

We would still need to figure out a versioning mechanism (a timestamp 
file whose contents are the latest batch filename with a date/time?), 
but it could be a very efficient way to transport updates for the "big" 
mirrors with insignificant polling effort.  One additional cost is 
increased disk usage on the master, mitigated by retaining change 
batches for only N days (N < 7 is probably most effective).  And if 
something gets out of whack, the old rsync method is still available, 
and the new batch method is opt-in.
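
To make the versioning idea a bit more concrete, here is a minimal sketch 
of the decision a mirror might make on each poll.  Everything in it is an 
assumption for illustration: the function name, the batch naming scheme 
(names that sort chronologically, e.g. batch-YYYYMMDDTHHMMSS), and the idea 
that the mirror remembers the last batch it applied.

```python
from typing import List, Optional


def plan_update(latest: str, available: List[str],
                last_applied: Optional[str]) -> Optional[List[str]]:
    """Decide which batches a mirror should apply, oldest first.

    latest       -- batch name read from the master's timestamp file
    available    -- batch names the master still retains
    last_applied -- last batch this mirror applied, or None if unknown

    Returns [] when already current, a list of batches to apply in order,
    or None when the mirror must fall back to a regular full rsync.
    """
    if last_applied == latest:
        return []  # cheap poll: nothing new since the last run
    ordered = sorted(available)  # assumes names sort chronologically
    if last_applied is None or last_applied not in ordered:
        # The mirror's state is unknown or older than the retention window,
        # so an unbroken chain of batches is not available.
        return None
    return ordered[ordered.index(last_applied) + 1:]
```

A return value of None is where the opt-in nature matters: the mirror just 
runs the old rsync method once and resumes with batches afterwards.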

Best regards,
Chris
