[mirror-admin] push mirroring plans
Chris Schanzle
schanzle at nist.gov
Fri Jan 16 11:33:57 EST 2009
On 12/16/2008 11:17 PM, Matt_Domsch at Dell.com wrote:
> In a traditional "pull mirror" system like we have, each mirror site
> schedules an rsync job to happen at various times, and maybe there's new
> content, maybe there's not. Polling, over nearly 1TB of data. Not so
> nice.
>
I'm not convinced that any of this optimization effort is really needed
if we're all effectively caching inode data (particularly the master
rsync server).
It's not the 1TB we're concerned about; it's the N million files rsync
ends up lstat()-ing. Or is it the bandwidth to send these scans? Or is
it the rsync memory footprint we're using on the master? [What problem
needs solving?]
I don't think it has been discussed, so: has anyone considered having
the master use rsync's --only-write-batch=FILE option? The mirrors
would then download that one big "diff" and apply it via
--read-batch=FILE. Efficient for both ends.
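To make the two halves concrete, here is a minimal sketch in Python that only builds the two rsync command lines and does not run them. The function names and paths are hypothetical. It assumes the master keeps a reference copy of the previously published tree to diff against, since --only-write-batch records the changes without updating the destination. A real mirror would add its usual options to both commands.

```python
import shlex


def write_batch_cmd(new_tree, old_tree, batch_file):
    """Master side: record what would turn old_tree into new_tree.

    --only-write-batch writes the batch file but leaves old_tree untouched,
    so old_tree is assumed to be a reference copy of the last published state.
    """
    return [
        "rsync", "-a",
        f"--only-write-batch={batch_file}",
        new_tree.rstrip("/") + "/",
        old_tree.rstrip("/") + "/",
    ]


def read_batch_cmd(batch_file, mirror_tree):
    """Mirror side: apply a downloaded batch to the local tree.

    The local tree must match the state the batch was created against.
    """
    return [
        "rsync", "-a",
        f"--read-batch={batch_file}",
        mirror_tree.rstrip("/") + "/",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(shlex.join(write_batch_cmd("/srv/pub", "/srv/pub.prev", "/srv/batches/b1")))
    print(shlex.join(read_batch_cmd("/var/tmp/b1", "/srv/mirror/pub")))
```

The commands are returned as argument lists so they can be handed to subprocess.run() without shell quoting problems.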
A versioning mechanism would still need to be figured out (a timestamp
file whose contents are the latest batch filename, with a date/time?),
but this could be a very efficient way to transport updates for the
"big" mirrors with insignificant polling effort. One additional cost is
increased disk usage on the master, mitigated by retaining change
batches for only N days (where N is probably most effective at < 7).
And if something gets out of whack, the old rsync method is still
available, and the new batch method is opt-in.
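One way the versioning and retention idea could work, sketched in Python under assumptions of my own: batch files are named "batch-YYYYMMDD-HHMMSS", a pointer file on the master holds the newest name, and each mirror remembers the last batch it applied. None of these names come from an existing tool. A mirror that has fallen behind the retention window gets None back, meaning it should fall back to the old rsync method.

```python
from datetime import datetime, timedelta

STAMP = "%Y%m%d-%H%M%S"  # sorts lexicographically in time order
PREFIX = "batch-"


def batch_name(created):
    """Name a batch after its creation time (hypothetical scheme)."""
    return PREFIX + created.strftime(STAMP)


def batch_time(name):
    """Recover the creation time from a batch name."""
    return datetime.strptime(name[len(PREFIX):], STAMP)


def prune(batches, now, keep_days):
    """Master side: keep only batches from the last keep_days days."""
    cutoff = now - timedelta(days=keep_days)
    return sorted(b for b in batches if batch_time(b) >= cutoff)


def plan_update(available, last_applied, latest):
    """Mirror side: list the batches to apply, oldest first.

    Returns [] when already current, or None when the chain from
    last_applied to latest is not fully available and a plain rsync
    run is needed instead.
    """
    if last_applied == latest:
        return []
    ordered = sorted(available)
    if last_applied not in ordered or latest not in ordered:
        return None
    return [b for b in ordered if last_applied < b <= latest]
```

Batches are applied strictly in order because each one is relative to the state left by the one before it. The only polling a mirror does is fetching the small pointer file.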
Best regards,
Chris