[mirror-admin] push mirroring plans
J.H.
warthog19 at eaglescrag.net
Fri Jan 16 11:40:14 EST 2009
Chris Schanzle wrote:
> On 12/16/2008 11:17 PM, Matt_Domsch at Dell.com wrote:
>> In a traditional "pull mirror" system like we have, each mirror site
>> schedules an rsync job to happen at various times, and maybe there's new
>> content, maybe there's not. Polling, over nearly 1TB of data. Not so
>> nice.
>>
>
> I'm not convinced, if we're all effectively caching inode data
> (particularly the master rsync server), that any of this optimization
> effort is really needed.
>
> It's not the 1TB we're concerned about, it's the N million files rsync
> ends up lstat()-ing. Or is it the bandwidth to send these scans? Or is
> it the rsync memory footprint we're using on the master? [What problem
> needs solving?]
The basic problem is that doing the lstat() on both the client and the
server end is *painful*, particularly when it pulls things into your
disk cache that no other process is actually using and triggers a lot
of extraneous reads that force higher I/O on your disks, etc.
Effectively: all of the above.
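
To make that cost concrete, here is a rough sketch. It assumes a plain
recursive walk is a fair stand-in for the scan a polling sync has to do;
it is not rsync's actual code, and count_lstat_calls() and demo() are
hypothetical names. The point is only that the number of lstat() calls
tracks the number of entries in the tree, whether or not anything
changed since the last run.

```python
import os
import stat
import tempfile


def count_lstat_calls(root):
    """Walk a tree iteratively, issuing one lstat() per directory entry."""
    calls = 0
    stack = [root]
    while stack:
        path = stack.pop()
        for name in os.listdir(path):
            full = os.path.join(path, name)
            st = os.lstat(full)  # does not follow symlinks
            calls += 1
            if stat.S_ISDIR(st.st_mode):
                stack.append(full)
    return calls


def demo():
    """Build a throwaway tree (3 files plus a subdir with 2 files) and scan it."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("a", "b", "c"):
            open(os.path.join(root, name), "w").close()
        sub = os.path.join(root, "sub")
        os.mkdir(sub)
        for name in ("d", "e"):
            open(os.path.join(sub, name), "w").close()
        return count_lstat_calls(root)
```

Scanning the same unchanged tree a second time costs exactly as many
calls as the first scan, and that happens on both the client and the
server.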
>
>
> Just because I don't think it has been discussed, has it been considered
> that the master use rsync's --only-write-batch=FILE option? Then the
> mirrors download that one big "diff" that gets applied via
> --read-batch=FILE. Efficient for both ends.
>
> A versioning mechanism would still need working out (a timestamp file
> whose contents are the latest batch filename, with a date/time?), but
> this could be a very efficient way to transport updates for the "big"
> mirrors with insignificant polling effort. One additional cost is
> increased disk usage on the master, mitigated by retaining change
> batches for only N days (where N is probably most effective at < 7).
> And if something gets out of whack, the old rsync method is still
> available, and the new batch method is opt-in.
>
> Best regards,
> Chris
>
> --
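
For what it's worth, the bookkeeping for the batch idea could look
something like the sketch below. Everything in it is an assumption for
illustration: the "batch-<timestamp>" naming, the function names, and
the idea that a mirror remembers the name of the last batch it applied.
It only decides which batches a mirror still needs and which ones the
master can expire; actually applying each one would still be rsync's
--read-batch=FILE, in order, since a batch is only valid against the
tree it was generated from.

```python
from datetime import datetime, timedelta

PREFIX = "batch-"          # hypothetical naming scheme: batch-<timestamp>
STAMP = "%Y%m%dT%H%M%S"


def batch_time(name):
    """Parse the timestamp embedded in a batch filename."""
    return datetime.strptime(name[len(PREFIX):], STAMP)


def pending_batches(available, last_applied):
    """Return the batches a mirror still has to apply, oldest first.

    Returns None when the mirror cannot catch up from batches alone
    (it never used batches, or its last batch has been expired), in
    which case it should fall back to the old full rsync.
    """
    ordered = sorted(available, key=batch_time)
    if last_applied is None or last_applied not in ordered:
        return None
    return ordered[ordered.index(last_applied) + 1:]


def prune(available, now, keep_days):
    """Return the batches the master keeps: those within keep_days of now."""
    cutoff = now - timedelta(days=keep_days)
    return [n for n in sorted(available, key=batch_time)
            if batch_time(n) >= cutoff]
```

A mirror that finds an empty list is up to date after fetching one small
file; one that gets None has fallen outside the retention window and
uses the existing pull method.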