[mirror-admin] Please use --delay-updates

J.H. warthog19 at eaglescrag.net
Thu Apr 15 17:28:00 EDT 2010


On 04/15/2010 12:25 PM, Carlos Carvalho wrote:
> J.H. (warthog19 at eaglescrag.net) wrote on 15 April 2010 11:40:
>  >I'm not saying anything against using --delay-updates explicitly, beyond
>  >that it's an I/O thrash.
> 
> It's not I/O thrash. There's only one extra move for each file. It's
> only a move, never a copy because rsync won't work if the partial-dir
> is not in the same filesystem.

Even mucking with inodes can and does cause I/O thrash.
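
For reference, the staging step being argued about can be sketched
roughly as follows. This is a simplified illustration in Python, not
rsync's actual code: the helper names stage_file and commit_staged are
hypothetical, and only the .~tmp~ holding-directory name and the
rename-at-the-end behaviour come from rsync itself. Each updated file
costs one extra rename, which is a metadata operation when the holding
directory is on the same filesystem as the destination.

```python
import os

STAGING_NAME = ".~tmp~"  # holding directory name used by rsync --delay-updates


def stage_file(dest_dir, name, data):
    """Write new content into the holding directory next to its destination.

    Hypothetical helper: the live copy of the file is left untouched.
    """
    staging = os.path.join(dest_dir, STAGING_NAME)
    os.makedirs(staging, mode=0o700, exist_ok=True)
    path = os.path.join(staging, name)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


def commit_staged(dest_dir):
    """Rename every staged file into place, then remove the holding directory.

    Hypothetical helper: each os.replace is one rename per file, no copy.
    """
    staging = os.path.join(dest_dir, STAGING_NAME)
    moved = []
    for name in sorted(os.listdir(staging)):
        os.replace(os.path.join(staging, name), os.path.join(dest_dir, name))
        moved.append(name)
    os.rmdir(staging)
    return moved
```

Until commit_staged runs, anything that lists dest_dir sees the old
files plus the .~tmp~ directory itself, which is the visibility issue
discussed below.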

>  >Well there are the problems, as I continue to watch them for a lot of
>  >other distros, when you have the .~tmp~ directories in the main path,
>  >specifically if an rsync is running you can end up with a large number
>  >of .~tmp~ directories in your own tree.
> 
> You don't. rsync has protections for this, and even without them the
> permissions are locked. So in a properly configured machine the
> clients won't have access to the contents of the partial-dir, they
> will at most see the dir itself:
> 
> .../fedora-enchilada/linux/development/rawhide/i386/debug[ 3:57PM] ls -ld .~tmp~
> drwx------ 2 fedora fedora 4096 Apr 15 15:24 .~tmp~

OK, at the very least the directory is still owned by, in your case, the
fedora user. It's quite likely that you're using the same user for
mirroring purposes, and thus mirrors syncing from you are now fully
capable of syncing this directory. And no, adding yet another user to
get around this is not always acceptable.

>  >While this is obviously
>  >expected when you have, mirrors or other entities syncing from you (and
>  >keep in mind there are entities that use things like FTP to sync their
>  >trees) you can cause three separate and problematic occurrences:
>  >	1) You can end up with a nasty nesting of .~tmp~ directories if
>  >	   you have multiple servers in the chain syncing simultaneously
>  >	   (again keep in mind while the mirrors are using rsync, end
>  >	   users aren't all doing that)
> 
> Not true, for the reason above.

No, this is still quite true, and it's possible for the directories to
get leaked across a different protocol.

>  >	2) You can end up in situations where files in .~tmp~ cannot be
>  >	   deleted from your mirror.  Rsync has issues with this, why I
>  >	   have no idea, but I can point out numerous instances where
>  >	   if it gets synced to a client it has to be gone and manually
>  >	   deleted
> 
> Not if you have a recent enough version, as everyone should. Version
> 2.6.8 had a bug where a temporary (not a partial) would be left. It's
> a very old version though and should no longer be used.

It's hard to call 2.6.8 old and unused when it's currently still
shipping with CentOS 5 (and RHEL 5).  Not everyone who mirrors content
uses Fedora the way kernel.org does; lots of people use a more stable
distro like RHEL or CentOS for a multitude of reasons.

>  >	3) You get an inconsistent view across rsync, ftp and http
> 
> Irrelevant. The point here is to offer a consistent view for
> downloaders. These clients will never look for anything that's
> not in the distro, so they'll never see the partial-dirs.

Yes, but presenting a consistent view to things like yum is not what is
being argued here.  Yes, --delay-updates achieves that; I'm not
contesting that in the slightest.  What I'm worried about specifically
are the other implications of using --delay-updates, and if we are going
to use --delay-updates now, we are ultimately going to end up at proper
atomic syncs from our upstreams.

Beyond that, dismissing as irrelevant the people who are mirroring the
entire directory structure with FTP, or tools that may be parsing the
directories from an HTML perspective, is just asking for trouble in
external tools.  Now you're likely to say that they should re-code their
tools or scripts, but personally I think it's easier for the few of us
who are running mirrors to "do the right thing (tm)" vs. doing something
partial.  Couple that with the grumbling problems I see happening when
--delay-updates gets turned on for other distros, and I'm just trying to
head off those discussions now.

I'm also at the point where I'm happy to fix my own scripts to do full
atomic syncing, where the tree that would be visible to any protocol is
consistent, and share them, but this discussion is starting to
degenerate.  All I'm proposing is that we should be properly consistent
and avoid problems that I am actively seeing elsewhere; that's it.
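
As a rough illustration of what a full atomic sync could look like (a
minimal sketch only, not the scripts mentioned above, which are not
shown in this thread): sync into a fresh snapshot directory outside the
served path, then repoint a symlink that every protocol serves from.
The function name publish_snapshot, the ".new" suffix, and the
symlink-based layout are all assumptions made for this example.

```python
import os


def publish_snapshot(snapshot_dir, live_link):
    """Atomically repoint live_link (a symlink) at snapshot_dir.

    Assumed layout: rsync, ftp and http all serve the tree through
    live_link, and snapshot_dir was fully synced before this call.
    snapshot_dir should be absolute, or relative to live_link's directory.
    """
    tmp_link = live_link + ".new"  # hypothetical temporary link name
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted run
    os.symlink(snapshot_dir, tmp_link)
    # rename(2) replaces the old symlink in one step, so a client sees
    # either the old tree or the new one, never a mix and never .~tmp~.
    os.replace(tmp_link, live_link)
    return os.readlink(live_link)
```

The trade-off is disk space for a second copy of the tree (hard links
between snapshots can reduce that), in exchange for a view that stays
consistent regardless of the protocol used to read it.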

- John 'Warthog9' Hawley
