[mirror-admin] missing and extraneous files at kernel.org
J.H.
warthog19 at eaglescrag.net
Mon Aug 10 17:24:08 EDT 2009
Carlos Carvalho wrote:
> J.H. (warthog19 at eaglescrag.net) wrote on 8 August 2009 15:00:
> >We don't, and never have, included any of the index.html files from
> >upstream, that will not be fixed,
>
> Why? That sounds arbitrary. If you're a tier-1 mirror you should
> replicate upstream, not decide on its contents. You can only exclude
> whole portions of the tree due to resource limitation. This is not the
> case for a few small files.
In fact it is not arbitrary; it is completely intentional. Upstream
distro providers have a large number of people who have access to their
repositories, and I have an automated sync process. Explicitly dropping
index.html files means several things:
1) When a user browses to a page on mirrors.kernel.org they are
   guaranteed that what they are seeing was generated from the
   file listing, not by a rogue index.html page. This means the
   users can implicitly trust what they are seeing.
2) It means I'm not unintentionally hosting websites under the
   mirrors that I either don't know about or don't want to be
   hosting. I've got a lot of server / bandwidth resources, and I'm
   sure there's a botnet somewhere that would love to sneak
   something on there. I'm just trying to make things more
   obvious if/when something like that does happen.
3) This also means that I don't propagate these things to
   mirrors who are mirroring from me.
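
For anyone wanting to apply the same kind of filter on their own mirror, here is a minimal sketch of how a sync wrapper might build its rsync invocation. It is not the actual kernel.org sync script. The function name, the source URL and the destination path are all made up for illustration. The only rsync options used are -a, --delete and --exclude.

```python
def build_sync_command(source, destination, excluded_names=("index.html",)):
    """Return an rsync argument list that skips the given file names.

    Hypothetical helper: the default exclusion mirrors the policy
    described above, everything else is an assumption.
    """
    command = ["rsync", "-a", "--delete"]
    for name in excluded_names:
        # An exclude pattern without a slash is matched against the
        # file name at any depth of the tree being transferred.
        command.append("--exclude=" + name)
    command.extend([source, destination])
    return command


if __name__ == "__main__":
    # Made-up upstream module and local path, for illustration only.
    print(" ".join(build_sync_command(
        "rsync://upstream.example.org/distro/", "/srv/mirror/distro/")))
```

Because the excluded names never reach the local tree, they also never reach anyone syncing from it, which is point 3 above.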
I have in the past, with Fedora having been a specific instance of this,
made exceptions and mirrored whole portions of websites and provided
access to these, which I know Fedora was quite thankful for at one point
several releases ago. But no, this is not arbitrary in the slightest.
>
> >specifically those files aren't used for anything.
>
> But we never know what downstream or users may want to do...
If the distro maintainers were to come to me and tell me it was
important not to block those files, I would concede and lift that
restriction, but so far none of them have, and if you look at what those
files are / do, they aren't used for anything useful.
> I also feel tempted to do this (particularly with that massive and
> nearly useless fullfilelist) but up to now have [painfully] resisted :-)
To downstream end users it's more or less useless, but I have seen people
using sync scripts that make use of fullfilelist, so I would consider it
a vaguely useful file, and it doesn't have the indirect security
implications that something like index.html would have.
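
As a rough illustration of what such a script might do with the list, the sketch below compares a file list against a local tree and reports missing and extraneous files. It assumes the list holds one relative file path per line, optionally prefixed with "./", and that it lists files only. The real fullfilelist layout may differ, and the function name is hypothetical.

```python
import os


def compare_with_filelist(filelist_lines, tree_root):
    """Return (missing, extraneous) sets of relative file paths.

    Assumes one relative file path per line in filelist_lines.
    """
    expected = set()
    for line in filelist_lines:
        path = line.strip()
        if path.startswith("./"):
            path = path[2:]
        if path:
            expected.add(path)

    present = set()
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, tree_root)
            # Normalise to forward slashes so paths compare equal.
            present.add(relative.replace(os.sep, "/"))

    # Listed but not on disk, and on disk but not listed.
    return expected - present, present - expected
```

A caller would pass the opened list file and the mirror root, then decide what to fetch or flag from the two sets.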
> >As for the broken symlinks I will get those dealt with,
> >but that is an exciting problem of syncing from the master servers that
> >sometimes they don't fully cleanup their entire tree.
>
> However these have disappeared from the master more than a day ago. Or
> maybe there's a difference between the masters.
I've noticed, especially with Fedora's rsync, that it won't always clean
up everything properly, and this is a case in point. Why, I've got
no clue. I perform full rsyncs on a very frequent basis so this
shouldn't be happening, but I can only chalk it up to Fedora running a
*MUCH* older version of rsync than I am.
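
Until the cause is found, leftovers like these can at least be spotted locally. The following is a minimal sketch, assuming a POSIX filesystem, that walks a mirror tree and lists symlinks whose targets no longer exist. The function name is hypothetical and this is not part of any existing sync tooling.

```python
import os


def find_broken_symlinks(tree_root):
    """Return a sorted list of symlinks under tree_root with missing targets."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(tree_root):
        # Links to directories show up in dirnames, all other links
        # (including dangling ones) show up in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself, exists() follows it,
            # so a dangling link is a link that does not "exist".
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Running it after each sync and logging the result would make it obvious when a master has left stale links behind.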
- John 'Warthog9' Hawley
Chief Kernel.org Administrator