[mirror-admin] missing and extraneous files at kernel.org

J.H. warthog19 at eaglescrag.net
Mon Aug 10 17:24:08 EDT 2009


Carlos Carvalho wrote:
> J.H. (warthog19 at eaglescrag.net) wrote on 8 August 2009 15:00:
>  >We don't, and never have, included any of the index.html files from 
>  >upstream, that will not be fixed,
> 
> Why? That sounds arbitrary. If you're a tier-1 mirror you should
> replicate upstream, not decide on its contents. You can only exclude
> whole portions of the tree due to resource limitation. This is not the
> case for a few small files.

In fact it is not arbitrary; it is completely intentional.  Upstream 
distro providers have a large number of people who have access to their 
repositories, and I have an automated sync process.  Explicitly dropping 
index.html files means several things:

	1) When a user browses to a page on mirrors.kernel.org they are
	   guaranteed that what they are seeing was generated from the
	   file listing, not by a rogue index.html page.  This means the
	   users can implicitly trust what they are seeing.

	2) It means I'm not unintentionally hosting websites under the
	   mirrors that I either don't know about or don't want to be
	   hosting.  I've got a lot of server / bandwidth resources, and
	   I'm sure there's a botnet somewhere that would love to sneak
	   something on there.  I'm just trying to make things more
	   obvious if/when something like that does happen.

	3) This also means that I don't propagate these things to
	   mirrors who are mirroring from me.
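
As an illustration only, here is a minimal sketch of how dropping
index.html files could be expressed in an automated sync job, assuming
rsync is the transfer tool.  The upstream URL, the destination path and
the helper name build_rsync_command are hypothetical; -a, --delete and
--exclude are standard rsync options.  This is not the actual kernel.org
sync script.

```python
def build_rsync_command(upstream, dest, excluded_names=("index.html",)):
    """Build an rsync argument list that skips the given file names.

    upstream and dest are placeholders; nothing is executed here.
    """
    cmd = ["rsync", "-a", "--delete"]
    for name in excluded_names:
        # A pattern without a slash matches that file name at any depth.
        cmd.append(f"--exclude={name}")
    cmd += [upstream, dest]
    return cmd


# Hypothetical upstream module and local path, for illustration only.
command = build_rsync_command(
    "rsync://upstream.example.org/distro/", "/srv/mirror/distro/"
)
print(" ".join(command))
```

Because the files are excluded on the receiving side, they never land
on disk, so mirrors syncing from that copy never see them either.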

I have made exceptions in the past, Fedora being a specific instance: 
I mirrored whole portions of websites and provided access to them, 
which I know Fedora was quite thankful for at one point several 
releases ago.  But no, this is not arbitrary in the slightest.

> 
>  >specifically those files aren't used for anything.
> 
> But we never know what downstream or users may want to do...

If the distro maintainers were to come to me and tell me it was 
important not to block those files, I would concede and lift that 
restriction.  So far none of them have, and if you look at what those 
files are / do, they aren't used for anything useful.

> I also feel tempted to do this (particularly with that massive and
> nearly useless fullfilelist) but up to now have [painfully] resisted :-)

To downstream end users it's more or less useless, but I have seen 
people using sync scripts that make use of fullfilelist, so I would 
consider it a vaguely useful file.  It also doesn't have the indirect 
security implications that something like index.html would have.
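
A minimal sketch of the kind of check a sync script might run against
such a list, to report missing and extraneous files.  It assumes the
list holds one relative path per line, which is an assumption about the
format and not something stated here; the helper name
compare_with_filelist is hypothetical.

```python
import os


def compare_with_filelist(listed_paths, root):
    """Return (missing, extraneous) relative paths for the tree under root.

    listed_paths: iterable of relative paths, one per entry (assumed format).
    """
    listed = {os.path.normpath(p.strip()) for p in listed_paths if p.strip()}
    local = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            local.add(os.path.normpath(os.path.relpath(full, root)))
    missing = sorted(listed - local)      # in the list, not on disk
    extraneous = sorted(local - listed)   # on disk, not in the list
    return missing, extraneous
```

Only regular directory entries are compared here; directories named in
the list would need separate handling.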

>  >As for the broken symlinks I will get those dealt with, 
>  >but that is an exciting problem of syncing from the master servers that 
>  >sometimes they don't fully cleanup their entire tree.
> 
> However these have disappeared from the master more than a day ago. Or
> maybe there's a difference between the masters.


I've noticed, especially with Fedora's rsync, that it won't always clean 
up everything properly, and this is a case in point.  Why, I've got no 
clue.  I perform full rsyncs on a very frequent basis so this shouldn't 
be happening, but I can only chalk it up to Fedora running a *MUCH* 
older version of rsync than I am.
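
For mirror admins who want to spot leftovers like these on their own
copy, a minimal sketch of a dangling-symlink scan follows.  The helper
name find_broken_symlinks is hypothetical, and this is a generic check,
not a description of how kernel.org handles it.

```python
import os


def find_broken_symlinks(root):
    """Return a sorted list of symlinks under root whose target is missing."""
    broken = []
    # os.walk does not follow symlinks by default, so links are only listed.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Running it over the local tree after a sync gives a list that can be
reviewed before anything is deleted.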

- John 'Warthog9' Hawley
Chief Kernel.org Administrator
