[mirror-admin] rsync targets for Fedora content - other ideas for slooowness.
David Timms
dtimms at iinet.net.au
Tue Jul 16 09:41:42 EDT 2013
On 13/07/13 08:27, Matt_Domsch at dell.com wrote:
> As the number of directories with Fedora content has exploded, in large
> part due to splitting the 20k files into separate [a]/ [b]/ [c]/
> directories, the MirrorManager crawler has gotten slower too. 90
I wonder if there is something else not quite right. For example, the mirror
that I've pointed to [iinet.net.au] is happy to be pointed at, but does not
run report mirror.
I have noticed when manually checking via either http or ftp, that even
to get to the top directory, their web site takes some time to show the
content:
http://ftp.iinet.net.au/pub/fedora/linux/updates/18/x86_64/
With Firefox this took around 90 secs just for the directory list (about
82 secs before some of the page was drawn).
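To put a number on that without a browser and a stopwatch, something like
the following might do. It is only a rough sketch: time_listing is a made-up
helper name, and it measures nothing more than time to first byte and time
to the end of the listing for one URL.

```python
import time
import urllib.request


def time_listing(url, opener=urllib.request.urlopen, clock=time.monotonic):
    """Fetch a directory listing and time it.

    Returns (seconds_to_first_byte, seconds_total, bytes_read).
    `opener` and `clock` are parameters only so the function can be
    exercised without touching the network.
    """
    start = clock()
    with opener(url) as resp:
        first = resp.read(1)        # blocks until the server starts answering
        t_first = clock() - start
        rest = resp.read()          # the remainder of the listing
    t_total = clock() - start
    return t_first, t_total, len(first) + len(rest)
```

Calling time_listing() on the updates/18/x86_64/ URL above, and again on its
repodata/ and drpms/ subfolders, would show whether the delay grows with the
number of files in the folder.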
And all update rpms are in the one folder. Currently this seems to hold
only the most recent rpm for each update (rather than keeping every previous
version, as they had done some time ago, which I think was crippling their
server and causing even slower access).
===
A while ago their performance was even worse ... many minutes to show:
http://ftp.iinet.net.au/pub/fedora/linux/updates/18/x86_64/repodata/
When it did, they had hundreds, maybe a thousand
randomasciihex-type-f18-xml.gz files in the folder, rather than just the
newest ones.
----
The drpms folder has twice the content count (rpmdiffs between two
different versions and current, usually), and is even slower to load.
What I'm getting at is:
1. encourage mirrors to ensure they use rsync's --delete-(after) etc
to make sure that old files get removed from the mirror, eg. that could
be tested with the crawl (is old content still present? - and send an
email like was done recently).
2. The updates and updates/drpms folders could well be split up like
Fedora N/Everything into first letter folders. The reason I suggest this
is that a typical (apache) server serving this content is very slow just to
load that directory list. Might be something to push for on the
infrastructure/yum/repodata creation side for updates ?
3. In mirrormanager, shortly after a release, each site could be checked
for the full content of that Fedora release, but once it has this complete
unchanging content, it would be possible to just check that repodata is
present, along with a single/random/latest created file in each lettered
folder. If they are present, assume the mirror is complete.
You could then do the full check (isn't that 20000 rpms?) once every
1/3/6 months etc, to at least eventually drop mirrors that cull content
(for whatever reason).
4. In similar vein to (3) have the crawl concentrate on
updates/repodata, and perhaps check that only the 10 newest rpms/drpms
in the repodata are actually present on the mirror.
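Ideas (1) and (4) can be sketched roughly as below. This is not how
mirrormanager does it; stale_files and spot_check are hypothetical names,
the file listings are assumed to come from somewhere else (a crawl, or
parsed repodata), and `exists` stands in for whatever cheap per-file check
the crawler would use.

```python
def stale_files(master_listing, mirror_listing):
    """Idea (1): files the mirror still carries that the master has dropped.

    Both arguments are iterables of relative paths. A non-empty result
    suggests the mirror is syncing without deleting old content.
    """
    return sorted(set(mirror_listing) - set(master_listing))


def spot_check(packages, exists, newest=10):
    """Idea (4): check only the most recently built packages.

    packages: iterable of (path, build_time) pairs, assumed to be parsed
              from the master's repodata
    exists:   callable(path) -> bool, assumed to probe the mirror
    Returns the paths that are missing (empty list: mirror looks current).
    """
    recent = sorted(packages, key=lambda p: p[1], reverse=True)[:newest]
    return [path for path, _ in recent if not exists(path)]
```

The point of spot_check is that it costs at most `newest` requests per
mirror, instead of one directory listing of the whole updates folder.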
Anyway, just throwing around some ideas..., not having the slightest
idea of how mm actually works :-)
David.