[mirror-admin] report_mirror taking a long time to run
Matt Domsch
matt at domsch.com
Wed Nov 24 13:00:27 EST 2010
On Wed, Nov 24, 2010 at 10:56 AM, HEAnet Mirror Admin
<rh-mirror-list at heanet.ie> wrote:
> Hi,
>
> I'm still noticing excessively long run times for report_mirror, it
> took nearly six hours to complete this morning. Here's the output with
> start/finish timestamps (in epoch seconds) for each of the directory
> traversals:
>
> gen_dirtree started at 1290575953.78 /ftp/pub/fedora/linux/
> gen_dirtree finished at 1290583867.54
> gen_dirtree started at 1290583867.54 /ftp/pub/fedora/epel/
> gen_dirtree finished at 1290585276.7
> gen_dirtree started at 1290585276.7 /ftp/pub/fedora-archive/fedora/extras/
> gen_dirtree finished at 1290586649.04
> gen_dirtree started at 1290586649.04 /ftp/pub/fedora-archive/fedora/core/
> gen_dirtree finished at 1290588254.69
> gen_dirtree started at 1290588254.69 /ftp/pub/fedora-archive/fedora/linux/
> gen_dirtree finished at 1290599021.38
> Category Fedora Linux directories updated: 500 added: 0 deleted 0
> Category Fedora EPEL directories updated: 162 added: 0 deleted 0
> Category Fedora Extras directories updated: 87 added: 0 deleted 0
> Category Fedora Core directories updated: 426 added: 0 deleted 0
> Category Fedora Archive directories updated: 1 added: 0 deleted 652
> checked in successful
>
> Again, most of the delay seems to be happening when reading the fedora
> archive directory.
>
> We're using the latest version of report_mirror from git on Ubuntu
> 8.04 with Python 2.5.2, I've also attached our report_mirror.conf.
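As an aside, the per-tree durations implied by those timestamps can be worked out with a short Python sketch. The timestamps and paths are copied from the output above; the `walks` list and the `durations_hours` helper are names made up for illustration and are not part of report_mirror.

```python
# Start/finish epoch timestamps copied from the gen_dirtree output above.
# The variable and function names are illustrative only.
walks = [
    ("/ftp/pub/fedora/linux/", 1290575953.78, 1290583867.54),
    ("/ftp/pub/fedora/epel/", 1290583867.54, 1290585276.7),
    ("/ftp/pub/fedora-archive/fedora/extras/", 1290585276.7, 1290586649.04),
    ("/ftp/pub/fedora-archive/fedora/core/", 1290586649.04, 1290588254.69),
    ("/ftp/pub/fedora-archive/fedora/linux/", 1290588254.69, 1290599021.38),
]


def durations_hours(entries):
    """Return (path, hours) pairs, slowest walk first."""
    result = [(path, (finish - start) / 3600.0)
              for path, start, finish in entries]
    return sorted(result, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, hours in durations_hours(walks):
        print("%5.2f h  %s" % (hours, path))
```

The two slowest walks are fedora-archive/fedora/linux/ at roughly 3 hours and fedora/linux/ at roughly 2.2 hours; the other three each finish in under half an hour.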
The reason for the deletions is that your report_mirror.conf [Fedora
Archive] section is incorrect. I blame myself, as I see the wiki
Mirroring page doesn't list the archive module. I'll correct the
wiki...
(We anchor the top of this module at /pub/archive/ on the master tree,
which contains fedora/ and everything beneath it.)
Yours reads:
[Fedora Archive]
enabled=1
path=/ftp/pub/fedora-archive/fedora/linux/
Instead, the path line needs to be:
path=/ftp/pub/fedora-archive/
Now, that doesn't explain the slow directory listings. If you look at
the Python code though, it's simply doing an os.walk() on the tree, so
it's limited to the speed it can do that. This part of the tree is
really really big - it has hundreds of thousands of files and hundreds
of directories, so it could easily take the nearly 3 hours you note
here just to walk it. For this reason, as the content changes only
very rarely, you can set enabled=0 for your regular rsync runs, and on
the rare occasions when you sync the archive content, run it once with
enabled=1.
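To get a feel for how much of the run time is the raw directory walk on your storage, something along these lines could be run against a single tree. This is only a rough sketch and not the report_mirror code itself: `time_walk` is a hypothetical helper that does a bare os.walk() and counts what it sees, and it is written for Python 3.

```python
import os
import sys
import time


def time_walk(root):
    """Walk root with os.walk() and return (dir_count, file_count, seconds).

    Hypothetical helper for illustration; it only counts entries and does
    none of the other work report_mirror performs.
    """
    dir_count = 0
    file_count = 0
    started = time.time()
    for dirpath, dirnames, filenames in os.walk(root):
        dir_count += 1              # one iteration per directory visited
        file_count += len(filenames)
    return dir_count, file_count, time.time() - started


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    dirs, files, seconds = time_walk(root)
    print("%s: %d directories, %d files, %.1f s" % (root, dirs, files, seconds))
```

If a bare walk of the archive tree already takes hours, the time is going to the filesystem rather than to anything report_mirror does on top of it.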
Also, we have done away with the [Fedora Core] and [Fedora
Extras] categories, as they're encompassed by [Fedora Archive] now.
So you need not scan those separately in report_mirror.conf.
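As a quick sanity check after editing the file, a sketch like the following lists which categories would still be scanned. It assumes the conf file is plain INI that Python 3's configparser can read; `SAMPLE_CONF` and `enabled_paths` are made-up names, and the sample uses only the section and keys quoted in this thread.

```python
import configparser

# Hypothetical excerpt of report_mirror.conf after the changes above:
# corrected path, and the category switched off for regular runs.
SAMPLE_CONF = """
[Fedora Archive]
enabled=0
path=/ftp/pub/fedora-archive/
"""


def enabled_paths(conf_text):
    """Return {section: path} for every section that has enabled=1."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    result = {}
    for section in parser.sections():
        if (parser.has_option(section, "enabled")
                and parser.getboolean(section, "enabled")):
            result[section] = parser.get(section, "path", fallback=None)
    return result


if __name__ == "__main__":
    for section, path in enabled_paths(SAMPLE_CONF).items():
        print("%s -> %s" % (section, path))
```

With enabled=0 as in the sample, [Fedora Archive] is not listed; flipping it to enabled=1 for an occasional archive sync makes it appear with the corrected path.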
FWIW, it takes MirrorManager right at 2 hours to run the
update-master-directory-list job that does the same, using NetApp
storage on the back end. And that's storage that isn't under
tremendous other load, as I presume yours is.
Hope this helps.
Thanks,
Matt