[mirror-admin] Mirroring scripts?
David Richardson
david.richardson at utah.edu
Wed Apr 8 12:35:53 EDT 2015
On Wed, 8 Apr 2015, Jason L Tibbitts III wrote:
> I'm bringing up a couple of mirrors. They're large enough to hold all
> of fedora-buffet0 plus some other stuff, and after several days of
> downloading I have all of that content copied over. (The machines are
> already in the ACL for that rsync module.)
>
> I was wondering if someone could share their mirroring scripts,
> particularly how you deal with the accumulation of those .~tmp~
> directories when using the various delay options to rsync. I can just
> cobble things together from what's on
> https://fedoraproject.org/wiki/Infrastructure/Mirroring but I'd hate to
> get it wrong.
>
> Also, does anyone who mirrors fedora-buffet0 separately pull subtrees at
> a different rate, or do you always pull the whole thing? It takes a
> _long_ time to generate the entire file list (> 3M files).
Jason,
I had the same problem you did with rsync taking forever (and find, and
ls, and httpd).
Changing the sysctl vm.vfs_cache_pressure made a night-and-day difference
(default is 100, I set it to 10).
vm.vfs_cache_pressure controls how readily the kernel reclaims cached inode
and directory-entry data relative to cached file contents. The default (and
centerpoint) is 100. Values less than 100 favor keeping inode data; values
greater than 100 favor file contents. Do NOT set it to zero. My
understanding is that if you set it to zero, the kernel never reclaims that
metadata and you will eventually OOM.
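If it helps, here is a rough sketch of how one might check the current value before changing anything. The function name check_cache_pressure is made up for illustration, and the optional file argument exists only so the sketch can be tried against a scratch file; by default it reads the standard /proc path.

```shell
# check_cache_pressure [FILE]
# Hypothetical helper: report the current vfs_cache_pressure setting.
# FILE defaults to the kernel's own /proc entry.
check_cache_pressure() {
    file=${1:-/proc/sys/vm/vfs_cache_pressure}
    current=$(cat "$file") || return 1
    if [ "$current" -eq 0 ]; then
        echo "vfs_cache_pressure is 0: dentry and inode caches are never reclaimed"
    elif [ "$current" -lt 100 ]; then
        echo "vfs_cache_pressure is $current: kernel prefers to retain dentry and inode caches"
    else
        echo "vfs_cache_pressure is $current: default or more aggressive reclaim"
    fi
}

# To change the value at runtime (root required):
#   sysctl -w vm.vfs_cache_pressure=10
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   vm.vfs_cache_pressure = 10
```

Running check_cache_pressure with no argument reports the live setting on the machine.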
With this change, all my metadata stays in cache. I have two million
inodes in use, and this setting costs me about 4GB of RAM. A no-change
Fedora rsync takes 20 seconds for 425GB of content in 950k files (I
exclude development and SRPMS).
I use --delete to handle the .~tmp~ directories. If one of my runs aborts,
the next run will clean up after it.
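If you want to confirm that nothing is left behind after a run, something like the following sketch should do. The function name list_rsync_tmp_dirs is my own invention, and the directory you pass is whatever your mirror root happens to be.

```shell
# list_rsync_tmp_dirs DIR
# Hypothetical helper: print every leftover .~tmp~ directory under DIR.
# -prune stops find from descending into a .~tmp~ directory once found.
list_rsync_tmp_dirs() {
    find "$1" -type d -name '.~tmp~' -prune -print
}
```

For example, `list_rsync_tmp_dirs /your/dest/path` prints nothing when the tree is clean.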
My script is basically rsync wrapped with flock (rather than trying to
cobble together a lock-file system).
The 200 in the flock command (and again at the end) is just a file descriptor
number; it doesn't really matter what it is, as long as nothing else uses
it. The file name at the end also doesn't much matter. The file needs to
be writeable (or creatable if it doesn't exist), but nothing is written to
it. There's also no need to remove it afterwards.
### SCRIPT BEGINS ###
(
flock -n 200 || { echo "Script is already running. Aborting." ; exit 1 ; }
# ... commands executed under lock ...
/usr/bin/rsync --progress -aHv --update --delete --delete-excluded \
    --delete-after --delay-updates rsync://your/source /your/dest/path
/usr/bin/report_mirror
) 200>/tmp/lock.update-fedora
### SCRIPT ENDS ###
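If you want to convince yourself that the lock behaves as described, here is a small sketch of the same pattern wrapped in a function. The name run_locked and the lock file path in the usage note are hypothetical. I used descriptor 9 rather than 200 here on the assumption that single-digit descriptors are accepted by plain sh (dash, for instance) as well as bash.

```shell
# run_locked LOCKFILE COMMAND [ARGS...]
# Hypothetical wrapper: run COMMAND only if no one else holds the lock.
run_locked() {
    lockfile=$1
    shift
    (
        # -n: fail immediately instead of waiting for the lock
        flock -n 9 || { echo "Script is already running. Aborting." ; exit 1 ; }
        "$@"
    ) 9>"$lockfile"   # the lock is released when the subshell exits
}
```

Start `run_locked /tmp/lock.demo sleep 30 &` in one shell, then run `run_locked /tmp/lock.demo echo hello` while it is still sleeping; the second call should print the "already running" message and return 1.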
Hope you find this useful!
DR
--
David Richardson <david.richardson at utah.edu>
Center for High Performance Computing at the University of Utah