[ale] screen -d -m
Dow Hurst
Dow.Hurst at mindspring.com
Wed Apr 12 22:57:22 EDT 2006
I found a nice trick. I wanted to run a python program called Modeller
that generates protein structures from templates. Not important to
y'all, but the point is that it is a program that somewhere was starting
something that wouldn't go into the background. I was trying to start
multiple instances across a cluster using a script. Each node needed
two ssh commands to start two instances of Modeller. This is true
embarrassing parallelism at its best!
My example:
#!/bin/bash
CWD=`pwd`
ssh node001a "cd $CWD/01;mod8v2 01_vary_loop.py &"
ssh node001a "cd $CWD/02;mod8v2 02_vary_loop.py &"
ssh node002a "cd $CWD/03;mod8v2 03_vary_loop.py &"
and so on up to 40.
Well, the first ssh statement would not go into the background with a
"&" at the end, or a "> ./log 2>&1 &" at the end. I tried escaping
tricks like \& and such but finally realized that some part
of the python process was holding on to the terminal and ssh would not
complete and return. I hope my explanation makes sense here! Anyway, I
found that using screen -d -m would start a virtual terminal for ssh to
run in and allow Modeller to run to completion while my script started
other instances.
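For comparison, a common way to attack the same hang without screen is to disconnect the remote job from all three standard streams, since ssh generally waits until the remote command's stdin, stdout and stderr are closed. The sketch below only prints the command it would run. The log file name run.log and the helper remote_cmd are hypothetical, and this has not been tested with Modeller, so treat it as an illustration rather than a known fix.

```shell
#!/bin/bash
# Sketch of an alternative to screen: detach the remote job from ssh's
# stdin/stdout/stderr so ssh has nothing left to wait on.
# Assumptions: run.log is a hypothetical log file name, and this approach
# is untested with Modeller.
CWD=$(pwd)

# Build the command string to run on the remote node.
# $1 = job directory under $CWD, $2 = Modeller script name
remote_cmd() {
    echo "cd $CWD/$1; nohup mod8v2 $2 > run.log 2>&1 < /dev/null &"
}

# Print what would be run; drop the leading 'echo' to actually start the job.
echo ssh node001a "$(remote_cmd 01 01_vary_loop.py)"
```

If any one of the three redirections is left out, ssh can still hang, which is part of why screen -d -m is attractive: it gives the job its own terminal and no redirections are needed.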
So I ended up with this, which works:
#!/bin/bash
CWD=`pwd`
screen -d -m ssh node001a "cd $CWD/01;mod8v2 01_vary_loop.py"
screen -d -m ssh node001a "cd $CWD/02;mod8v2 02_vary_loop.py"
screen -d -m ssh node002a "cd $CWD/03;mod8v2 03_vary_loop.py"
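The remaining lines follow the same pattern, so they could be generated in a loop rather than typed out. Here is a minimal sketch, assuming the nodes are named node001a through node020a, two instances per node, and that directory NN holds NN_vary_loop.py (only the first three lines are shown above, so the naming beyond those is an assumption). DRY_RUN and launch are made-up names for this sketch; by default it only prints the commands.

```shell
#!/bin/bash
# Sketch: generate the 40 launch commands in a loop.
# Assumptions (not confirmed above): nodes are node001a..node020a,
# two instances per node, and directory NN holds NN_vary_loop.py.
CWD=$(pwd)

# Launch (or just print) instance number $1, where $1 is 1..40.
launch() {
    local i=$1 node job
    # Instances 1,2 go to node 1; instances 3,4 go to node 2; and so on.
    node=$(printf 'node%03da' $(( (i + 1) / 2 )))
    job=$(printf '%02d' "$i")
    # DRY_RUN=1 (the default in this sketch) only prints the command.
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "screen -d -m ssh $node \"cd $CWD/$job;mod8v2 ${job}_vary_loop.py\""
    else
        screen -d -m ssh "$node" "cd $CWD/$job;mod8v2 ${job}_vary_loop.py"
    fi
}

for i in $(seq 1 40); do
    launch "$i"
done
```

Running it with DRY_RUN=0 set in the environment would start the jobs for real; checking the printed commands first seems the safer habit.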
I thought this was a neat solution for my problem and wanted to share
it. By the way, the cluster here has ssh set up so that no password is
required and that is how the script can run without problems. I'm sure
there is a much better way to do this with the installed Torque and Maui
but I haven't figured out the job templates to start using them yet.
I'm still building my own stuff like this or using mpirun for the
parallel enabled programs. Modeller is not parallelized so most people
use a single random number seed and run one instance of the program for
a long time to generate lots of possible protein structure conformations
for their target. I wanted to turn the job around faster, so I use a
different random number seed for each instance of Modeller and then use
the whole cluster to run 40 instances of Modeller. I got 1000
structures in 1 hour rather than taking a couple of days on 1 CPU. I
just hadn't realized screen could spawn a command in the background
pre-detached and then exit when finished.
Thanks,
Dow