[ale] parallel processing
jeff hubbs
hbbs at mediaone.net
Mon Jan 7 18:40:24 EST 2002
Zyman, Andy wrote:
> Jeff,
> Thank you for the reply.
> Yes, if I specify "&" the job will go in the background. So having a couple
> of background jobs is the answer.
> But the reason I was asking this is:
> 1. I don't know how many files the dir has, so I can't (?) specify
> "filesA .... fileB should be copied by this job "
> "filesC .... fileD should be copied by this job "
> ....
>
> I was thinking about this situation - cp ./dirA/* ./dirB
>
> The files are big enough to drive me nuts waiting for cp to finish
> (each about 5GB-50GB, 10-15 files, in different dirs)
> 2. To copy the files, I'm creating a file with the locations of these
> files and doing a
> "while read"
> loop to copy them one at a time (which is not efficient :< )
> I can't really apply "&" here because I need to check that all files are
> copied before proceeding any further - this is the main "control point" in
> this operation...
> So I was thinking about something else, not background jobs....
>
> Thank You
> Andy
> office: 212 849 3543
Looks like you're running up against the limitations of disk drives.
You're trying to interleave writes and reads within the same partition.
This is always a bad scenario, although trying to do the same thing
between two partitions on the same drive might be worse, almost
certainly if the partitions are on opposite ends of the drive.
My feeling is that parallelizing the file copies the way we've been
discussing could give you a *slightly* faster experience than doing them
one at a time. The multiple processes will be fighting over the drive,
trying to both read and write, and to some degree happenstance and the
design of the drive, the kernel's disk I/O, and the file system will wind
up helping you a bit. However, you're never really going to get much of
an edge this way, IMHO.
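If you do want to try it, the "control point" you mention doesn't rule
out background jobs: the shell's wait builtin blocks until a background
job exits and hands back its exit status. Here's a rough sketch, assuming
your list file has one path per line. The name copy_all is something I
made up for illustration, not an existing tool, and I haven't tried this
against files anywhere near your size.

```shell
# copy_all LIST DEST (hypothetical helper): copy every file named in LIST
# (one path per line) into directory DEST as background jobs, then wait
# for all of them. Returns 0 only if every cp succeeded.
copy_all() {
    list=$1
    dest=$2
    pids=""

    while IFS= read -r f; do
        [ -n "$f" ] || continue          # ignore blank lines
        cp -- "$f" "$dest"/ &            # start the copy in the background
        pids="$pids $!"                  # remember its process ID
    done < "$list"

    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1          # wait returns that job's exit status
    done
    return "$failed"
}
```

You'd then write something like "copy_all filelist ./dirB || exit 1" and
the script won't go any farther until every copy has finished cleanly.
Note that this starts all the copies at once, which is exactly the
drive-thrashing case described above.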
Assuming that you don't have a second drive, do you have enough RAM that
you could create a ramdisk, copy the files to it, and then copy the
files from the ramdisk to the destination? Serializing what the drive
has to do could give you a faster overall experience. Drives love long,
sustained reads and writes.
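Roughly what I have in mind, as a sketch only: stage_copy is a name I
made up, and the staging directory is assumed to already sit on a
RAM-backed filesystem (a tmpfs mount, say) with room for the whole file.
The function doesn't check either of those things.

```shell
# stage_copy SRC DEST STAGE (hypothetical helper): copy file SRC into
# directory DEST by way of directory STAGE, so the disk sees one long
# read followed by one long write instead of interleaved reads and writes.
# STAGE is assumed to be RAM-backed and large enough; this is not verified.
stage_copy() {
    src=$1
    dest=$2
    stage=$3
    name=${src##*/}                          # file name without its directory

    cp -- "$src" "$stage/$name" || return 1  # disk -> RAM: sequential read
    cp -- "$stage/$name" "$dest/$name"       # RAM -> disk: sequential write
    status=$?
    rm -f -- "$stage/$name"                  # free the RAM either way
    return "$status"
}
```

With files of 5GB and up, this only helps if you really do have that
much RAM to spare for the staging area.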
If you have a choice of disk drives, pick the one with the fastest
spindle speed and/or the most physical heads (pay no attention to the
BIOS C/H/S data if the drive is even remotely recent - look up the
specs via Google). Drives love being able to read/write *across* the
heads as opposed to radially across the platters. I used to work with a
Compaq box with a 2.1GB Quantum Bigfoot drive - a weird, horrible
8"x5.25" contraption with only a single platter and two heads. It was
about like dealing with a laptop drive. A large internal cache is good
too; some drives I've got lying around don't have any at all, I don't
think.
I'm going to be facing this issue myself soon, as I've got a big stack
of drives that were bought from Microseconds at $1 each ranging from
60MB to 1GB, and I'll be using them as swap drives in boxes that will be
booting over the network. It's bad enough that I'll be having to rely
on swap drives, but I want to try to use the fastest ones if at all
possible.
You can use hdparm -t to get some feel for drive I/O speed but there's a
utility called bonnie (search freshmeat or google) that's a lot better.
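For comparing a stack of drives with hdparm, a small loop saves some
typing. This is just a sketch: time_drives is a made-up name, hdparm -t
needs root, and the device names in the example further down are
placeholders for whatever your drives show up as.

```shell
# time_drives DEVICE... (hypothetical helper): run hdparm's buffered read
# timing (-t) on each device given. Needs root and hdparm installed.
# Anything that is not a block device is skipped with a message.
time_drives() {
    for dev in "$@"; do
        if [ ! -b "$dev" ]; then
            echo "skipping $dev: not a block device"
            continue
        fi
        hdparm -t "$dev"                 # timed sequential reads from the device
    done
}
```

For example, "time_drives /dev/hda /dev/hdb". Run it a few times on an
otherwise idle box, since the numbers bounce around a bit.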
- Jeff
---
This message has been sent through the ALE general discussion list.
See http://www.ale.org/mailing-lists.shtml for more info. Problems should be
sent to listmaster at ale dot org.