<p dir="ltr">Ugh. Sounds like you'll need to do it in stages: a coarse-grained search written to new files, then a fine-grained search on those new files.</p>
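The two-stage idea can be sketched in shell. Everything here is hypothetical and not from the thread: the `stage_search` helper name, the coarse/fine patterns, and the paths are illustrative only. Stage one streams each tarball through a cheap fixed-string filter and writes the (much smaller) matches to intermediate files; stage two runs the expensive regex only over those files.

```shell
#!/bin/sh
# Hypothetical sketch of a coarse-then-fine search over tar.gz files.
stage_search() {
  coarse=$1; fine=$2; outdir=$3; shift 3
  mkdir -p "$outdir"
  for f in "$@"; do
    # coarse pass: cheap fixed-string grep over the decompressed stream
    tar -xzOf "$f" | grep -F "$coarse" > "$outdir/$(basename "$f").hits"
  done
  # fine pass: regex search over the small intermediate files only
  grep -E -h "$fine" "$outdir"/*.hits
}
```

The payoff depends on the coarse filter discarding most of the data, so the fine pass touches far fewer bytes than a single full-regex pass would.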
<div class="gmail_quote">On Jul 29, 2014 6:08 PM, "Robert L. Harris" <<a href="mailto:robert.l.harris@gmail.com">robert.l.harris@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Unfortunately I can't touch the VM's configuration or the hardware<br>
underneath it. Supposedly I'm already spread across a minimum of 6 "fast"<br>
disks. I can't really go below 10 files at a time, though, as I am<br>
concerned about information being spread across multiple files. I was<br>
hoping someone knew a tool/util I had not found yet which would rip<br>
through the data faster.<br>
<br>
Robert<br>
<br>
<br>
<br>
On Tue, Jul 29, 2014 at 4:00 PM, Jim Kinney <<a href="mailto:jim.kinney@gmail.com">jim.kinney@gmail.com</a>> wrote:<br>
<br>
> unless you can spread that read/write load out over many, many spindles,<br>
> you're stuck. Now add in that the VM must access storage through the<br>
> virtual drive process and you've got another performance hit.<br>
><br>
> You _could_ add extra drives to the VM that are hosted on a decent array<br>
> (fiber channel or iSCSI over the LAN), copy the files to their new home<br>
> in batches sized to stay under the 4G RAM limit.<br>
><br>
> If possible, can you add more RAM to that VM?<br>
><br>
><br>
> On Tue, Jul 29, 2014 at 5:10 PM, Robert L. Harris <<br>
> <a href="mailto:robert.l.harris@gmail.com">robert.l.harris@gmail.com</a><br>
> > wrote:<br>
><br>
> > I'm working on a tool to parse through a lot of data for processing.<br>
> > Right now it's taking longer than I wish it would, so I'm trying to<br>
> > find ways to improve the performance. Right now it appears the biggest<br>
> > bottleneck is IO. I'm looking at about 2000 directories which contain<br>
> > between 1 and 200 files each in tar.gz format, on a VM with 4 Gigs of<br>
> > RAM. I need to load the data into an array to do some pre-processing<br>
> > cleanup, so I am currently chopping the files in each of the<br>
> > directories into groups of 10 files at a time ( seems to be the sweet<br>
> > spot to prevent swap ) and then a straightforward loop in which each<br>
> > iteration executes:<br>
> ><br>
> > tar xzOf $Loop |<br>
> ><br>
> > and then pushes it into my array for processing.<br>
> ><br>
> > I have tried:<br>
> ><br>
> > gzcat $Loop | tar xO |<br>
> ><br>
> > which is actually slower. Yes, I'm at the point of trying to squeeze<br>
> > seconds of time out of a group. Any thoughts on a method which might<br>
> > be quicker?<br>
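The batch-of-10 loop described above can be sketched as a small shell helper; the `extract_batch` name and the batch cap are illustrative (the cap of 10 matches the "sweet spot" mentioned in the mail, not a general rule). Each call streams the concatenated contents of up to 10 tarballs to stdout so a single reader can slurp them into its array.

```shell
#!/bin/sh
# Hypothetical sketch of the batch-of-10 extraction loop.
extract_batch() {
  n=0
  for f in "$@"; do
    tar -xzOf "$f" || return 1   # -O streams member contents to stdout
    n=$((n + 1))
    [ "$n" -lt 10 ] || break     # cap the batch at 10 files
  done
}
```

Note that with many tiny tarballs the per-file fork/exec of `tar` can dominate the wall-clock time, which is one reason adding a second process (`gzcat | tar`) measures slower than `tar xzOf` alone.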
> ><br>
> > Robert<br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> ><br>
> > --<br>
> > :wq!<br>
> ><br>
> > ---------------------------------------------------------------------------<br>
> > Robert L. Harris<br>
> ><br>
> > DISCLAIMER:<br>
> > These are MY OPINIONS ALONE. I speak for no-one else.<br>
> > With Dreams To Be A King, First One Should Be A Man - Manowar<br>
> > _______________________________________________<br>
> > Ale mailing list<br>
> > <a href="mailto:Ale@ale.org">Ale@ale.org</a><br>
> > <a href="http://mail.ale.org/mailman/listinfo/ale" target="_blank">http://mail.ale.org/mailman/listinfo/ale</a><br>
> > See JOBS, ANNOUNCE and SCHOOLS lists at<br>
> > <a href="http://mail.ale.org/mailman/listinfo" target="_blank">http://mail.ale.org/mailman/listinfo</a><br>
> ><br>
><br>
><br>
><br>
> --<br>
> --<br>
> James P. Kinney III<br>
><br>
> Every time you stop a school, you will have to build a jail. What you gain<br>
> at one end you lose at the other. It's like feeding a dog on his own tail.<br>
> It won't fatten the dog.<br>
> - Speech 11/23/1900 Mark Twain<br>
><br>
><br>
> <a href="http://heretothereideas.blogspot.com/" target="_blank">http://heretothereideas.blogspot.com/</a><br>
><br>
<br>
<br>
<br>
--<br>
:wq!<br>
---------------------------------------------------------------------------<br>
Robert L. Harris<br>
<br>
DISCLAIMER:<br>
These are MY OPINIONS ALONE. I speak for no-one else.<br>
With Dreams To Be A King, First One Should Be A Man - Manowar<br>
</blockquote></div>