I'm working on a tool to parse through a large amount of data for processing. It's taking longer than I'd like, so I'm trying to find ways to improve the performance, and right now the biggest bottleneck appears to be IO. I'm looking at about 2000 directories, each containing between 1 and 200 files in tar.gz format, on a VM with 4 GB of RAM. I need to load the data into an array to do some pre-processing cleanup, so I'm currently chopping the files in each directory into groups of 10 at a time (that seems to be the sweet spot to prevent swapping) and then running a straightforward loop in which each iteration executes:
    tar xzOf $Loop |

and then pushes the output into my array for processing.

I have also tried:

    gzcat $Loop | tar xO |
which is actually slower. Yes, I'm at the point of trying to squeeze seconds out of each group. Any thoughts on a method that might be quicker?
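For reference, here is roughly what the loop looks like. This is only a simplified sketch: the directory path and the preprocess_into_array step are placeholder names for the real layout and pre-processing code, and the actual tool doesn't have to be a shell script.

    #!/bin/bash
    # Simplified sketch of the current approach: for each directory,
    # work through its tar.gz files 10 at a time (to stay out of swap)
    # and stream each archive's contents to the pre-processing step.

    for Dir in /data/*/ ; do                  # ~2000 directories; path is a placeholder
        Files=( "$Dir"*.tar.gz )              # 1-200 archives per directory

        for (( i=0; i<${#Files[@]}; i+=10 )); do
            Batch=( "${Files[@]:i:10}" )      # group of 10 files

            for Loop in "${Batch[@]}"; do
                # Decompress and extract to stdout, then hand the data
                # to the cleanup/array-building step (placeholder name).
                tar xzOf "$Loop" | preprocess_into_array
            done
        done
    done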
Robert

-- 
:wq!
---------------------------------------------------------------------------
Robert L. Harris

DISCLAIMER:
      These are MY OPINIONS               With Dreams To Be A King,
       ALONE.  I speak for                 First One Should Be A Man
       no-one else.                             - Manowar