[ale] Speed un-tar?

Robert L. Harris robert.l.harris at gmail.com
Tue Jul 29 17:10:27 EDT 2014


I'm working on a tool that parses a lot of data for processing.  It is
currently taking longer than I would like, so I'm looking for ways to
improve the performance.  The biggest bottleneck appears to be IO.  I'm
looking at about 2000 directories, each containing between 1 and 200
files in tar.gz format, on a VM with 4 GB of RAM.  I need to load the data
into an array to do some pre-processing cleanup, so I currently chop the
files in each directory into an array of groups of 10 files at a time
(which seems to be the sweet spot to prevent swapping) and then run a
straightforward loop in which each iteration executes:

  tar xzOf $Loop |

and then pushes the output into my array for processing.
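
For reference, the batching loop described above could be sketched roughly
like this in plain POSIX shell.  This is only an illustration under
assumptions: the names stream_batch, process_dir and BATCH are made up for
the example, and the step that reads the tar output through a pipe into an
array is not shown.

```shell
#!/bin/sh
# Hypothetical sketch of the batching loop; stream_batch, process_dir and
# BATCH are illustrative names, not part of the actual tool.

BATCH=${BATCH:-10}    # number of archives per group (10 per the description)

# Stream the member contents of every archive in one group to stdout.
stream_batch() {
    for Loop in "$@"; do
        tar xzOf "$Loop"    # x: extract, z: gunzip, O: to stdout, f: archive
    done
}

# Walk one directory and hand its *.tar.gz files over in groups of BATCH.
process_dir() {
    dir=$1
    set --                              # reuse "$@" as the current group
    for f in "$dir"/*.tar.gz; do
        [ -e "$f" ] || continue         # the glob matched nothing
        set -- "$@" "$f"
        if [ "$#" -ge "$BATCH" ]; then
            stream_batch "$@"
            set --                      # start a new group
        fi
    done
    [ "$#" -gt 0 ] && stream_batch "$@" # leftover partial group
    return 0
}

# Example call (the path is a placeholder):
#   process_dir /path/to/one/directory | some_consumer
```

Keeping the group in the positional parameters avoids bash-only arrays, so
the sketch should run under any POSIX sh.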

I have tried:

 gzcat $Loop | tar xO |

which is actually slower.  Yes, I'm at the point of trying to squeeze
seconds out of each group.  Any thoughts on a method that might be
quicker?

Robert

-- 
:wq!
---------------------------------------------------------------------------
Robert L. Harris

DISCLAIMER:
      These are MY OPINIONS             With Dreams To Be A King,
       ALONE.  I speak for                      First One Should Be A Man
       no-one else.                                     - Manowar