[ale] Parsing CSV file in perl

Fri Jul 8 15:28:33 EDT 2005

OK, I'm a Perl novice, but here goes.
I'm not sure about the format of your data file, but I'm assuming the line
numbers aren't actually in it.  I looked for a simpler solution than the
previous two suggestions because I'm not sure what you want to do with the
result.  See if this is closer to what you want:

#!/usr/bin/perl
use strict;
use warnings;

open my $txt, '<', 'datafile' or die "Can't open datafile: $!";
while (<$txt>) {
         if (/^[ ]*\d+[ \t]+/) {
             # starts with a numeric ID; $_ already ends with "\n"
             print $_ or die "print failed: $!";
         } elsif (/^[ ]*"ID"[ \t]+/) {
             # header line: do nothing, don't print it
         } else {
             print ">>missing ID on this line:  $_";
         }
}
close $txt;

Note: the character classes above spell the tab as \t instead of containing a
literal tab, which is easy to lose when code is pasted into an email.  There
were no tabs in the email you posted, so I let the separators be either spaces
or tabs.  I did not understand your regex; see my comment below.

In the "else" branch you could also choose not to print the line at all, or do
something else with it.
If you want to actually parse the line and test other fields for being empty,
I've seen some scripts online that use the Text::ParseWords module.  I could
not find any reference to a Perl command that assigns tokens into positional
parameters (a la $1, $2, $3, etc.) the way "set" does in the Korn shell.
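For what it's worth, a list assignment from split gets fairly close to what
set does in ksh: each tab-separated field lands in its own variable, in order.
This is only a sketch; it assumes single tabs between fields and double-quoted
text fields with no tabs or quotes inside them, and the sub name split_record
and the sample record are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one tab-delimited record into its fields,
# keeping empty fields and stripping surrounding double quotes.
sub split_record {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields instead of discarding them.
    my @fields = split /\t/, $line, -1;
    s/^"(.*)"$/$1/ for @fields;    # "Adams" -> Adams, empty stays empty
    return @fields;
}

# Made-up sample record with an empty first (ID) column.
my $record = qq{\t"Adams"\t"Portia"\n};

# List assignment: roughly what "set" does in ksh, with the
# fields assigned to named variables in order.
my ($id, $last, $first) = split_record($record);

if ($id eq '') {
    print "missing ID for $first $last\n";
} else {
    print "ID $id: $first $last\n";
}
```

The -1 limit matters here: without it, split throws away trailing empty
fields, so a record whose last columns are blank would come back short.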

>I'm trying to parse a CSV file in Perl and I'm having an issue with some
>of the columns being blank.
>
>Here is a sample piece of data.
>
>Id    LASTNAME    FIRSTNAME
>       Adams       Portia
>10572 Alexander   Robert
>
>You can see that the first row does not have an ID.  This can be true
>for all columns.  They may or may not have values. 
>
>Here is how I'm trying to parse it:
>
>open TXT, "< Expanded_2005_Select_1.csv";
>while(<TXT>) {
>         m/^(\d+?)\t/;

OK, in this regex the parens are what capture the digits into $1, so you do
need them for the print below.  The \d+? is a non-greedy "one or more digits",
so it can never match an empty ID; \d* ("zero or more") can.  Did you mean
"(\d*)"?

>         print "$1\n";

Perl on my Mac OS X barfed on the print statement.
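Here's one way your loop could avoid both problems: capture everything before
the first tab, which may be nothing, and only look at $1 after the match has
succeeded.  It's a sketch that assumes one record per line, the ID in the first
tab-separated column and a header on the first line of each file; the sub name
id_of is hypothetical, and the file names come from the command line (e.g.
perl ids.pl Expanded_2005_Select_1.csv, where ids.pl is just a made-up name).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the ID column (the text before the first tab).
# Returns '' when the ID column is empty, undef when the line has no tab.
sub id_of {
    my ($line) = @_;
    # [^\t]* can match zero characters, so an empty first column still matches.
    if ($line =~ /^([^\t]*)\t/) {
        return $1;    # $1 is only trusted after a successful match
    }
    return;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (my $line = <$fh>) {
        next if $. == 1;    # assumed header row ("ID", "LASTNAME", ...)
        my $id = id_of($line);
        if (!defined $id) {
            print "no tab on line $. of $file\n";
        } elsif ($id eq '') {
            print "missing ID on line $. of $file\n";
        } else {
            print "$id\n";
        }
    }
    close $fh;
}
```

Capturing [^\t]* instead of \d+ means the match still succeeds on a line whose
ID is blank, so $1 is set to an empty string instead of being left over from
some earlier match.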

>}
>
>Each column is tab-delimited.  When I run this I get the last name in $1
>for the first line and the ID in $1 for the second line.  I need to
>somehow create a regex that doesn't skip a column when nothing is there.
>
>Data file looks like this:
>       1 "ID"    "LASTNAME"      "FIRSTNAME"     "TITLE" "COMPANY"
>"ADDRESS        "       "ADDRESS2"      "CITY"  "STATE" "ZIPCODE"
>"COUNTRY"               "PHONE" "EMAIL" "REGTYPE"       "DATE"  "TIME"
>"Question1"     "Questio        n2"     "Question3"     "READERID"
>       2         "Adams" "Portia"        "Director"      "The Rockefeller
>Univers        ity"    "1230 York Ave "                "New York"
>"NY"    "10021-6        399"    "USA"   2123277719
>"adams at rockefeller.edu" "Member"               
>       3 10572   "Alexander"     "Robert"        "Manager Voice & Video
>Solution"                "Air Products and Chemicals, Inc"       "7201
>Hamilton Blvd"                    "Allentown"     "PA"    "18195-1501"
>"USA"   "610-481-7156"          "alexanrw at airproducts.com"      "Member"
>06/12/2005      06:06:14         pm                             60711
>
>The 1, 2, 3 that you see are the line numbers in vi.
>
>
>_______________________________________________
>Ale mailing list
>Ale at ale.org
>http://www.ale.org/mailman/listinfo/ale