[ale] Parsing CSV file in perl
Jon Kettenhofen
jonkettenhofen at yahoo.com
Fri Jul 8 15:28:33 EDT 2005
OK, I'm a perl novice, but here goes:
I'm not sure about the format of your data file, but I'm assuming that the
line numbers aren't there. I looked for a simpler solution than the previous
two suggestions because I'm not sure what you wanted to do with the result.
See if this is closer to what you want:
#!/usr/bin/perl -w
use strict;

# three-argument open with a lexical filehandle
open my $txt, '<', 'datafile' or die "Can't open datafile : $!";
while (<$txt>) {
    if ( m/^[ \t]*\d+[ \t]+/ ) {
        # line starts with a numeric ID; $_ still carries its own newline
        print $_ or die "print failed: $!";
    } elsif ( m/^[ \t]*"ID"[ \t]+/ ) {
        # do nothing - don't print the header line
    } else {
        print ">>missing ID on this line: $_";
    }
}
close $txt;
Note: each of the [ \t] character classes above matches either a space or a
tab. There were no tabs in the email you posted, so I chose to compensate
in the code by accepting both. I did not understand your regex - see comment
below.
In the "else" branch you could choose not to print the line, or handle it
some other way.
If you want to actually parse the line to test other fields for being empty,
I've seen some scripts online that use Text::ParseWords. I could not find any
reference to a Perl command that assigns tokens into positional parameters
(a la $1 $2 $3 etc.) in the way that set does in the Korn shell.
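For what it's worth, here is a minimal sketch of pulling one line apart into
named variables with plain split and a list assignment, which may be the
closest thing to what set does in the shell. It assumes the fields really are
separated by single tabs and that no quoted field contains a tab. The helper
name parse_record and the sample line are made up for illustration; they are
not from your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_record is a hypothetical helper, not part of any module.
# It splits one tab-delimited record into fields and keeps empty ones.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 stops split from dropping trailing empty fields.
    my @fields = split /\t/, $line, -1;
    # Strip one pair of surrounding double quotes, if present.
    s/^"(.*)"$/$1/ for @fields;
    return @fields;
}

# Made-up sample line in which the ID column is empty.
my ($id, $last, $first) = parse_record(qq{\t"Adams"\t"Portia"\n});
print "ID is ", ( length($id) ? $id : "(missing)" ), "\n";
print "Name is $first $last\n";
```

An empty column comes back as an empty string rather than shifting the later
columns to the left, so each field can be tested on its own.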
>I'm trying to parse a CSV file in perl and I'm having an issue with some
>of the columns being blank.
>
>Here is a sample piece of data.
>
>Id LASTNAME FIRSTNAME
> Adams Portia
>10572 Alexander Robert
>
>You can see that the first row does not have an ID. This can be true
>for all columns. They may or may not have values.
>
>Here is how I'm trying to parse:
>
>open TXT, "< Expanded_2005_Select_1.csv";
>while(<TXT>) {
> m/^(\d+?)\t/;
OK, in this regex the parens are only needed because you use $1 afterwards.
The \d+? is a non-greedy "one or more digits", so it still requires at least
one digit and is not the same as \d*. Did you mean to put "(\d+)?" or "(\d*)"?
Not sure what you were thinking here.
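To make the difference between those quantifiers concrete, here is a small
sketch that tries three variants on two made-up lines, one with an ID and one
with an empty first column. The helper name describe and the sample lines are
assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# describe is a hypothetical helper: it reports whether the pattern
# matched and, if so, what ended up in $1.
sub describe {
    my ($re, $line) = @_;
    return "no match" unless $line =~ $re;
    return defined $1 ? "matched, \$1='$1'" : "matched, \$1 undefined";
}

my @patterns = (
    [ '^(\d+?)\t', qr/^(\d+?)\t/ ],   # non-greedy, still needs one digit
    [ '^(\d*)\t',  qr/^(\d*)\t/  ],   # zero or more digits
    [ '^(\d+)?\t', qr/^(\d+)?\t/ ],   # whole group is optional
);

my @lines = (
    [ 'with ID',    "10572\tAlexander\tRobert" ],
    [ 'without ID', "\tAdams\tPortia" ],
);

for my $l (@lines) {
    for my $p (@patterns) {
        printf "%-10s %-10s %s\n", $l->[0], $p->[0],
            describe( $p->[1], $l->[1] );
    }
}
```

Only the first pattern fails outright on the line without an ID. Since $1 keeps
its old value after a failed match, it is worth testing the result of the
match before printing $1.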
> print "$1\n";
Perl on my Mac OS X barfed on the print statement.
>}
>
>Each column is tab delimited. When I run this I get the lastname in $1
>for the first line and the ID in $1 for the second line. I need to
>somehow create a regex that would be forgiving of nothing being there.
>
>Data file looks like this:
> 1 "ID" "LASTNAME" "FIRSTNAME" "TITLE" "COMPANY"
>"ADDRESS " "ADDRESS2" "CITY" "STATE" "ZIPCODE"
>"COUNTRY" "PHONE" "EMAIL" "REGTYPE" "DATE" "TIME"
>"Question1" "Question2" "Question3" "READERID"
> 2 "Adams" "Portia" "Director" "The Rockefeller
>University" "1230 York Ave " "New York"
>"NY" "10021-6399" "USA" 2123277719
>"adams at rockefeller.edu" "Member"
> 3 10572 "Alexander" "Robert" "Manager Voice & Video
>Solution" "Air Products and Chemicals, Inc" "7201
>Hamilton Blvd" "Allentown" "PA" "18195-1501"
>"USA" "610-481-7156" "alexanrw at airproducts.com" "Member"
>06/12/2005 06:06:14 pm 60711
>
>The 1, 2, 3 that you see are the line numbers in vi
>
>