[ale] Mass changing of file extension

mike at trausch.us
Mon Sep 17 13:28:25 EDT 2012


On 09/14/2012 08:35 PM, Jay Lozier wrote:
> LO Calc (the spreadsheet being used) allows you to change the delimiter
> of the csv file. I use this feature often to change the delimiter to a
> tab or semicolon. Sometimes commas are used as punctuation within the
> dataset (XYX Corp, Inc. in a company field) which can cause problems
> when importing into a spreadsheet or database. Importing into
> MySQL/MariaDB requires one to specify the delimiter used as well as the
> file type.

LO Calc handles commas in fields properly, **IF** the CSV file is RFC 4180
compliant.

MySQL emits utter CRAP that is not and CAN NOT be RFC compliant.

I had to take a crapton of data out of a MySQL database about three
weeks ago and generate valid CSV from it... the MySQL docs say "do this,
and you get CSV!" except that it totally screws it up.  It insists on
backslash-escaping things that needn't be escaped (commas and newlines
inside quoted fields), escapes quotes with a backslash instead of
doubling them, and writes NULL as \N, and this screws up a lot of
programs that expect proper RFC-compliant CSV input.

For example:

Name: ACME, Inc.
Gobbledygook: Foo\nBar\nBaz! "Hello, too!"
Neenerneener: NULL

MySQL emits this as:

"ACME\, Inc.","Foo\
Bar\
Baz! \"Hello\, too!\"",\N

Properly, it should be:

"ACME, Inc.","Foo
Bar
Baz! ""Hello, too!""",
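
As an illustration only (not necessarily how the export above was done),
here is a minimal sketch that produces that row with Python's standard
csv module. The helper name to_rfc4180 and the choice to write NULL as an
empty field are assumptions made for this example.

```python
import csv
import io

def to_rfc4180(row):
    """Format one record as an RFC 4180 style line (CRLF terminated)."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields that need it; embedded quotes are
    # doubled ("") rather than backslash-escaped.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    # Assumption: a database NULL (None here) becomes an empty field.
    writer.writerow(["" if value is None else value for value in row])
    return buf.getvalue()

row = ["ACME, Inc.", 'Foo\nBar\nBaz! "Hello, too!"', None]
line = to_rfc4180(row)
```

Note that an empty field and a NULL become indistinguishable this way; if
that matters, pick an explicit marker and document it.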

CSV can get very ugly in its simplicity, but I have a parser now that
reads proper and MySQL-fscked variants...
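
To give an idea of what reading both variants involves, here is a rough
sketch in Python. It is not the parser mentioned above; the name
parse_csv and the mysql flag are hypothetical, and it only handles the
escapes shown in the example (backslash followed by the literal
character, and an unquoted \N for NULL). Other MySQL escapes such as \0
are not translated.

```python
def parse_csv(text, mysql=False):
    """Parse CSV text into a list of rows (lists of str, or None for \\N)."""
    rows, row, buf = [], [], []
    in_quotes = quoted = False

    def finish():
        value = "".join(buf)
        # MySQL writes NULL as an unquoted \N
        if mysql and not quoted and value == "\\N":
            return None
        return value

    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if mysql and ch == "\\" and i + 1 < n:
                buf.append(text[i + 1])      # backslash escapes the next char
                i += 2
                continue
            if ch == '"':
                if text[i + 1:i + 2] == '"':
                    buf.append('"')          # RFC 4180: "" is a literal quote
                    i += 2
                    continue
                in_quotes = False            # closing quote
            else:
                buf.append(ch)               # commas and newlines are data here
        elif ch == '"' and not buf:
            in_quotes = quoted = True        # opening quote
        elif ch == ",":
            row.append(finish())
            buf, quoted = [], False
        elif ch in "\r\n":
            if ch == "\r" and text[i + 1:i + 2] == "\n":
                i += 1                       # treat CRLF as one line break
            row.append(finish())
            rows.append(row)
            row, buf, quoted = [], [], False
        else:
            buf.append(ch)
        i += 1
    if buf or quoted or row:                 # last record without a newline
        row.append(finish())
        rows.append(row)
    return rows
```

Called with mysql=True on the MySQL output above, this yields the three
original values with None for the NULL column; called without the flag on
the proper version, the last column comes back as an empty string.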

Properly written CSV is, surprisingly, robust enough to carry
anything, including binary data.  Of course, you need a proper parser,
too, and for large data sets, I'd not use LO Calc... it can easily use
5x the file's size in RAM, and if that's a 600 MB CSV document... have fun!
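
For big files, one option is to stream the records instead of loading
the whole thing into memory. A minimal sketch using Python's csv module
follows; the summarize helper and the sample data are made up for
illustration, and the file is assumed to be proper RFC-style CSV.

```python
import csv
import io

def summarize(lines):
    """Return (record_count, max_field_count) for an iterable of CSV lines."""
    records = widest = 0
    # csv.reader pulls one record at a time, so memory use stays small
    # no matter how large the input is.
    for record in csv.reader(lines):
        records += 1
        widest = max(widest, len(record))
    return records, widest

# Small in-memory stand-in for a real file; newline="" keeps the embedded
# line break inside the quoted field intact.
sample = io.StringIO('"ACME, Inc.","Foo\nBar",\r\nplain,row,here\r\n', newline="")
result = summarize(sample)
```

With a real file, open it with newline="" and an explicit encoding, then
pass the file object straight to summarize.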

	--- Mike

-- 
A man who reasons deliberately, manages it better after studying Logic
than he could before, if he is sincere about it and has common sense.
                                   --- Carveth Read, “Logic”
