[ale] perl/postgresql question

Jim Kinney jim.kinney at gmail.com
Fri Apr 1 08:47:23 EDT 2011


Try a single table dump then feed through perl and output to a file.  Now
attempt the pgrestore to find which table is choking. I'm starting to think
the problem is the restore not the perl.
I've never had to do a character translation like this before on a database.
My pg stuff always goes to files as an intermediate step unless it's an
emergency replication.
Need to do a search for the offending character in the pgdump.
On Apr 1, 2011 7:45 AM, "Geoffrey Myers" <lists at serioustechnology.com>
wrote:
> Jim Kinney wrote:
>> I was searching for a clue and saw the post on the pg list :-) Got all
>> excited thinking I had found a solution to point to you to! Then I
>> realized, "Hey! This looks REALLY similar..."
>>
>> Maybe a single run then a grep for non-ascii characters?
>
> Not sure what you're suggesting. The perl code is pulling the data out
> of the database, cleaning it, putting it back in. Are you suggesting
> pulling the data out and piping to grep? Are you thinking there's a bug
> in the perl code?
>
>>
>> On Thu, Mar 31, 2011 at 4:11 PM, Geoffrey Myers
>> <lists at serioustechnology.com <mailto:lists at serioustechnology.com>> wrote:
>>
>> Jim Kinney wrote:
>> > The "g" _should_ do it. Quite puzzling.
>> > Can you do a run and find an example of when it fails? Is it
>> always 0xbd?
>>
>> The sad thing is, all it tells me is which table in the database has the
>> error. It does not identify the field. I've posted to the Postgres
>> list to see if there's any way to narrow down the culprit, there is not.
>>
>> Once it fails, it stops loading that table, so I never get pass this
>> error.
>>
>> When I first caught this error, I was simply trying to load an SQL_ASCII
>> database dump into a UTF8 database. That's when I wrote my
>> cleandata.pl <http://cleandata.pl>
>> code. The code cleans 62750 records. I just don't understand why it
>> doesn't fix this one.
>>
>> > Maybe someone in Tibet doesn't like your code :-) (It's a Tibetan
>> > character code in utf8)
>>
>> But what is it in SQL_ASCII? What happened is, some enterprising users
>> figured out how to enter special characters from the keyboard, say, like
>> a degrees symbol. That goes in to the SQL_ASCII database fine, but when
>> you try to load that into the UTF8 encoded database, it pukes.
>>
>> >
>> > On Thu, Mar 31, 2011 at 2:59 PM, Geoffrey Myers
>> > <lists at serioustechnology.com <mailto:lists at serioustechnology.com>
>> <mailto:lists at serioustechnology.com
>> <mailto:lists at serioustechnology.com>>> wrote:
>> >
>> > Jim Kinney wrote:
>> > > should it be replaced with nothing or with a NULL? Does
>> the perl
>> > pointer
>> > > jump to the next character automatically when a match is
>> made thus
>> > > skipping a character in your replace? Does a double run
>> fix it?
>> >
>> > Missed the first question. It should be removed, not
>> replaced with an
>> > null. For example I have:
>> >
>> > This is my <UGLY CHARACTER>data
>> >
>> > Should then be:
>> >
>> > This is my data
>> >
>> > >
>> > > On Thu, Mar 31, 2011 at 12:04 PM, Geoffrey Myers
>> > > <lists at serioustechnology.com
>> <mailto:lists at serioustechnology.com>
>> <mailto:lists at serioustechnology.com
>> <mailto:lists at serioustechnology.com>>
>> > <mailto:lists at serioustechnology.com
>> <mailto:lists at serioustechnology.com>
>> > <mailto:lists at serioustechnology.com
>> <mailto:lists at serioustechnology.com>>>> wrote:
>> > >
>> > > So, we are trying to convert a number of postgresql
>> databases
>> > that were
>> > > created with SQL_ASCII encoding to UTF8 encoding. As
>> such, I
>> > need to
>> > > strip certain characters out of the data before
>> dumping and
>> > reloading
>> > > the new databases. I'm using the following:
>> > >
>> > > data =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
>> > > || (ord($1) == 11)
>> > > || ((ord($1) >= 13) && (ord($1) <= 31))
>> > > || ((ord($1) >= 127)) ?"": $1/egs;
>> > >
>> > > Yet I'm getting the following error, which indicates a
>> > character that
>> > > should be handled by the above code is still in the data:
>> > >
>> > > pg_restore: [archiver (db)] COPY failed: ERROR:
>> invalid byte
>> > sequence
>> > > for encoding "UTF8": 0xbd
>> > >
>> > >
>> > > Certainly, the above code should replace the 0xbd with
>> nothing?
>> > >
>> > > Any perl/postgres gurus out there?
>> > >
>> > > --
>> > > Until later, Geoffrey
>> > >
>> > > "I predict future happiness for America if they can
>> prevent
>> > > the government from wasting the labors of the people under
>> > > the pretense of taking care of them."
>> > > - Thomas Jefferson
>> > > _______________________________________________
>> > > Ale mailing list
>> > > Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
>> <mailto:Ale at ale.org>> <mailto:Ale at ale.org <mailto:Ale at ale.org>
>> > <mailto:Ale at ale.org <mailto:Ale at ale.org>>>
>> > > http://mail.ale.org/mailman/listinfo/ale
>> > > See JOBS, ANNOUNCE and SCHOOLS lists at
>> > > http://mail.ale.org/mailman/listinfo
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > --
>> > > James P. Kinney III
>> > > I would rather stumble along in freedom than walk
>> effortlessly in
>> > chains.
>> > >
>> > >
>> > >
>> > >
>> >
>> ------------------------------------------------------------------------
>> > >
>> > > _______________________________________________
>> > > Ale mailing list
>> > > Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
>> <mailto:Ale at ale.org>>
>> > > http://mail.ale.org/mailman/listinfo/ale
>> > > See JOBS, ANNOUNCE and SCHOOLS lists at
>> > > http://mail.ale.org/mailman/listinfo
>> >
>> >
>> > --
>> > Until later, Geoffrey
>> >
>> > "I predict future happiness for America if they can prevent
>> > the government from wasting the labors of the people under
>> > the pretense of taking care of them."
>> > - Thomas Jefferson
>> > _______________________________________________
>> > Ale mailing list
>> > Ale at ale.org <mailto:Ale at ale.org> <mailto:Ale at ale.org
>> <mailto:Ale at ale.org>>
>> > http://mail.ale.org/mailman/listinfo/ale
>> > See JOBS, ANNOUNCE and SCHOOLS lists at
>> > http://mail.ale.org/mailman/listinfo
>> >
>> >
>> >
>> >
>> > --
>> > --
>> > James P. Kinney III
>> > I would rather stumble along in freedom than walk effortlessly in
>> chains.
>> >
>> >
>> >
>> >
>> ------------------------------------------------------------------------
>> >
>> > _______________________________________________
>> > Ale mailing list
>> > Ale at ale.org <mailto:Ale at ale.org>
>> > http://mail.ale.org/mailman/listinfo/ale
>> > See JOBS, ANNOUNCE and SCHOOLS lists at
>> > http://mail.ale.org/mailman/listinfo
>>
>>
>> --
>> Until later, Geoffrey
>>
>> "I predict future happiness for America if they can prevent
>> the government from wasting the labors of the people under
>> the pretense of taking care of them."
>> - Thomas Jefferson
>> _______________________________________________
>> Ale mailing list
>> Ale at ale.org <mailto:Ale at ale.org>
>> http://mail.ale.org/mailman/listinfo/ale
>> See JOBS, ANNOUNCE and SCHOOLS lists at
>> http://mail.ale.org/mailman/listinfo
>>
>>
>>
>>
>> --
>> --
>> James P. Kinney III
>> I would rather stumble along in freedom than walk effortlessly in chains.
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Ale mailing list
>> Ale at ale.org
>> http://mail.ale.org/mailman/listinfo/ale
>> See JOBS, ANNOUNCE and SCHOOLS lists at
>> http://mail.ale.org/mailman/listinfo
>
>
> --
> Until later, Geoffrey
>
> "I predict future happiness for America if they can prevent
> the government from wasting the labors of the people under
> the pretense of taking care of them."
> - Thomas Jefferson
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
> See JOBS, ANNOUNCE and SCHOOLS lists at
> http://mail.ale.org/mailman/listinfo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.ale.org/pipermail/ale/attachments/20110401/a3787026/attachment.html 


More information about the Ale mailing list