[ale] mass file modification
Jim Kinney
jim.kinney at gmail.com
Sun Mar 30 22:21:03 EDT 2008
So "cleaning up bad MS-HTML" did not include "unlink $crapfile". Your
patience and tolerance are astounding :-)
On Sun, Mar 30, 2008 at 6:19 PM, Mike Harrison <meuon at geeklabs.com> wrote:
> Jim
> > I need to update about 43k files and sed just won't cut it for this
> > task. What I need to do is replace 2 lines with 4 new ones, and the
> > lines contain URLs (backslashes, brackets, etc.). What I would like
> > to do is put the new text in a file and pass it and the search text to
> > some program that will modify all the files. Any ideas on what's
> > available to do that?
>
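For the question as asked (swap a 2-line block for 4 new lines, with URL punctuation in both), one option is to treat the search text as a literal string rather than as a regex. A minimal sketch, assuming the old and new blocks have already been read into strings; the sub name replace_block and the sample text are hypothetical and not from this thread:

```perl
use strict;
use warnings;

# Replace every literal occurrence of $old in $text with $new.
# \Q...\E quotes regex metacharacters, so dots, slashes, brackets and
# question marks in URLs are matched as plain characters.
sub replace_block {
    my ($text, $old, $new) = @_;
    (my $out = $text) =~ s/\Q$old\E/$new/g;
    return $out;
}

# Hypothetical sample data standing in for one of the 43k files.
my $old  = "<a href=\"http://old.example.com/a.cgi?x=[1]\">\nold link</a>\n";
my $new  = "<!-- updated -->\n<a href=\"http://new.example.com/\">\n"
         . "new link</a>\n<!-- end -->\n";
my $page = "header\n${old}footer\n";

print replace_block($page, $old, $new);
```

Because the whole file is held in one string, the 2-line search block can span a newline, which is where line-at-a-time sed gets awkward.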
> I've not done as much of this as I used to, back when I was fixing mailQ's
> and such at an ISP, but I always ended up doing it in Perl,
> often with a switch for doing just 10 files and writing the changed files
> to /tmp so I could manually verify them before bulk-changing hundreds of
> thousands (or more) of files. I'm not as good with find/sed/awk, but one of
> the reasons I did things like this in Perl is that it worked well
> when there were lots of files in a single directory, where shell scripting
> couldn't handle the lists of files well.
>
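The trial-run habit described above could look roughly like this. It is a sketch only: the limit argument, the scratch directory and the fix_page placeholder are assumptions made for illustration. Reading the directory one entry at a time avoids handing the shell a huge file list.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Placeholder for the real edits; here it only strips carriage returns.
sub fix_page {
    my ($page) = @_;
    $page =~ s/\r//g;
    return $page;
}

# Process at most $limit .html files from $src, writing the results to
# $dst so the originals stay untouched. Returns the number written.
sub trial_run {
    my ($src, $dst, $limit) = @_;
    my $done = 0;
    opendir(my $dh, $src) or die "Cannot open $src: $!";
    while (defined(my $name = readdir($dh))) {
        next unless $name =~ /\.html$/;      # also skips . and ..
        last if $done >= $limit;
        open(my $in, '<', "$src/$name") or die "Cannot read $src/$name: $!";
        my $page = do { local $/; <$in> };   # slurp the whole file
        close $in;
        open(my $out, '>', "$dst/$name") or die "Cannot write $dst/$name: $!";
        print {$out} fix_page($page);
        close $out or die "Cannot close $dst/$name: $!";
        $done++;
    }
    closedir $dh;
    return $done;
}

# Usage: perl trial.pl /path/to/html 10
if (@ARGV) {
    my ($src, $limit) = @ARGV;
    my $dst = tempdir();   # scratch directory in the system temp location
    my $n = trial_run($src, $dst, $limit // 10);
    print "Wrote $n file(s) to $dst for review\n";
}
```

Once the sample output looks right under diff, the same sub can be pointed at the full directory with a much larger limit.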
> I also often found it easier to write and debug complex regexes in Perl
> as several steps. Regexes are incredibly powerful, and it is really easy
> to make them do things you didn't intend once the exceptions show up.
>
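A cheap way to catch that kind of surprise is to run each pattern against a few sample strings, including ones that should not match, before touching real files. A small sketch; the two patterns and the sample names are made up for illustration:

```perl
use strict;
use warnings;

# Two candidate filename patterns: a loose one and a stricter one.
my $loose  = qr/(.*).html/;      # "." matches any character, no anchors
my $strict = qr/^(.*)\.html$/;   # literal dot, anchored at both ends

# Sample names, including exceptions that should NOT count as HTML.
my @names = ('index.html', 'notes_html.txt', 'page.html.bak');

for my $name (@names) {
    printf "%-16s loose=%s strict=%s\n", $name,
        ($name =~ $loose  ? 'yes' : 'no'),
        ($name =~ $strict ? 'yes' : 'no');
}
```

The loose pattern accepts all three names, while the strict one accepts only index.html.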
> I don't have my old Perl scripts from those days, but they all had
> something like what is below (which cleans up bad MS-HTML). Note that the
> character encoding in the regexes didn't cut and paste well into e-mail:
>
> -------------------------------------------------------------------------------------------
> opendir(INC, $dd) or die "Cannot open $dd: $!" ;
> print "Opening: $dd\n" ;
> @incfiles = readdir(INC) ;
> closedir INC ;
> foreach (sort @incfiles) {
>   next if /^\./ ;      # skip dotfiles, including . and ..
>   if (/\.html$/) {     # only names that end in .html
>     $file = $_ ;
>     fixheader($file) ;
>     #sleep 1 ; # let the server breathe. Optional.
>   }
> }
>
> sub fixheader {
>   my ($file) = @_ ;    # file name relative to $dd
>   $page = '' ;
>   $body = 'F' ;
>   open(IN, "<", "$dd/$file") or die "Cannot read $dd/$file: $!" ;
>   while (<IN>) {
>     if (/<body/) { $body = "T" ; }   # don't process headers..
>     if ($body eq "T") {
>       $page .= $_ ;
>     }
>   } # end while IN
>   close IN ;
>   $page =~ s/\r//g ;          # deletes CRs (a literal ^M before e-mail mangled it)
>   $page =~ s/&#13;/[[P]]/g ;  # turns encoded CRs into <P> markers
>   # NOTE: \x95 is Magic Char 95 (assumed to be hex, the Windows-1252 bullet).
>   $page =~ s/\x95/[[li]]/g ;  # turns into bullets/list items
>   $page =~ s/\n//g ;          # deletes LFs
>   #lots more of these..
>   open(OUT, ">", "$dd/$file.new") or die "Cannot write $dd/$file.new: $!" ;
>   print OUT $page ;
>   close OUT ;
> } ;
>
>
> _______________________________________________
> Ale mailing list
> Ale at ale.org
> http://mail.ale.org/mailman/listinfo/ale
>
--
James P. Kinney III