[ale] sed regexp question
Joseph A. Knapka
jknapka at earthlink.net
Tue Jul 10 20:14:41 EDT 2001
Wandered Inn wrote:
>
> "Joseph A. Knapka" wrote:
> >
> > Christopher Bergeron wrote:
> > >
> > > That would only get websites that start with www; I can't predict all the
> > > possible names that might arise. I do know that the URL is always encoded
> > > in a page as:
> > >
> > > <A HREF="http://xxx.pornsite.com/pictures1.html/">
> > >
> > > So all I need to do is take everything between the opening "http:// and the closing ">
> > >
> > > Any suggestions?
> >
> > Here's a briefish Tcl script that will do it:
>
> You've heard people scream FOOD FIGHT, well, LANGUAGE WAR!!
>
> Perl one-liner, I think; at least it worked for my data file. Put it
> into a script and run it, passing the file(s) as command-line
> arguments. It matches any case combination of the string 'href=':
Mine too.
> #!/usr/bin/perl
>
> while (<>) {
>     chomp;
>     /[Hh][Rr][Ee][Ff]=/ && printf "%s\n",
>         substr($_, index($_, "=") + 2,
>                index($_, ">") - index($_, "=") - 3);
> }
Hey, you used newlines! I can see you're a Perl stylist committed
to producing maintainable code. :-)
Your version seems to only get the first URL on a line.
Mine gets 'em all. Therefore Tcl is obviously superior!
<wiping foam from mouth with page torn from "Programming Perl">
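
For anyone who would rather stay out of the war entirely, here is a minimal
Python sketch of the "get 'em all" approach. It is not either script from this
thread. It assumes every URL appears as a double-quoted HREF value, as in the
example above, and the name extract_hrefs is made up for illustration.

```python
import re

# Assumption: URLs are double-quoted HREF values, e.g. <A HREF="http://...">.
# re.IGNORECASE covers HREF, href, Href, and so on.
HREF_RE = re.compile(r'href\s*=\s*"([^"]*)"', re.IGNORECASE)


def extract_hrefs(text):
    """Return every double-quoted HREF value in text, in order of appearance."""
    return HREF_RE.findall(text)


# Two links on one line: both are returned, not just the first.
sample = '<A HREF="http://a.example.com/one.html"> <a href="http://b.example.com/two.html">'
for url in extract_hrefs(sample):
    print(url)
```

Because findall scans the whole string, a line with several links yields
several URLs. Unquoted or single-quoted HREF values are not handled here.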
-- Joe Knapka
"You know how many remote castles there are along the gorges? You
can't MOVE for remote castles!" -- Lu Tze re. Uberwald
// Linux MM Documentation in progress:
// http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html
* Evolution is an "unproven theory" in the same sense that gravity is. *