RE: Simple Tag Strip - Striphtml ?
Bob Webb (bob.webb@snyder.com)
Mon, 16 Mar 1998 09:58:59 -0500
For future reference. No reply necessary.
Thanks,
Bob/
> -----Original Message-----
> From: Robin Houston [SMTP:robin@oneworld.org]
> Sent: Monday, March 16, 1998 6:10 AM
> To: Bob Webb; 'libwww-perl@ics.uci.edu'
> Subject: Re: Simple Tag Strip - Striphtml ?
>
> At 01:45 PM 3/13/98 -0500, Bob Webb wrote:
>
> >I found a script written by Tom Christiansen called striphtml.
> >Unfortunately, I get an error when running the script, and I was
> >wondering if any regular expression folks out there could help me out.
>
> That's due to a regex bug-fix which went into Perl 5.004 (IIRC).
> You can avoid the problem using this variation on the snippet you gave:
>
> s{ <                # opening angle bracket
>     (?:             # Non-backreffing grouping paren
>         [^>'"] +    # 1 or more things that are neither > nor ' nor "
>       |             # or else
>         ".*?"       # a section between double quotes (stingy match)
>       |             # or else
>         '.*?'       # a section between single quotes (stingy match)
>     ) *             # repetire ad libitum
>                     # hm.... are null tags <> legal? XXX
>    >                # closing angle bracket
> }{}gsx;             # mutate into nada, nothing, and niente
>
>
> .robin.
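
For anyone trying the variation quoted above, here is a minimal sketch of how it might be wrapped in a subroutine and exercised. The name strip_tags and the sample strings are hypothetical and are not part of striphtml; the pattern is the one from the message, with the comments shortened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name used only for this illustration.
sub strip_tags {
    my ($html) = @_;
    $html =~ s{ <           # opening angle bracket
        (?:                 # non-capturing group
            [^>"']+         # 1 or more chars that are neither > nor a quote
          |                 # or else
            ".*?"           # a double-quoted section (stingy match)
          |                 # or else
            '.*?'           # a single-quoted section (stingy match)
        )*                  # zero or more times
        >                   # closing angle bracket
    }{}gsx;                 # replace every match with nothing
    return $html;
}

# A ">" inside a quoted attribute value does not end the tag early.
print strip_tags(q{<a href="x>y.html">link</a> and <b>bold</b>}), "\n";
# prints "link and bold"
```

The quoted-section alternatives are what keep a ">" inside an attribute value from being treated as the end of the tag. This sketch covers only the tag-stripping substitution; it makes no attempt to handle HTML comments or entities.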