RE: Simple Tag Strip - Striphtml ?

Bob Webb (bob.webb@snyder.com)
Mon, 16 Mar 1998 09:58:59 -0500


For future reference. No reply necessary.
Thanks,
Bob/

> -----Original Message-----
> From:	Robin Houston [SMTP:robin@oneworld.org]
> Sent:	Monday, March 16, 1998 6:10 AM
> To:	Bob Webb; 'libwww-perl@ics.uci.edu'
> Subject:	Re: Simple Tag Strip - Striphtml ?
> 
> At 01:45 PM 3/13/98 -0500, Bob Webb wrote:
> 
> >I found a script written by Tom Christiansen called striphtml.
> >Unfortunately, I get an error when running the script, and I was
> >wondering if any regular expression folks out there could help me out.
> 
> That's due to a regex bug-fix which went into 5.004 (IIRC).
> You can avoid the problem using this variation on the snippet you gave:
> 
> s{ <                    # opening angle bracket
>     (?:                 # Non-backreffing grouping paren
>          [^>'"] +       # 1 or more things that are neither > nor ' nor "
>             |           #    or else
>          ".*?"          # a section between double quotes (stingy match)
>             |           #    or else
>          '.*?'          # a section between single quotes (stingy match)
>     ) *                 # repetire ad libitum
>                         #  hm.... are null tags <> legal? XXX
>    >                    # closing angle bracket
> }{}gsx;                 # mutate into nada, nothing, and niente
> 
> 
>  .robin.
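
For future reference, here is a minimal sketch of how the substitution
quoted above might be wrapped up and tried out. The sub name strip_tags
and the sample input are made up for illustration; they are not part of
Tom Christiansen's striphtml script. Only the regex itself comes from
the message.

```perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name, not part of striphtml.
# It applies the substitution from the message above to a copy of
# its argument and returns the result.
sub strip_tags {
    my ($html) = @_;
    $html =~ s{ <            # opening angle bracket
        (?:                  # non-capturing group
            [^>'"] +         # 1 or more chars that are not > or a quote
          |                  #    or else
            ".*?"            # a double-quoted section (non-greedy)
          |                  #    or else
            '.*?'            # a single-quoted section (non-greedy)
        ) *                  # repeated zero or more times
        >                    # closing angle bracket
    }{}gsx;
    return $html;
}

# Sample input (made up): the attribute value contains a > inside quotes.
print strip_tags(qq{<p class="a>b">Hello, <b>world</b>!</p>\n});
```

The quoted alternatives are what let a > inside an attribute value
survive without ending the tag early. As far as I can tell the pattern
does not treat HTML comments specially, so a comment with a > in its
body would probably be cut off at that point.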