Simple Tag Strip - striphtml?

Bob Webb (bob.webb@snyder.com)
Fri, 13 Mar 1998 13:45:13 -0500


I found a script written by Tom Christiansen called striphtml.
Unfortunately, I get an error when running the script, and I was
wondering if any regular expression folks out there could help me out.

When I run the script with 'cat htmlfile | perl striphtml', I get the error
"regexp *+ operand could be empty at striphtml line 66".

htmlfile contains the following:

    <!-- This is a comment -->
    This is a test

    Now is the time for all good men to...

The snippet of code in question is:

55: s{ <                    # opening angle bracket
56:
57:     (?:                 # Non-backreffing grouping paren
58:          [^>'"] *       # 0 or more things that are neither > nor ' nor "
59:             |           #    or else
60:          ".*?"          # a section between double quotes (stingy match)
61:             |           #    or else
62:          '.*?'          # a section between single quotes (stingy match)
63:     ) +                 # repetire ad libitum
64:                         #  hm.... are null tags <> legal? XXX
65:    >                    # closing angle bracket
66: }{}gsx;                 # mutate into nada, nothing, and niente
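The complaint most likely comes from line 58. Its first alternative, [^>'"] *, can match the empty string inside a group that line 63 repeats with +, and some Perl versions reject that combination outright. The sketch below assumes that is the cause. It keeps the same substitution with * changed to + on that alternative, so no branch of the repeated group can match nothing. The sub name strip_tags is hypothetical, and the demo runs on the sample htmlfile content shown earlier:

```perl
use strict;
use warnings;

# Same substitution as lines 55-66 of striphtml, except that the first
# alternative uses + instead of *, so the repeated group can never
# consist of an empty match. strip_tags is a hypothetical name.
sub strip_tags {
    my ($html) = @_;
    $html =~ s{ <                 # opening angle bracket
        (?:                       # non-backreffing grouping paren
             [^>'"]+              # 1 or more things that are neither > nor ' nor "
                |                 #    or else
             ".*?"                # a section between double quotes (stingy match)
                |                 #    or else
             '.*?'                # a section between single quotes (stingy match)
        )+                        # repeat as often as needed
       >                          # closing angle bracket
    }{}gsx;                       # delete each match; /s lets .*? cross newlines
    return $html;
}

# The sample input from the htmlfile above.
my $input = "<!-- This is a comment -->\nThis is a test\n\n"
          . "Now is the time for all good men to...\n";
print strip_tags($input);
```

To use this as a filter, slurp all of standard input first (for example with undef $/) so that tags spanning several lines are still matched.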

The full source of Tom's script can be found on CPAN at
http://www.perl.com/CPAN-local/authors/Tom_Christiansen/scripts/striphtml.gz

I do have libwww and have used HTML::Parse, but I am also looking for an
alternative way to do a simple "tag strip".
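One much cruder alternative, sketched here under the assumption that the input has no quoted attribute values containing ">", needs only two plain substitutions. The first removes comments, which may themselves contain ">". The second removes everything from "<" to the next ">". The sub name simple_strip is hypothetical. This is not a real parser, and input such as alt="a>b" will defeat it:

```perl
use strict;
use warnings;

# Deliberately simple two-pass tag strip (an illustration, not a parser).
# Assumes no quoted attribute value contains a literal '>'.
sub simple_strip {
    my ($html) = @_;
    $html =~ s/<!--.*?-->//gs;   # 1. comments first: non-greedy, /s spans lines
    $html =~ s/<[^>]*>//g;       # 2. then tags; [^>] already matches newlines
    return $html;
}

# Without pass 1, the comment below would leave " b -->" behind.
print simple_strip("<p>Now is the <b>time</b></p>\n<!-- a > b -->done\n");
```

The same two substitutions also fit in a one-liner filter, where -0777 slurps the whole file so that tags spanning lines are caught:

    perl -0777 -pe 's/<!--.*?-->//gs; s/<[^>]*>//g' htmlfile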

Any help would be appreciated.

Regards,
Bob