HTML::Parser->end($tag,$origtext) would be useful.
Brian McCauley (B.A.McCauley@bham.ac.uk)
Thu, 11 Dec 1997 16:36:37 +0000 (GMT)
According to libwww-perl-5.17/README, bug reports and suggestions for
improvements can be sent to the <libwww-perl@ics.uci.edu> mailing
list. So here goes, even though I'm not a subscriber. (If I've broken
netiquette by posting here without subscribing, please tell me.)
I want to parse some HTML and pass most of it through completely
unchanged, making just a few alterations. The trouble is that the
obvious way to do this using HTML::Parser has the side effect of
downcasing all end tags and removing any whitespace embedded in them,
because end() is handed only the lower case tag name.
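
To illustrate the problem, here is a minimal sketch. The helper
old_end_tag() is hypothetical: it only mimics the end-tag match in the
unpatched Parser.pm (applied after the leading "</" has been consumed)
and is not part of the module. Rebuilding the tag from the lower case
name loses both the original case and the embedded whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the unpatched end-tag match.
# $rest is the buffer with the leading "</" already removed.
sub old_end_tag {
    my ($rest) = @_;
    if ($rest =~ s|^([a-zA-Z][a-zA-Z0-9\.\-]*)\s*>||) {
        return lc($1);    # only the lower case name survives
    }
    return;               # not a complete end tag
}

# The source document contained "</TITLE  >".
my $tag     = old_end_tag("TITLE  >rest of document");
my $rebuilt = "</$tag>";  # the best a subclass can reconstruct

print "$rebuilt\n";       # prints "</title>"
```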
To make what I want to do possible I had to change Parser.pm. Would
it be possible for this change to find its way back into the sources?
Thanks in advance.
--- libwww-perl-5.17/lib/HTML/Parser.pm Fri Feb 21 09:32:14 1997
+++ /usr/lib/perl5/site_perl/HTML/Parser.pm Thu Dec 11 15:59:08 1997
@@ -59,10 +59,11 @@
original HTML text.
-=item $self->end($tag)
+=item $self->end($tag,$origtext)
This method is called when an end tag has been recognized. The
-argument is the lower case tag name.
+first argument is the lower case tag name, the second the original
+HTML text of the tag.
=item $self->text($text)
@@ -227,8 +228,8 @@
# Then, look for a end tag
} elsif ($$buf =~ s|^</||) {
# end tag
- if ($$buf =~ s|^([a-zA-Z][a-zA-Z0-9\.\-]*)\s*>||) {
- $self->end(lc($1));
+ if ($$buf =~ s|^([a-zA-Z][a-zA-Z0-9\.\-]*)(\s*>)||) {
+ $self->end(lc($1),"</$1$2");
} elsif ($$buf =~ m|^[a-zA-Z]*[a-zA-Z0-9\.\-]*\s*$|) {
$$buf = "</" . $$buf; # need more data to be sure
return $self;
@@ -364,7 +365,7 @@
sub end
{
- my($self, $tag) = @_;
+ my($self, $tag, $origtext) = @_;
}
1;
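
For comparison, a minimal sketch of what the patched match makes
available. The helper new_end_tag() is hypothetical and stands in for
the patched branch of Parser.pm; it returns the same two values the
patch passes to end(), namely the lower case tag name and the original
text of the tag.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the patched end-tag match.
# $rest is the buffer with the leading "</" already removed.
sub new_end_tag {
    my ($rest) = @_;
    # The second capture group keeps the trailing whitespace and ">".
    if ($rest =~ s|^([a-zA-Z][a-zA-Z0-9\.\-]*)(\s*>)||) {
        return (lc($1), "</$1$2");
    }
    return;    # not a complete end tag
}

# The source document contained "</TITLE  >".
my ($tag, $origtext) = new_end_tag("TITLE  >rest of document");

print "tag:      $tag\n";         # prints "tag:      title"
print "origtext: $origtext\n";    # prints "origtext: </TITLE  >"
```

With the patch applied, a subclass that overrides end() could print
$origtext to pass the end tag through unchanged, and still use $tag
when it needs to decide whether to alter the output.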