HTML::Parser->end($tag,$origtext) would be useful.

Brian McCauley (B.A.McCauley@bham.ac.uk)
Thu, 11 Dec 1997 16:36:37 +0000 (GMT)


According to libwww-perl-5.17/README, bug reports and suggestions for
improvements can be sent to the <libwww-perl@ics.uci.edu> mailing
list.  So here goes even though I'm not a subscriber.  (If I've broken
netiquette by posting here without subscribing please tell me.)

I want to parse some HTML and pass most of it through completely
unchanged, making just a few alterations.  The trouble is that the
obvious way to do this using HTML::Parser has the side-effect of
downcasing all end tags and removing any whitespace embedded in end
tags.

To make what I want to do possible I had to change Parser.pm.  Would
it be possible for this change to find its way back into the sources?

Thanks in advance.

--- libwww-perl-5.17/lib/HTML/Parser.pm	Fri Feb 21 09:32:14 1997
+++ /usr/lib/perl5/site_perl/HTML/Parser.pm	Thu Dec 11 15:59:08 1997
@@ -59,10 +59,11 @@
 original HTML text.
 
 
-=item $self->end($tag)
+=item $self->end($tag,$origtext)
 
 This method is called when an end tag has been recognized.  The
-argument is the lower case tag name.
+first argument is the lower case tag name, the second the original
+HTML text of the tag.
 
 =item $self->text($text)
 
@@ -227,8 +228,8 @@
 	# Then, look for a end tag
 	} elsif ($$buf =~ s|^</||) {
 	    # end tag
-	    if ($$buf =~ s|^([a-zA-Z][a-zA-Z0-9\.\-]*)\s*>||) {
-		$self->end(lc($1));
+	    if ($$buf =~ s|^([a-zA-Z][a-zA-Z0-9\.\-]*)(\s*>)||) {
+		$self->end(lc($1),"</$1$2");
 	    } elsif ($$buf =~ m|^[a-zA-Z]*[a-zA-Z0-9\.\-]*\s*$|) {
 		$$buf = "</" . $$buf;  # need more data to be sure
 		return $self;
@@ -364,7 +365,7 @@
 
 sub end
 {
-    my($self, $tag) = @_;
+    my($self, $tag, $origtext) = @_;
 }
 
 1;
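
To illustrate what the changed regex in the patch above captures, here
is a minimal standalone sketch.  The helper split_end_tag is
hypothetical and not part of HTML::Parser; it only mimics the patched
branch, taking the buffer contents that follow a leading "</" and
returning the two values that end() would receive, plus whatever is
left in the buffer.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not HTML::Parser API).
# $buf is the text that follows an already-stripped "</".
sub split_end_tag {
    my ($buf) = @_;
    if ($buf =~ s|^([a-zA-Z][a-zA-Z0-9\.\-]*)(\s*>)||) {
        # $1 is the tag name as written, $2 the optional whitespace plus ">".
        # Rebuilding "</$1$2" preserves the original case and spacing.
        return (lc($1), "</$1$2", $buf);
    }
    return;    # not (yet) a complete end tag
}

my ($tag, $origtext, $rest) = split_end_tag("Table  >more text");
print "tag:      $tag\n";         # lower-cased name, as before the patch
print "origtext: $origtext\n";    # the end tag exactly as it appeared
print "rest:     $rest\n";        # unconsumed remainder of the buffer
```

A subclass that wants to pass end tags through unchanged could then
print the second argument of end() instead of rebuilding "</$tag>"
from the lower-cased name.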