bug with HTML::Parse, from libwww-perl-5.04

sondeen@isi.edu
Fri, 1 Nov 1996 10:44:39 -0800


I think an HTML construction like this:
	<B><font size=8>Java Blackjack</font size></B>
					^^^^^^^^^^
(where the closing </font> tag carries an attribute for some reason)
causes the parse_html routine in HTML::Parse, from libwww-perl-5.04,
to find nothing (it returns no top-level Element/node) in:
		  $ht_tree  = parse_html($content);
However, removing the "extraneous" size attribute from </font size> (so
that it reads </font>) fixes the problem: parse_html works fine and
returns a parse tree.
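
Until the parser copes with such end tags, one possible workaround is to
normalize them before calling parse_html. The following is only a minimal
sketch using core Perl: strip_end_tag_attrs is a hypothetical helper name,
not part of HTML::Parse, and the regular expression is naive (it would
also rewrite end-tag look-alikes inside comments or scripts).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Parse): drop anything that follows
# the tag name inside an end tag, so "</font size>" becomes "</font>".
sub strip_end_tag_attrs {
    my ($html) = @_;
    $html =~ s{</\s*([A-Za-z][A-Za-z0-9]*)\b[^>]*>}{</$1>}g;
    return $html;
}

my $content = '<B><font size=8>Java Blackjack</font size></B>';
print strip_end_tag_attrs($content), "\n";
# prints: <B><font size=8>Java Blackjack</font></B>
```

The cleaned string could then be handed to parse_html in place of the
raw file content.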

illustration:
on blackjack.html using: </font size>:
	% perl5 ~/bin/extract_links.p blackjack.html
	%

on blackjack.html using: </font>:
	% perl5 ~/bin/extract_links.p blackjack.html
	<a href="mailto:skister@us.oracle.com">
	<a href="scores.html">
	%


	% perl5 -v
This is perl, version 5.003 with EMBED
        built under sunos at Jul  1 1996 14:46:28
        + suidperl security patch

Copyright 1987-1996, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5.0 source kit.



here are the complete files:

blackjack.html: -------------------------------------------

Java Blackjack
Java Blackjack by Scott Kister. Last updated Mar 20 1996.

You may use key presses instead of the mouse for Hit/Stand/Deal/sPlit/...
You may also press 1-8 to bet that many units. One unit is the current bet amount.

The count is the Hi/Lo (2-6 = +1; T,A = -1). Press 'C' to hide/display the count

Note: Since Netscape does not resize the window, you should not change the resplit option to allow resplitting, or hands will be drawn off of the screen.

See the High Scores
Let me know of any problems or suggestions.

extract_links.p: -----------------------------------------

#!/home/sondeen/bin/perl5
my $usage = "extract_links.p file [basefilename]\neg:\textract_links.p examples.html http://munkora.cs.mu.oz.au/%7Edutyprog/bash_intro/examples.html";
my $debug = 0;
use strict;
use URI::URL;
use HTML::Parse;
use HTML::Element;
#use LWP::UserAgent;
#use LWP::Simple qw(get);
my $href = 0;
my @anchors = qw;
my $file = shift;
while (substr($file,0,1) eq '-') {
    if ($file eq '-h') {
        $href = 1;
    } elsif ($file eq '-a') {
        @anchors = qw(a);
    } else {
        print "usage: $usage\n";
        exit 1;
    }
    $file = shift;
}
if ($file eq '') {
    print "usage: $usage\n";
    exit 1;
}
my $base = shift;
if ($base ne '') {
    print &ashtml('a');
    print "\"$base\">\n";
}
&parsehtml($base,$file, ' ');
exit 0;

sub parsehtml {
    my($base,$file,$parse) = @_;
    my($content, @content, $ht_tree, $linkpair, $fqurl, %saw, @urls);
    if ($file eq '' && $base =~ /^http:/i) {
        $content = get($base);
    } elsif ($file ne '') {
        open(IN,$file) || die "can't open file: $file\n";
        @content = <IN>;
        $content = "@content";
        close(IN);
    }
    die "nil content\n" if $content eq '';
    $ht_tree = parse_html($content);
    if ($parse eq '') {
        print $ht_tree->as_HTML();
    } else {
        foreach $linkpair (@{$ht_tree->extract_links(@anchors)}) {
            my($link,$elem) = @$linkpair;
            # if ($elem ne '') {
            #     print "elem: ";
            #     print $elem->as_HTML();
            # }
            my $url = url($link,$base); # XXX not real base
            # push(@urls, $fqurl)
            my $tag = &ashtml($elem->tag);
            if ($href) {
                print "$fqurl\n" unless $saw{ $fqurl = eval { $url->abs } || $url->as_string }++;
            } else {
                print "$tag\"$fqurl\">\n" unless $saw{ $fqurl = eval { $url->abs } || $url->as_string }++;
            }
        }
    }
    # print join("\n", @urls), "\n";
}

sub ashtml {
    my $tag = shift;
    if ($tag eq 'a') {
        return("