I think an HTML construction like this:
<B><font size=8>Java Blackjack</font size></B>
                               ^^^^^^^^^^
(where the closing /font tag carries a parameter for some reason)
causes the parse_html routine in HTML::Parse, from libwww-perl-5.04,
to find nothing. It returns no top-level element/node from:
$ht_tree = parse_html($content);
However, removing the "extraneous" size parameter in </font size> (so
that it reads </font>) fixes the problem: parse_html then works fine and
returns a parse tree.
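
Until the parser copes with this, one possible workaround is to strip anything that follows the tag name in an end tag before the text reaches parse_html. The following is only a rough sketch: strip_end_tag_attributes is a hypothetical helper of my own, not part of libwww-perl, and the regex assumes no end tag contains a literal ">" before its closing bracket.

```perl
use strict;

# Hypothetical helper (not part of libwww-perl): rewrite end tags such as
# "</font size>" to "</font>" before the text is handed to parse_html().
sub strip_end_tag_attributes {
    my ($html) = @_;
    # Match "</", a tag name, then anything up to the closing ">",
    # and keep only the tag name.
    $html =~ s{</\s*([A-Za-z][A-Za-z0-9]*)[^>]*>}{</$1>}g;
    return $html;
}

my $sample  = '<B><font size=8>Java Blackjack</font size></B>';
my $cleaned = strip_end_tag_attributes($sample);
print "$cleaned\n";   # prints <B><font size=8>Java Blackjack</font></B>
```

In the script below this would be applied to $content just before the parse_html($content) call. It only hides the symptom; the parser itself still mishandles the original markup.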
Illustration:
With blackjack.html using </font size> (no links are printed):
% perl5 ~/bin/extract_links.p blackjack.html
%
With blackjack.html using </font>:
% perl5 ~/bin/extract_links.p blackjack.html
<a href="mailto:skister@us.oracle.com">
<a href="scores.html">
%
% perl5 -v
This is perl, version 5.003 with EMBED
built under sunos at Jul 1 1996 14:46:28
+ suidperl security patch
Copyright 1987-1996, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5.0 source kit.
Here are the complete files:
blackjack.html: -------------------------------------------
The count is the Hi/Lo (2-6 = +1; T,A = -1). Press 'C' to hide/display the count
Note: Since Netscape does not resize the window, you should not change the resplit option to allow resplitting, or hands will be drawn off of the screen.
See the High Scores
Let me know of any problems or suggestions.
extract_links.p: -----------------------------------------
#!/home/sondeen/bin/perl5
my $usage = "extract_links.p file [basefilename]\neg:\textract_links.p examples.html http://munkora.cs.mu.oz.au/%7Edutyprog/bash_intro/examples.html";
my $debug = 0;
use strict;
use URI::URL;
use HTML::Parse;
use HTML::Element;
#use LWP::UserAgent;
use LWP::Simple qw(get);   # needed by the get($base) call in parsehtml
my $href = 0;
my @anchors = qw;
my $file = shift;
while (substr($file,0,1) eq '-') {
    if ($file eq '-h') {
        $href = 1;
    } elsif ($file eq '-a') {
        @anchors = qw(a);
    } else {
        print "usage: $usage\n";
        exit 1;
    }
    $file = shift;
}
if ($file eq '') {
    print "usage: $usage\n";
    exit 1;
}
my $base = shift;
if ($base ne '') {
    print &ashtml('a');
    print "\"$base\">\n";
}
&parsehtml($base,$file, ' ');
exit 0;
sub parsehtml {
my($base,$file,$parse) = @_;
my($content, @content, $ht_tree, $linkpair, $fqurl, %saw, @urls);
if ($file eq '' && $base =~ /^http:/i) {
$content = get($base);
} elsif ($file ne '') {
open(IN,$file) || die "can't open file: $file: $!\n";
@content =