re: HTML parsing library

Madeline Gonzalez (madeline@bcn.boulder.co.us)
Wed, 1 Feb 1995 11:59:58 -0700


Help!  I'm having trouble using Brooks' parsing library.  I'm new to
Perl and don't quite know how to proceed.  If anyone out there has been
able to use it successfully, I would sure appreciate hearing how you
invoke the HTML -> text routines in particular!

I'm trying to use a slightly modified version of his testhtml, and
keep getting the following output, ending in a core dump:

process_html: Ignoring tag 'html'
process_html: Ignoring tag 'head'
process_html: Ignoring tag 'title'
process_html: Ignoring tag '/title'
process_html: Ignoring tag 'link'
process_html: Ignoring tag '/head'
process_html: Ignoring tag 'body'
Memory fault(coredump)
bcn>


?!
Could someone please suggest what might be the problem?  Or share a
working sample test program that invokes the routines, so I can figure
out why mine isn't working?  (Attached is the test program I'm using...)

I would sure appreciate any guidance on this!

frustratingly ;-(,
Madeline


---------------------------------------------------------------------
#!/usr/local/bin/perl

##########  test ##########

$url = shift(@ARGV);

unless($url) { die "requires argument of url\n"; }

require 'w3bbc_html.pl';

$libwww_perl_version = '0.30';

unshift(@INC, $ENV{'LIBWWW_PERL'} ||
        "/homes/bcutter/dev/tkperl/libwww/libwww-perl-$libwww_perl_version");
unshift(@INC,"/home/madeline/Perl/parser");
unshift(@INC,"/home/madeline/Perl/libwww-perl-0.30");
# print "INC is @INC\n";

require 'www.pl';

&www'set_def_header('http','User-Agent',"testhtml/0.1 $www'Library");


$response = &www'request('GET',$url,*headers,*content,30);
print "GOT PAGE, content = $content\n";

# set lists to routines that convert html -> text
&w3bbc_html'reset_tags('text');

print "before parse_html...\n";
# break up content into the array passed by typeglob as *links
&w3bbc_html'parse_html(*links,$content);
print "after parse_html, array links is now...\n";
foreach (@links) {
	print;
}
print "\n";

%opts = (
    # '_unknown', 'preserve',   # preserve or unknown (def)
    '_debug', 1,
);

print "before process_html, array %opts is now...\n";
#foreach (%opts) {
#	print;
#}
while ( ($K, $V) = each(%opts)) {
	print "key=$K, value =$V\n";
}
print "\n";
$processed = &w3bbc_html'process_html(*opts,*links);


print STDERR <<EOF;
                                     HTML
----------------------------------------------------------------------
$content
----------------------------------------------------------------------
                                Processed Text
----------------------------------------------------------------------
$processed
----------------------------------------------------------------------
EOF