2nd try: reusing parsing of LinkExtor for FormatText
Aaron Nabil (nabil@teleport.com)
Tue, 8 Oct 1996 02:16:29 -0700 (PDT)
I'd like to use LinkExtor and then FormatText without re-parsing the HTML.
Here is a small code snippet that shows the problem.
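# modules used below (parse_html comes from HTML::Parse)
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use HTML::FormatText;
use HTML::Parse;
use HTML::TreeBuilder;
# get the document something like this
$ua = LWP::UserAgent->new;
$req = HTTP::Request->new(GET => $url);
$res = $ua->request($req);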
( . . . )
my $le_parser = HTML::LinkExtor->new(\&extor_cb, $base);
$le_parser->netscape_buggy_comment(1);
$le_html = $le_parser->parse($res->content);
# The part above works fine and correctly calls extor_cb to process
# the links.
my $formatter = HTML::FormatText->new;
# Here's what I'd like to do, reusing the parser run:
# print $formatter->format($le_html);
# But it dies with...
# Can't locate object method "traverse" via package "HTML::LinkExtor" at
# /usr/local/perl/lib/site_perl/HTML/Formatter.pm line 65, <> chunk 1.
# according to the docs, this should work, and it does...
$html = parse_html($res->content);
print $formatter->format($html);
# but ends up parsing the file twice.
# the docs also say...
# parse_html($html, [$obj])
# This function is really just a synonym for $obj->parse($html) and $obj
# is assumed to be a subclass of HTML::Parser.
# The return value from parse_html() is $obj.
# so I thought I'd try this...
# my $parser = HTML::Parser->new;
# $parser->netscape_buggy_comment(1);
# $html = $parser->parse($res->content);
# print $formatter->format($html);
# which also dies with: Can't locate object method "traverse"
# But this code here works! (replacing HTML::Parser with HTML::TreeBuilder)
my $parser = HTML::TreeBuilder->new;
$parser->netscape_buggy_comment(1);
$html = $parser->parse($res->content);
print $formatter->format($html);
# but of course this still parses the file twice.
# Apparently TreeBuilder inherits most of its brains from Parser, but
# gets things like "traverse" from Element. Is there a way to subclass
# LinkExtor so it knows about traverse as well, without breaking it
# or slowing it down?
--
Aaron Nabil
nabil@teleport.com