2nd try: reusing parsing of LinkExtor for FormatText

Aaron Nabil (nabil@teleport.com)
Tue, 8 Oct 1996 02:16:29 -0700 (PDT)


I'd like to use LinkExtor and then FormatText without re-parsing the HTML.

Here is a little code-let.

# get the document something like this

       $ua = LWP::UserAgent->new;
       $req = HTTP::Request->new(GET => $url);
       $res = $ua->request($req);

	( . . . )

                my $le_parser = HTML::LinkExtor->new(\&extor_cb, $base);
                $le_parser->netscape_buggy_comment(1);
                $le_html = $le_parser->parse($res->content);

# The part above works fine and correctly calls extor_cb to process
# the links.

		my $formatter = HTML::FormatText->new;

# Here's what I'd like to do, reusing the parser run:
#		print $formatter->format($le_html);
# But it dies with...
# Can't locate object method "traverse" via package "HTML::LinkExtor" at 
# /usr/local/perl/lib/site_perl/HTML/Formatter.pm line 65, <> chunk 1.


# according to the docs, this should work, and it does...
		$html = parse_html($res->content);
		print $formatter->format($html);
# but ends up parsing the file twice.


# the docs also say...
#   parse_html($html, [$obj])
#   This function is really just a synonym for $obj->parse($html) and $obj
#   is assumed to be a subclass of HTML::Parser. 
#   The return value from parse_html() is $obj.
# so I thought I'd try this...

#		my $parser = HTML::Parser->new;
#		$parser->netscape_buggy_comment(1);
#		$html = $parser->parse($res->content);
#               print $formatter->format($html);

# which also dies with a Can't locate object method "traverse" error.

# But this code here works!  (replacing HTML::Parser with HTML::TreeBuilder)

		my $parser = HTML::TreeBuilder->new;
		$parser->netscape_buggy_comment(1);
		$html = $parser->parse($res->content);
		print $formatter->format($html);

# but is of course still parsing the file twice.

# Apparently TreeBuilder inherits most of its brains from Parser, but
# gets things like "traverse" from Element.  Is there a way to subclass
# LinkExtor so it knows about traverse as well, without breaking it
# or slowing it down?
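
One possible way to get down to a single parse, sketched here rather than
offered as a definitive answer: build the tree once with HTML::TreeBuilder
and pull the links out of that tree, instead of running LinkExtor at all.
This assumes the installed HTML::Element provides an extract_links method
returning a reference to a list of [link, element, ...] entries. The
$content string is a made-up stand-in for $res->content.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical stand-in for $res->content from the code above.
my $content = '<html><body><p>See '
            . '<a href="http://www.perl.com/">Perl</a>.</p></body></html>';

# Parse the document once into an HTML::Element tree.
my $tree = HTML::TreeBuilder->new;
$tree->parse($content);
$tree->eof;

# Collect link URLs from the tree (assumes HTML::Element::extract_links).
# Each entry is an array ref whose first item is the link value.
my @links = map { $_->[0] } @{ $tree->extract_links('a', 'img') };

# Format the very same tree as plain text; no second parse is needed.
my $text = HTML::FormatText->new->format($tree);

print "$_\n" for @links;
print $text;

# Release the tree once both results have been produced.
$tree->delete;
```

Unlike LinkExtor given a $base, this sketch leaves relative links as they
appear in the document, so they would still need to be made absolute
separately. It also builds the whole tree in memory, which may be slower
than LinkExtor's callback approach on large documents.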



-- 
Aaron Nabil
nabil@teleport.com