HTML::Element|Parse add-on

Doug MacEachern (dougm@osf.org)
Wed, 21 Feb 1996 19:04:23 -0500


I just wanted a simple method to pull the title out of a parsed document,
instead of using $html->traverse(sub{...}).   But, I ended up with a little
more.  
This may not be optimal for certain apps, but, it provides methods so you
can say things like:

#parse the doc
$html = parse_html($document);

#fetch the title
$title = "@{ $html->title->content }";

#extract data from tables
foreach $table ( $html->table() ) {
    foreach $row ( $table->tr() ) {
	foreach $data ( $row->td() ) { 
	    print "@{ $data->content }\t";
	}
	print "\n";
    }
    print "\n";
}

The functionality is similar to the what extractLinks() provides, but you
can extract any element.

Of course, you can only grab one link element type at a time this way:
foreach $anchor ( $html->a() ) {
    print $anchor->attr('HREF'), "\n";
}

simple but nice to have...

-Doug

---8<--
#just stuck it in this namespace for now.
package HTML::Element;

#let's do something like HTML::AsSubs...

#it would be nice to share these tags with HTML::AsSubs, hmm...
@TAGS = qw(link meta nextid 
	   h1 h2 h3 h4 h5 h6 p pre address blockquote
	   a img br hr
	   ol ul dir menu li
	   dl dt dd
	   cite code em kbd samp strong var
	   b i u tt
	   table tr td th caption
	   form input select option textarea
	   );

foreach (@TAGS) {
    push(@code, "sub $_ { shift->walkto('$_', \@_); }\n");
}

#there should only be one of these tags in each doc,
#need to go over HTML spec...
foreach (qw(html header title base body isindex)) {
    push(@code, "sub $_ { shift->walkto('$_', scalar(\@_) ? \@_ : 1 ); }\n");
} 

eval join('', @code);
die $@ if $@;

sub walkto {
    my($tree,$tag,$num) = @_;
    my(@refs);
    eval {
	$tree->traverse(sub {
	    my($self, $start, $depth) = @_;
	    return 1 unless $start;
	    if($self->tag eq $tag) {
		push(@refs, $self);
		if(defined $num) {
		    die unless --$num > 0; #should we bother aborting?
		}
	    }
	    return 1;
	},1);
    };
    return wantarray ? (@refs) : $refs[0];
}

---8<--