HTML::Element|Parse add-on
Doug MacEachern (dougm@osf.org)
Wed, 21 Feb 1996 19:04:23 -0500
I just wanted a simple method to pull the title out of a parsed document,
instead of using $html->traverse(sub{...}). But, I ended up with a little
more.
This may not be optimal for certain apps, but, it provides methods so you
can say things like:
#parse the doc
$html = parse_html($document);
#fetch the title
$title = "@{ $html->title->content }";
#extract data from tables
foreach $table ( $html->table() ) {
foreach $row ( $table->tr() ) {
foreach $data ( $row->td() ) {
print "@{ $data->content }\t";
}
print "\n";
}
print "\n";
}
The functionality is similar to the what extractLinks() provides, but you
can extract any element.
Of course, you can only grab one link element type at a time this way:
foreach $anchor ( $html->a() ) {
print $anchor->attr('HREF'), "\n";
}
simple but nice to have...
-Doug
---8<--
#just stuck it in this namespace for now.
package HTML::Element;
#let's do something like HTML::AsSubs...
#it would be nice to share these tags with HTML::AsSubs, hmm...
@TAGS = qw(link meta nextid
h1 h2 h3 h4 h5 h6 p pre address blockquote
a img br hr
ol ul dir menu li
dl dt dd
cite code em kbd samp strong var
b i u tt
table tr td th caption
form input select option textarea
);
foreach (@TAGS) {
push(@code, "sub $_ { shift->walkto('$_', \@_); }\n");
}
#there should only be one of these tags in each doc,
#need to go over HTML spec...
foreach (qw(html header title base body isindex)) {
push(@code, "sub $_ { shift->walkto('$_', scalar(\@_) ? \@_ : 1 ); }\n");
}
eval join('', @code);
die $@ if $@;
sub walkto {
my($tree,$tag,$num) = @_;
my(@refs);
eval {
$tree->traverse(sub {
my($self, $start, $depth) = @_;
return 1 unless $start;
if($self->tag eq $tag) {
push(@refs, $self);
if(defined $num) {
die unless --$num > 0; #should we bother aborting?
}
}
return 1;
},1);
};
return wantarray ? (@refs) : $refs[0];
}
---8<--