>>>>> WWW projekt <wwwproj@dna.lth.se> writes:
> This talk concerns HTML::TreeBuilder and HTML::Element.
> Fri, 19 Jul 1996 I wrote:
>> I would like to have a method that can remove a child and all of its
>> subchildren from an element's contents.
>>
>> Is there an easy way of doing this today accessing $e->{'_content'}?
I missed that question when you first posted it. I'm doing it this way:
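    my $page = parse_html($text);
    $page->traverse(sub {
        my($ele,$flag,$depth) = @_;
        if ($depth > 7) {
            # too deeply nested: drop this element and everything below it
            $ele->delete;
            return;     # false: do not descend into the children
        }
        if ($self->{HTML_OK}{$ele->tag}) {
            return 1;
        } else {
            # tag is not on the allowed list: rename it, keep the contents
            $ele->tag("noop");
            1;
        }
    }, 1);              # 1: ignore text segments
    $htmlstr = $page->as_HTML;
    $page->delete;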
What %{$self->{HTML_OK}} contains is determined somewhere else. The
program in question has been working for about a year now. I have
definitely deleted quite a lot of trees with it ;-)
> I would like to extend this discussion to how one should be able to
> manipulate the HTML-tree in a comfortable way.
> Today there is no way of inserting a new element in the tree, i.e.
> inserting an element into a certain position in the contents of another
> element.
    perl -MHTML::Parse -e '
        $page = parse_html("<HEAD><TITLE>forgot the base</HEAD><BODY>Reached the end");
        $page->traverse(sub {
            my($ele,$flag,$depth) = @_;
            if ($ele->tag eq "head") {
                $ele->insert_element(HTML::Element->new("base", HREF=>"rtrtr"));
                return 0;
            }
            return 1;
        },1);
        print $page->as_HTML;
    '
Reached the end
> Neither is there a way of deleting an element, as explained above.
See above :-)
> If I want to do this today, I have to make a copy of the parent's
> contents, go through each element there to see if it is the element to
> be deleted, and copy back those elements that are not to be deleted.
> It can be done with a 'grep' command, quite easily, even though it is
> not very efficient.
> "What's the fuss", you say, "you just said it can be done!"
> Yes, I managed to do it alright, but as I was searching for elements to
> delete using the 'traverse' method, things messed up. The reason seems
> to be that traverse cannot cope when the callback method manipulates
> the contents.
When _I_ tried for the first time, things messed up too ;-)
andreas
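
The grep approach from the quoted text, together with one way of keeping the traversal from tripping over its own changes, might look roughly like the sketch below. It is only an illustration under assumptions: plain hashes stand in for the parsed tree so that it runs without any modules installed, and the names node, remove_child, collect_unwanted, tags and the _tag field are made up for this example; they are not part of HTML::Element. The idea is to walk the tree first and only collect what should go, then do the grep-style removal after the walk has finished.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed tree: plain hashes with a tag and a
# '_content' array. This is NOT the HTML::Element implementation.
sub node {
    my ($tag, @children) = @_;
    return { _tag => $tag, _content => [@children] };
}

# Remove one child from its parent by filtering the contents with grep.
# The whole subtree below $child goes away with it.
sub remove_child {
    my ($parent, $child) = @_;
    @{ $parent->{_content} } =
        grep { !(ref $_ && $_ == $child) } @{ $parent->{_content} };
    return;
}

# Walk the tree without modifying it and collect [parent, child] pairs
# for every child whose tag is listed in %$unwanted.
sub collect_unwanted {
    my ($node, $unwanted, $found) = @_;
    for my $child (@{ $node->{_content} }) {
        next unless ref $child;                 # skip text segments
        if ($unwanted->{ $child->{_tag} }) {
            push @$found, [$node, $child];      # no need to look below it
        } else {
            collect_unwanted($child, $unwanted, $found);
        }
    }
    return $found;
}

# List the tags of all elements below $node, in document order.
sub tags {
    my ($node) = @_;
    return map { ref $_ ? ($_->{_tag}, tags($_)) : () } @{ $node->{_content} };
}

my $tree = node('body',
    node('p', 'keep me', node('blink', node('b', 'gone'))),
    node('blink', 'gone too'),
    node('p', 'also kept'),
);

# Phase 1: traverse and collect. Phase 2: delete, after the walk is over.
my $doomed = collect_unwanted($tree, { blink => 1 }, []);
remove_child(@$_) for @$doomed;

my @remaining = tags($tree);
print "@remaining\n";    # prints "p p"
```

Because nothing is removed while the walk is still going on, the loop over a parent's contents never sees the array change underneath it. The price is a second pass over the collected pairs, which is small next to the traversal itself.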