Re: Manipulating The HTML Tree

Andreas Koenig (k@anna.in-berlin.de)
Mon, 22 Jul 1996 14:48:19 +0200


>>>>> WWW projekt <wwwproj@dna.lth.se> writes:

  > This talk concerns HTML::TreeBuilder and HTML::Element.
  > Fri, 19 Jul 1996 I wrote:

 >> I would like to have a method that can remove a child and all of its
 >> subchildren from an element's contents.
 >> 
 >> Is there an easy way of doing this today accessing $e->{'_content'}?

I missed that question when you first posted it. I'm doing it this way:

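    my $page = parse_html($text);
    $page->traverse(sub {
        my($ele,$flag,$depth) = @_;
        # Anything nested deeper than 7 levels is thrown away.
        if ($depth > 7) {
            $ele->delete;
            return;             # false: do not descend into this element
        }
        # Allowed tags pass through; everything else is renamed to "noop".
        if ($self->{HTML_OK}{$ele->tag}) {
            return 1;
        } else {
            $ele->tag("noop");
            1;
        }
    }, 1);                      # 1 = skip text nodes

    $htmlstr = $page->as_HTML;
    $page->delete;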

The contents of %{$self->{HTML_OK}} (a hash keyed by tag name) are
determined somewhere else. The program in question has been working for
about a year now. I have definitely deleted quite a lot of trees with
it ;-)
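
As for going through $e->{'_content'} directly, as you asked: here is a
minimal sketch of the idea, not the library's own code. It assumes a
stand-in node that is just a plain hash with '_tag' and '_content' keys
(the node() and remove_child() helpers are made up for illustration), so
it runs without HTML::Element. Dropping the child from the parent's
content list takes the whole subtree with it.

```perl
use strict;
use warnings;

# Stand-in node: a plain hash with '_tag' and '_content' keys.
# This is a model for illustration, not HTML::Element itself.
sub node {
    my ($tag, @content) = @_;
    return { _tag => $tag, _content => [@content] };
}

# Remove $child (and therefore its whole subtree) from $parent by
# rebuilding the parent's content list without it.
sub remove_child {
    my ($parent, $child) = @_;
    @{ $parent->{_content} } =
        grep { !(ref($_) && $_ == $child) } @{ $parent->{_content} };
    return $parent;
}

my $em   = node('em', 'unwanted');
my $para = node('p', 'keep ', $em, ' this');
remove_child($para, $em);

# Text children are plain strings; element children are references.
my $remaining = join '',
    map { ref($_) ? "<$_->{_tag}>" : $_ } @{ $para->{_content} };
print "$remaining\n";
```

The grep compares references, so only the exact child passed in is
removed, even if a sibling has the same tag.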

  > I would like to extend this discussion to how one should be able to
  > manipulate the HTML-tree in a comfortable way.

  > Today there is no way of inserting a new element in the tree, i.e.
  > inserting an element into a certain position in the contents of another
  > element.

perl -MHTML::Parse -e '
$page = parse_html("<HEAD><TITLE>forgot the base</HEAD><BODY>Reached the end");
$page->traverse(sub {
        my($ele,$flag,$depth) = @_;
        if ($ele->tag eq "head") {
            $ele->insert_element(HTML::Element->new("base", HREF=>"rtrtr"));
            return 0;
        }
        return 1;
},1);
print $page->as_HTML;
'
forgot the base

Reached the end

  > Neither is there a way of deleting an element, as explained above.

See above :-)

  > If I want to do this today, I have to make a copy of the parent's
  > contents, go through each element there to see if it is the element to
  > be deleted, and copy back those elements that are not to be deleted.
  > It can be done with a 'grep' command, quite easily, even though it is
  > not very efficient.

  > "What's the fuss", you say, "you just said it can be done!"

  > Yes, I managed to do it alright, but as I was searching elements to
  > delete using the 'traverse' method, things messed up. The reason seems
  > to be that traverse cannot handle if the callback method manipulates the
  > contents.

When _I_ tried for the first time, things messed up too ;-)

andreas
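
P.S. One way around the traverse problem is to split the work into two
passes: first walk the tree and only remember what should go, then unlink
those nodes once the walk is over, so no content list changes while a
loop is still running over it. The sketch below assumes the same kind of
stand-in nodes (plain hashes with '_tag', '_content' and '_parent' keys);
node(), walk(), doomed() and prune() are hypothetical helpers, not
HTML::Element methods.

```perl
use strict;
use warnings;

# Stand-in nodes: plain hashes, not HTML::Element objects.
sub node {
    my ($tag, @content) = @_;
    my $self = { _tag => $tag, _content => [@content] };
    $_->{_parent} = $self for grep { ref($_) } @content;
    return $self;
}

# Depth-first walk that calls $callback on every element node.
sub walk {
    my ($node, $callback) = @_;
    $callback->($node);
    walk($_, $callback) for grep { ref($_) } @{ $node->{_content} };
}

# Pass 1: only record what should go; the tree is not touched.
sub doomed {
    my ($root, $wanted) = @_;
    my @hits;
    walk($root, sub { push @hits, $_[0] if $wanted->($_[0]) });
    return @hits;
}

# Pass 2: unlink each recorded node from its parent's content list.
sub prune {
    my ($root, $wanted) = @_;
    for my $node (doomed($root, $wanted)) {
        my $parent = $node->{_parent} or next;   # the root has no parent
        @{ $parent->{_content} } =
            grep { !(ref($_) && $_ == $node) } @{ $parent->{_content} };
        delete $node->{_parent};
    }
    return $root;
}

my $tree = node('body',
    node('p', 'one'),
    node('font', node('p', 'two')),
    node('p', 'three'),
);
prune($tree, sub { $_[0]{_tag} eq 'font' });

my @tags = map { $_->{_tag} } grep { ref($_) } @{ $tree->{_content} };
print "@tags\n";
```

The cost is a second loop over the hits, but the walk itself never sees
a content list that changes underneath it.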