Re: Manipulating The HTML Tree

WWW projekt (wwwproj@dna.lth.se)
Mon, 22 Jul 1996 16:48:10 +0200


Andreas Koenig wrote:
> 
> >>>>> WWW projekt <wwwproj@dna.lth.se> writes:
> >> I would like to have a method that can remove a child and all of
> >> its subchildren from an element's contents.
> 
> I missed that question when you first posted it. I'm doing it this way:
> 
>     my $page = parse_html($text);
>     $page->traverse(sub {
>                         my($ele,$flag,$depth) = @_;
>                         if ($depth > 7) {
>                             $ele->delete;
>                             return;
>                         }
>                         if ($self->{HTML_OK}{$ele->tag}) {
>                             return 1;
>                         } else {
>                             $ele->tag("noop");
>                             1;
>                         }
>                     }, 1);
> 
>     $htmlstr = $page->as_HTML;
>     $page->delete;

This is really two methods, isn't it? One where you delete the element
and one where you turn it into a <NOOP ..> element.

I tried the delete method and it seemed to work without messing things
up for traverse. Thank you, Andreas.

I *had* planned to do this myself, but I discarded the idea because I
thought it would leave an empty element in the contents. And it does,
doesn't it? It doesn't seem to mess anything up, though...


>> Today there is no way of inserting a new element in the tree, i.e.
>> inserting an element into a certain position in the contents of 
                               ^^^^^^^^^^^^^^^^
>> another element.
> 
> perl -MHTML::Parse -e '
> $page = 
>   parse_html("<HEAD><TITLE>forgot the base</HEAD><BODY>Reached end");
> $page->traverse(sub {
>         my($ele,$flag,$depth) = @_;
>         if ($ele->tag eq "head") {
>             $ele->insert_element(HTML::Element->new("base", 
>                                                     HREF=>"rtrtr"));
>             return 0;
>         }
>         return 1;
> },1);
> print $page->as_HTML;
> '
> <HTML><HEAD><TITLE>forgot the base</TITLE><BASE HREF="rtrtr"></HEAD><BODY><P>Reached the end</BODY></HTML>

This inserts a BASE _last_ in the surrounding HEAD, but what happens if
I have:
 
<HEAD><TITLE>forgot something else</TITLE></HEAD>
<BODY><A HREF='calvin'>hobbes</A>
<P>Reached the end</BODY></HTML>


And I want to insert something between the link to calvin and the
following text?

% perl -MHTML::Parse 
$page = parse_html("<HEAD><TITLE>forgot something else
</TITLE></HEAD><BODY> <A HREF='calvin'> hobbes </A> <P>Reached the end
</BODY> </HTML>");
 
$page->traverse(sub {
        my($ele,$flag,$depth) = @_;
        if ($ele->tag eq "a") {
            $ele->insert_element(HTML::Element->new("base", 
                                                    HREF=>"rtrtr"));
            return 0;
        }
        return 1;
},1);
print $page->as_HTML;
^D
 
forgot something else

hobbes

Reached the end


This is not what I had in mind.
(OK, OK, the BASE tag cannot be placed here, I know. It shows what is
wrong, though...)

I have not tried to insert something after a piece of text and before
another tag, but I can imagine that this might cause problems too.
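
To make the request concrete, here is roughly the behaviour I am after,
sketched with a plain Perl array standing in for an element's contents.
The insert_at helper and the hash layout are made up for illustration and
are not part of HTML::Element; I have not checked whether the real
contents can be manipulated this way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for an element's contents: text chunks are plain strings,
# child elements are hash refs. This is NOT the HTML::Element API.
my @content = (
    { tag => 'a', href => 'calvin', text => 'hobbes' },
    'Reached the end',
);

# Hypothetical helper: put $new at index $pos of the contents array,
# shifting the later entries one step to the right.
sub insert_at {
    my ($content, $pos, $new) = @_;
    splice(@$content, $pos, 0, $new);
    return scalar @$content;    # new number of entries
}

# Find the link, then insert right after it, i.e. before the text.
my ($i) = grep { ref($content[$_]) && $content[$_]{tag} eq 'a' } 0 .. $#content;
insert_at(\@content, $i + 1, { tag => 'hr' });

# Print a crude one-line view of the contents.
print join(' ', map { ref($_) ? "<$_->{tag}>" : $_ } @content), "\n";
```

The new element ends up between the link and the following text, with
everything after it moved one step along. That is the positional insert
I am missing.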

So, now that I have defined the problem better, does anyone have a
miracle solution to this?

> When _I_ tried for the first time, things messed up too ;-)

Then I'm not alone... ;-)