Re: Deleting images using libwww-perl

Andreas Koenig (k@anna.in-berlin.de)
Fri, 9 Aug 1996 08:55:21 +0200


I just found an unfinished email to you, meant to answer your recent
questions, that I never got around to completing. Better than nothing,
I suppose, so I append it below.

>>>>> WWW projekt <wwwproj@dna.lth.se> writes:

 www> Due to the unfortunate fact that the contents of an element are stored
 www> in an array, there is no way of actually deleting an element from another
 www> element's contents.

 www> What you can do is to exchange the element with something else.

 www> Andreas Koenig suggested Mon, 22 Jul 1996 in message "Re: Manipulating
 www> The HTML Tree", that it should be done like this:
 www> -----
 www> $page->traverse(sub {
 www>                         my($ele,$flag,$depth) = @_;
 www>                         if ($depth > 7) {
 www>                             $ele->delete;
 www>                             return;
 www>                         }
 www>                         if ($self->{HTML_OK}{$ele->tag}) {
 www>                             return 1;
 www>                         } else {
 www>                             $ele->tag("noop");
 www>                             1;
 www>                         }
 www>                     }, 1);
 www> -----

 www> Note the $ele->tag("noop"); line, he sets the tag to be 'noop' and it
 www> will stay in the tree, but as an unknown tag NOOP.

The NOOP was intentional, not an unfortunate effect. $ele->delete
works just fine for real deletes. The NOOP is there so that I do not
censor what people wrote, but still prevent them from breaking the
guestbook page. $self->{HTML_OK} allows e.g. UL, OL, A, LI, and a few
others. If they come with IMG, they get what they deserve :-) But I
can still see what they tried to do.
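
To make the lookup concrete, here is a minimal sketch of such a table in
plain Perl. It assumes only the four tags named above are allowed (the
real table may hold more), and display_tag is a made-up helper name, not
part of the guestbook code or of HTML::Element:

```perl
use strict;
use warnings;

# Assumption: only the four tags named above are allowed; the real
# $self->{HTML_OK} table may well contain more.
my %HTML_OK = map { $_ => 1 } qw(ul ol a li);

# Hypothetical helper: returns the tag name to use for display.
# Allowed tags pass through unchanged, everything else becomes "noop".
sub display_tag {
    my ($tag) = @_;
    $tag = lc $tag;
    return $HTML_OK{$tag} ? $tag : "noop";
}

print join(" ", map { display_tag($_) } qw(UL img a blink)), "\n";
```

This prints "ul noop a noop": the element stays in the tree either way,
only its tag name changes.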

 www> This was not quite what I wanted, so I exchanged the tag with an empty
 www> string instead. It could not be done the easy way though ($ele = "";) so
 www> I had to add a method exchange to the HTML::Element package.
 www> -----
 www> sub exchange
 www> {
 www>     my($self, $from, $to) = @_;

 www>     # return if nothing to exchange
 www>     return $self unless (defined $self and defined $from);
 www>     $to = "" unless defined $to;

 www>     my $el;
 www>     foreach $el (@{$self->content}) {
 www> 	# if the element is a reference, compare pointers
 www> 	# else compare text
 www> 	if ((ref($el) and $el == $from) or $el eq $from) {
 www> 	    # delete the element if it is a reference
 www>  	    $el->delete if ref($el);
 www> 	    $el = $to;
 www> 	    return 1;
 www> 	}
 www>     }
 www>     return 0;
 www> }
 www> -----

 www> I use it like this:
 www> $ele->parent->exchange($ele, "");
 www> and it seems to work all right.

 www> I still think that a 'remove' method should be implemented that really
 www> removes the element from the tree, but it might be a bit tricky because
 www> of the use of arrays instead of linked lists in the structure.

Sorry, can't follow you. Why does ->delete() not do what you want?
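
As for arrays versus linked lists: an array is no obstacle to taking an
entry out, because splice closes the gap. A minimal sketch, using a plain
array reference as a stand-in for an element's content list; remove_child
is a hypothetical helper for illustration, not an HTML::Element method:

```perl
use strict;
use warnings;

# Hypothetical helper, not an HTML::Element method: drop the first entry
# of @$content that matches $victim. References are compared by identity,
# plain text by string equality.
sub remove_child {
    my ($content, $victim) = @_;
    for my $i (0 .. $#$content) {
        my $el   = $content->[$i];
        my $same = (ref($el) && ref($victim))   ? $el == $victim
                 : (!ref($el) && !ref($victim)) ? $el eq $victim
                 :                                0;
        next unless $same;
        splice @$content, $i, 1;    # later entries shift down by one
        return 1;
    }
    return 0;
}

# A plain array reference standing in for an element's content list.
my $img     = { tag => "img" };
my $content = [ "hello ", $img, " world" ];

remove_child($content, $img);
print scalar(@$content), " entries left\n";
```

Unlike replacing the entry with an empty string, this leaves no
placeholder behind in the list.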

I append my (unfortunately not really complete) answer to your recent
posting.

---snip---

>>>>> WWW projekt <wwwproj@dna.lth.se> writes:

 >> if ($depth > 7) {
 >> $ele->delete;
 >> return;
 >> }
 >> if ($self->{HTML_OK}{$ele->tag}) {
 >> return 1;
 >> } else {
 >> $ele->tag("noop");
 >> 1;
 >> }

  > This is really two methods, isn't it? One where you delete the element
  > and one where you make it into a <NOOP ..> element.

That's intentional. For documentation purposes I don't want to delete
these elements, just not display them.

[...]

 >>> Today there is no way of inserting a new element in the tree, i.e.
 >>> inserting an element into a certain position in the contents of 
  >                                ^^^^^^^^^^^^^^^^
 >>> another element.
 >> 
 >> perl -MHTML::Parse -e '
 >> $page = 
 >> parse_html("<HEAD><TITLE>forgot the base</HEAD><BODY>Reached the end");
 >> $page->traverse(sub {
 >> my($ele,$flag,$depth) = @_;
 >> if ($ele->tag eq "head") {
 >> $ele->insert_element(HTML::Element->new("base", 
 >> HREF=>"rtrtr"));
 >> return 0;
 >> }
 >> return 1;
 >> },1);
 >> print $page->as_HTML;
 >> '
 >> <HTML><HEAD><TITLE>forgot the base</TITLE><BASE HREF="rtrtr"></HEAD><BODY><P>Reached the end</BODY></HTML>

  > This inserts a BASE _last_ in the surrounding HEAD, but what happens if
  > I have:
 
  > <HTML>
  >   <HEAD><TITLE>forgot something else</TITLE></HEAD>
  >   <BODY>
  >     <A HREF="calvin"> hobbes </A>
  >     <P>Reached the end
  >   </BODY>
  > </HTML>

  > And I want to insert something between the link to calvin and the
  > following text?

It's becoming a bit lengthy, but it's feasible. _Where_ exactly did
you have in mind? I'll insert something in three places for you. The
first requires knowing a bit of the source, which is kind of
hackery. The second is fine; the third needs a flag that I set
myself, but I think that's not bad style.

% perl -MHTML::Parse -e '
$page = parse_html(q(
<HTML>
  <HEAD><TITLE>forgot something else</TITLE></HEAD>
  <BODY>
    <A HREF="calvin"> hobbes </A>
    <P>Reached the end
  </BODY>
</HTML>
));
$page->traverse(sub {
        my($ele,$flag,$depth) = @_;
        if ($ele->tag eq "a" && $flag) {
            unshift @{$ele->content}, "|||BEFORE|||";
            $ele->push_content("|||AFTER|||");
            return 1;
        } elsif ($ele->tag eq "a" && !$flag){
            $GlobalFlag = "saw_a_href";
        } elsif ($GlobalFlag eq "saw_a_href"){
            $e = HTML::Element->new("HTML");
            $e->push_content("|||OUTSIDE|||");
            $ele->insert_element($e);
            $GlobalFlag=0;
            return 0;
        }
        return 1;
},1);
print $page->as_HTML;
'
forgot something else

|||BEFORE||| hobbes |||AFTER||| |||OUTSIDE|||

Reached the end

[...]

 >> When _I_ tried for the first time, things messed up too ;-)

  > Then I'm not alone... ;-)

HTH,
andreas