Patch suggestion for HTML::Element::delete (Was: Deleting images using libwww-perl)
Andreas Koenig (k@anna.in-berlin.de)
Mon, 12 Aug 1996 21:10:45 +0200
>>>>> WWW projekt <wwwproj@dna.lth.se> writes:
www> I still think that a 'remove' method should be implemented that
www> really removes the element from the tree, but it might be a bit
www> tricky because of the use of arrays instead of linked lists in
www> the structure.
>>
>> Sorry, can't follow you. Why does ->delete() not do what you want?
www> Because the tag is not deleted from the tree:
www> ~/Stefan/tmp>perl -w -MHTML::TreeBuilder
www> $html = '<I> Italic </I> <B> Bold </B> <HR>';
www> $p = new HTML::TreeBuilder;
www> $p->parse($html);
www> $p->traverse(sub {
www> my ($ele, $flag, $depth) = @_;
www> if ($ele->tag eq 'b') {
www> $ele->delete();
www> }
www> return 1;
www> }, 1);
www> print "---------\n", $p->as_HTML, "--------\n";
www> $p->traverse(sub {
www> my ($ele, $flag, $depth) = @_;
www> return unless $flag;
www> print $ele->tag, " printing parent tag: ";
www> if ($ele->tag ne 'html') {
www> print $ele->parent->tag;
www> }
www> print "\n";
www> return 1;
www> }, 1);
www> ---------
www> <HTML><BODY><P><I> Italic </I> <B></B> <HR></BODY></HTML>
www> --------
www> html printing parent tag:
www> body printing parent tag: html
www> p printing parent tag: body
www> i printing parent tag: p
www> Can't call method "tag" without a package or object reference at - line 21.
www> b printing parent tag: ~/Stefan/tmp>
Great! An excellent test case. I realize that I was wrong about the
exact semantics of delete(). I'd call it a bug and suggest the
following patch. (Hardly tested)
*** /tmp/Element.pm.5.01 Mon Aug 12 13:46:22 1996
--- /tmp/Element.pm Mon Aug 12 13:46:22 1996
***************
*** 400,408 ****
=head2 $h->delete()
! Frees memory associated with the element and all children. This is
! needed because perl's reference counting does not work since we use
! circular references.
=cut
#'
--- 400,409 ----
=head2 $h->delete()
! Frees memory associated with the element and all children and
! eliminates the pointer to itself from a parent element--provided a
! parent exists. This is needed because perl's reference counting does
! not work since we use circular references.
=cut
#'
***************
*** 410,415 ****
--- 411,428 ----
sub delete
{
$_[0]->delete_content;
+ my $pos_within_parent;
+ no overload;
+ foreach (0..$#{$_[0]->{'_parent'}{'_content'}}) {
+ # looking for myself in parent and splicing me out after
+ if (ref($_[0]->{'_parent'}{'_content'}->[$_]) && "$_[0]->{'_parent'}{'_content'}->[$_]" eq "$_[0]"){
+ $pos_within_parent = $_;
+ last;
+ }
+ }
+ if (defined $pos_within_parent) {
+ splice @{$_[0]->{'_parent'}{'_content'}}, $pos_within_parent, 1;
+ }
delete $_[0]->{'_parent'};
delete $_[0]->{'_pos'};
$_[0] = undef;
You like it, Stefan?
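
To show the idea of the patch outside Element.pm, here is a minimal, self-contained sketch. It does not use HTML::Element at all: new_node() and detach_node() are hypothetical helpers working on plain hashes that merely borrow the '_parent' and '_content' keys seen in the patch, plus a '_tag' key assumed for illustration. It compares reference addresses with Scalar::Util::refaddr instead of stringified references, which is an assumption about what the string comparison in the patch is meant to achieve.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-in for an element: a hash with '_tag', '_parent'
# and '_content' keys. Plain strings in '_content' are text.
sub new_node {
    my ($tag, @children) = @_;
    my $node = { _tag => $tag, _content => [] };
    for my $child (@children) {
        $child->{_parent} = $node if ref $child;
        push @{ $node->{_content} }, $child;
    }
    return $node;
}

# Remove $node from its parent's content list, if it has a parent.
# Returns the index it occupied, or undef if nothing was removed.
sub detach_node {
    my ($node) = @_;
    my $parent = delete $node->{_parent} or return;
    my $content = $parent->{_content};
    for my $i (0 .. $#$content) {
        my $item = $content->[$i];
        next unless ref $item;                 # skip text segments
        if (refaddr($item) == refaddr($node)) {
            splice @$content, $i, 1;           # close the gap in the array
            return $i;
        }
    }
    return;
}

# Roughly the tree from the test case: <P><I> Italic </I> <B> Bold </B> <HR>
my $bold = new_node('b', ' Bold ');
my $para = new_node('p', new_node('i', ' Italic '), ' ', $bold, ' ', new_node('hr'));

my $index = detach_node($bold);
my @tags  = map { ref $_ ? $_->{_tag} : 'text' } @{ $para->{_content} };
print "removed index $index; remaining: @tags\n";
# prints "removed index 2; remaining: i text text hr"
```

The point is only that the element has to be spliced out of the parent's array as well as losing its own '_parent' link; otherwise the parent keeps a reference to it.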
www> Look at the line printed by ->as_HTML: The tag is still there, but its
www> content is gone.
www> Then have a look at the error: it's printed when trying to print the
www> parent of the 'b' tag, which I really don't want to be a part of the
www> tree anymore.
www> Maybe I have misunderstood you, but was this not the way you wanted to
www> delete elements? Have I done anything wrong?
I don't think so.
[...]
www> The third style was really what I wanted. I'm still not convinced that
www> it is a general method to solve the problem, though, so I challenge you
www> to solve this problem:
www> Insert
www> <B> <P> Bold paragraph <B>
www> between the hr and br tags in this tree:
www> <BODY>
www> <HR>
www> <BR>
www> </BODY>
I think you're right and it can't be done. Maybe some splice method
should be invented.
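
What such a splice method might do can at least be sketched on plain data. The following is not a patch for HTML::Element and uses none of its methods: make_node() and insert_after() are hypothetical helpers on plain hashes with '_tag', '_parent' and '_content' keys, assumed here only for illustration. Because it works on the finished tree rather than going through the parser, it does not depend on which tags are inserted or which tags surround them; it also makes no attempt to check that the result is valid HTML.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-in for an element, as a plain hash.
sub make_node {
    my ($tag, @children) = @_;
    my $node = { _tag => $tag, _content => [] };
    for my $child (@children) {
        $child->{_parent} = $node if ref $child;
        push @{ $node->{_content} }, $child;
    }
    return $node;
}

# Insert @new directly after $sibling in the content list of its parent.
sub insert_after {
    my ($sibling, @new) = @_;
    my $parent  = $sibling->{_parent} or die "sibling has no parent\n";
    my $content = $parent->{_content};

    # Find the sibling's position by reference address.
    my ($i) = grep {
        ref $content->[$_] && refaddr($content->[$_]) == refaddr($sibling)
    } 0 .. $#$content;
    die "sibling not found in its parent\n" unless defined $i;

    # Re-parent the new elements, then splice them in without removing anything.
    for my $item (@new) {
        $item->{_parent} = $parent if ref $item;
    }
    splice @$content, $i + 1, 0, @new;
    return $parent;
}

# The tree from the challenge: <BODY> <HR> <BR> </BODY>
my $hr   = make_node('hr');
my $br   = make_node('br');
my $body = make_node('body', $hr, $br);

# The fragment to insert: a 'b' element wrapping a 'p' with some text.
my $bold = make_node('b', make_node('p', ' Bold paragraph '));
insert_after($hr, $bold);

my @body_tags = map { ref $_ ? $_->{_tag} : 'text' } @{ $body->{_content} };
print "@body_tags\n";    # prints "hr b br"
```

Both halves matter: the new element gets its '_parent' set, and the parent's array gets the element at the right index.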
www> Can you do this in a way that excludes the risk of errors and that does
www> not depend on the type of tags that are inserted or surround the
www> inserted tag?
You got me ;-) I'd love to donate a splice method, but my time's too
limited, sorry. Thanks for the challenge, btw!
andreas