Re: Hacking HTML::TreeBuilder and HTML::Element

Sean M. Burke (sburke@spinn.net)
Thu, 01 Feb 2001 18:14:26 -0700


At 09:37 PM 2001-02-01 +1000, Jason Henry Parker wrote:
>[...]
>In short, I don't think I can do everything I want to by simply
>subclassing or trivially altering HTML::TreeBuilder, I can't subclass
>HTML::Element without at least trivially altering HTML::TreeBuilder,
>and I don't want to have to rewrite the excellent HTML::TreeBuilder
>module's support for parsing not-so-tidy HTML.

Or you could just ask the TreeBuilder author.  CPAN authors have been known
to answer email now and then.


Actually, I've been meaning to solve exactly this problem by providing
either or both of:

1) a method for HTML::Element that reblesses an element (and presumably all
its descendants?) into an arbitrary class.

2) a method for HTML::TreeBuilder that (presumably once the parse is
complete) takes all the TreeBuilder-specific things out of the object and
then reblesses it into HTML::Element (or whatever the element class is).  I
briefly entertained the idea of making that something that calling
$tree->eof would do automatically, but that might just cause confusion all
around.  But having it be a method callable on demand is certainly a decent
idea.

I think it would look like simply this:

#In TreeBuilder.pm
sub elementify {
  # Rebless the current object into the normal element class.
  my $self = $_[0];
  my $to_class = ($self->{'_element_class'} || 'HTML::Element');
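  # Delete every underscore-prefixed (internal) key except the three
  # structural ones that a plain element still needs.
  delete @{$self}{ grep {;
    length $_ and substr($_,0,1) eq '_'
    and $_ ne '_tag' and $_ ne '_parent'  and $_ ne '_content'
  } keys %$self };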
  bless $self, $to_class;
}
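
To see what that key filter does, here is a minimal self-contained sketch. It
assumes a stand-in class (Mock::TreeBuilder, a made-up name) so it runs
without HTML::TreeBuilder installed, and the '_parser_state' key is invented
purely for illustration. Only '_tag', '_parent' and '_content' come from the
code above.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in for HTML::TreeBuilder; not the real module.
    package Mock::TreeBuilder;

    sub elementify {
        my $self = $_[0];
        # Capture the target class first, since '_element_class' is
        # itself one of the keys that gets deleted below.
        my $to_class = ($self->{'_element_class'} || 'HTML::Element');
        delete @{$self}{ grep {;
            length $_ and substr($_, 0, 1) eq '_'
            and $_ ne '_tag' and $_ ne '_parent' and $_ ne '_content'
        } keys %$self };
        return bless $self, $to_class;
    }
}

my $tree = bless {
    '_tag'          => 'html',
    '_parent'       => undef,
    '_content'      => [],
    '_parser_state' => 1,      # invented builder-only bookkeeping key
    'lang'          => 'en',   # ordinary attribute, no leading underscore
}, 'Mock::TreeBuilder';

$tree->elementify;

print ref($tree), "\n";                     # HTML::Element
print join(',', sort keys %$tree), "\n";    # _content,_parent,_tag,lang
```

Ordinary attributes survive because they do not start with an underscore; the
builder-only key is gone, and the object now reports the element class.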


As to something for the first approach, I'm almost tempted to say that if
you want to rebless HTML::Element objects into a class of your choosing (as
opposed to the nice thing, which is copying from one class into your own,
as HTML::DOMbo does), then you're already breaking encapsulation on
HTML::Element and the destination class.  So while you're being a wild man,
just bear down and make that a method for your destination class:

  $node->rebless( what_class )
and/or
  $node->rebless_down( what_class ) # recurses

or whatever.  But maybe I'll just be nice and put such a method into class
HTML::Element anyway; not much point in people writing their own for each
Element subclass.
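
A rough sketch of what such a pair might look like, under some loud
assumptions: the destination class name My::Element is hypothetical, children
are assumed to live in the '_content' array reference with plain strings for
text, and the two subs are written to be called as plain functions, since a
node that has not been reblessed yet cannot find methods in the destination
class. The nodes here are bare blessed hashes, so nothing needs to be
installed to run it.

```perl
use strict;
use warnings;

{
    # Hypothetical destination class; the name is made up for this sketch.
    package My::Element;

    # Rebless one node only.
    sub rebless {
        my ($node, $class) = @_;
        return bless $node, $class;
    }

    # Rebless a node and all of its descendants. Uses a work queue
    # rather than recursion, so deep trees are not a problem.
    sub rebless_down {
        my ($node, $class) = @_;
        my @queue = ($node);
        while (@queue) {
            my $this = shift @queue;
            bless $this, $class;
            # Text children are plain strings; only references are nodes.
            push @queue, grep { ref $_ } @{ $this->{'_content'} || [] };
        }
        return $node;
    }
}

# Bare hashes blessed into HTML::Element by name; the module is not loaded.
my $leaf = bless { '_tag' => 'b', '_content' => ['bold'] }, 'HTML::Element';
my $root = bless { '_tag' => 'p', '_content' => ['some ', $leaf, ' text'] },
    'HTML::Element';

My::Element::rebless_down($root, 'My::Element');

print ref($root), " ", ref($leaf), "\n";    # My::Element My::Element
```

Once a node has been reblessed, the same subs are reachable with method
syntax, as in $node->rebless_down('Some::Other::Class').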

--
Sean M. Burke  sburke@cpan.org  http://www.spinn.net/~sburke/