Re: Changes to HTML-Parser callbacks interface

Michael A. Chase (mchase@ix.netcom.com)
Thu, 25 Nov 1999 13:44:07 -0800


----- Original Message -----
From: Gisle Aas <gisle@aas.no>
To: Michael A. Chase <mchase@ix.netcom.com>
Cc: libwww <libwww@perl.org>
Sent: 11/25/99 11:47
Subject: Re: Changes to HTML-Parser callbacks interface


> "Michael A. Chase" <mchase@ix.netcom.com> writes:

> > > Example:
> > >
> > >   $p->callback(start => "self,tagname,line", sub { ... });
> >
>
>  HTML::Parser->new(start_cb => ["self,tagname,line", sub { ... }]);
>
> > How about
> >     $p->callback(start => [qw(self tagname line), sub { ... }]);
>
> Something like this will work.  I still like the stringified attrspec
> better.  Wrapping it up in an array also makes it easier to explain the
> $p->callback return value.
>
> > Any keywords found would set options or enable parameters to the
> > callback, a coderef would be saved as the callback, and an arrayref
> > would be saved as the accumulator array.  Keywords or references could
> > appear in any order, but any values passed to the coderef or
> > accumulator array would always be in the same order when they are
> > present.
>
> Hmm.  Perhaps we need some more syntax to make it clear what is what.
> One possibility is to introduce a ":" and let everything before it be
> arguments and everything after be callback specific options:
>
>    "tagname,attr: keep_case"

I still prefer having the arguments each be a separate parameter.  Perl has
a perfectly good argument parser, so we might as well take advantage of it.
It would also make literals easier to pass, such as

   $p -> callback( start => [ qw( literal S tagname tokens origtext ),
                              \&handler ] );

> > Just some more ideas:
> >    v2_compat: qw( self tagname attr_arrayref attr_hashref origtext ),
>
> My idea was that the stuff in &HTML::Parser::new that set up
> compatibility callbacks should just ask for exactly those arguments
> that used to be passed to the old method callbacks.  Is this different
> or did you suggest that something like "v2_compat" was recognized
> directly?

I was thinking of older subclassed parsers, but I guess they'd be covered by
the default callbacks.

> We could probably also special case if we see "self" as the first
> argument and make a direct method call from XS?  The code-ref argument
> should then be just a plain string if you want method resolution to
> take place.

Perhaps this:

   $p -> callback( start => [ qw( method methname self tagname tokens
                                  origtext ) ] );

Maybe 'method' should imply 'self'.
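
For illustration only, here is a sketch of what "a plain string so that
method resolution takes place" could look like from the Perl side.  The
package names MyParser and MySub and the variable $methname are made up
for this example; nothing here is HTML::Parser code or the XS dispatch
itself.

```perl
use strict;
use warnings;

# Hypothetical base class with a "start" method.
package MyParser;
sub new   { return bless {}, shift }
sub start { my ( $self, $tagname ) = @_; return "base start <$tagname>" }

# Hypothetical subclass that overrides "start".
package MySub;
our @ISA = ('MyParser');
sub start { my ( $self, $tagname ) = @_; return "sub start <$tagname>" }

package main;

# Calling through a method *name* goes through normal method resolution,
# so the subclass override is found.  A stored coderef would not do that.
my $methname = 'start';
my $base     = MyParser->new;
my $sub      = MySub->new;

my $from_base = $base->$methname('a');
my $from_sub  = $sub->$methname('a');

print "$from_base\n";    # base start <a>
print "$from_sub\n";     # sub start <a>
```

With a stored coderef such as \&MyParser::start, both objects would run
the base version, which is the reason for keeping the method name as a
string.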

> >    keep_tag: don't force tag names to lowercase
> >    keep_attr: don't force attribute names to lowercase
> >    keep: qw( keep_tag keep_attr )
>
> Perhaps?  Can you think of any reason anybody would want one and not
> the other?

I'm not sure either; I was just throwing out possibilities.  Perhaps just
'lc' and 'no_lc'.

> > If no keywords are given, it should be equivalent to qw( tagname
> > tokens_arrayref origtext ) or whatever is finally agreed on.
>
> I originally wanted to make empty callbacks the default.  Otherwise we
> would have to have different rules for different stuff I think,
> i.e. more confusing documentation.

I'm not sure what you mean by an empty callback.  Maybe something like one
of these?

   $p->callback( 'text' );
   $p->callback( text => []);
   $p->callback( text => [ "", sub {} ] ); # Your syntax
   $p->callback( text => [ (), sub {} ] ); # My syntax

> >    $p->callback( declaration => [ qw(tagname tokens_arrayref),
> >                                   \@accum ] );
> >
> > This would allow some elements to be handled by callbacks and others by
> > the array.
>
> This is the way I think we will go.  It should probably also be
> possible to put literals into argspec so we could easily add those
> "S", "E", "C" strings that we used to add to accum before.
>
>   $p->callback( start => qw("S",tagname,attr_hash,origtext), \@accum);

This won't work as written with qw(): it splits on whitespace only, so the
commas and quotes end up inside a single element.  I'd have phrased it as:

   $p->callback( start => [ qw( literal S tagname attr_hash origtext),
                            \@accum ] );
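
A small standalone check of how qw() treats the two spellings.  The
variable names are made up for the example, and the "no warnings" line is
only there to silence Perl's hint about commas inside qw().

```perl
use strict;
use warnings;

# qw() splits on whitespace only; commas and quotes stay inside the word.
my @comma_style = do {
    no warnings 'qw';
    qw("S",tagname,attr_hash,origtext);
};

# Whitespace-separated words give one element per keyword.
my @space_style = qw( literal S tagname attr_hash origtext );

print scalar(@comma_style), ": $comma_style[0]\n";
print scalar(@space_style), ": @space_style\n";
```

The first line printed shows a count of 1 with the whole
comma-separated text as one element; the second shows 5 separate words.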

> This is probably also handy if you want to use the same callback
> procedure to handle multiple types of markup:
>
>   $p->callback( start => qw("start",tagname), \&handler);
>   $p->callback( end   => qw("end",tagname),   \&handler);

   $p->callback( start => [ qw( literal start tagname ), \&handler ] );
   $p->callback( end   => [ qw( literal end tagname ),   \&handler ] );
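
To show why a leading literal helps a shared handler, here is a sketch
with a stand-in dispatcher.  The %cb table and fire() are hypothetical
and only mimic what the parser would do; they are not part of
HTML::Parser.

```perl
use strict;
use warnings;

my @seen;

# One handler for both events; the leading literal says which one fired.
sub handler {
    my ( $kind, $tagname ) = @_;
    push @seen, "$kind:$tagname";
    return;
}

# Hypothetical stand-in for the parser: per event, a literal and a coderef.
my %cb = (
    start => [ 'start', \&handler ],
    end   => [ 'end',   \&handler ],
);

# Pretend the parser has just recognised an event for $tagname.
sub fire {
    my ( $event, $tagname ) = @_;
    my ( $literal, $code ) = @{ $cb{$event} };
    $code->( $literal, $tagname );
    return;
}

fire( start => 'b' );
fire( end   => 'b' );
print "@seen\n";    # start:b end:b
```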

Summary

I think we are agreed on making the handler argument an array reference like

   $p->callback( start => [...] ); or
   my $p = HTML::Parser -> new( start_cb => [...], ...);

but are still discussing the content.  I'd prefer each option be a separate
element inside the array while you are currently leaning toward having all
the options inside a single string argument.  I'm concerned about having to
write a parser for the options rather than just a foreach loop.
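
As a rough illustration of the "just a loop" approach, the sketch below
walks a list-style spec once and sorts its elements by type.  The helper
name parse_cb_spec, the returned hash layout, and the treatment of
"literal" as consuming the next element are all assumptions made for this
example, not an agreed design.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list-style callback spec into its parts.
# Plain strings are argument keywords, a coderef is the handler, and an
# arrayref is the accumulator.  "literal" takes the next element as a value.
sub parse_cb_spec {
    my @spec = @{ shift() };
    my ( @args, $handler, $accum );
    while (@spec) {
        my $item = shift @spec;
        if    ( ref $item eq 'CODE' )  { $handler = $item }
        elsif ( ref $item eq 'ARRAY' ) { $accum   = $item }
        elsif ( ref $item )            { die "unexpected reference in spec\n" }
        elsif ( $item eq 'literal' ) {
            die "literal needs a value\n" unless @spec;
            push @args, [ literal => shift @spec ];
        }
        else { push @args, [ arg => $item ] }
    }
    return { args => \@args, handler => $handler, accum => $accum };
}

my @accum;
my $spec = parse_cb_spec(
    [ qw( literal S tagname attr_hash origtext ), \@accum ] );

# Show what was recognised, in the order given.
print join( ",", map { "$_->[0]=$_->[1]" } @{ $spec->{args} } ), "\n";
```

No tokenizing of a string is needed here; ref() tells the handler and
the accumulator apart from the keywords.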

I've put off more re-writing of the POD for now.  I'll probably send you
what I've got soon.