Re: Changes to HTML-Parser callbacks interface
Michael A. Chase (mchase@ix.netcom.com)
Thu, 25 Nov 1999 13:44:07 -0800
----- Original Message -----
From: Gisle Aas <gisle@aas.no>
To: Michael A. Chase <mchase@ix.netcom.com>
Cc: libwww <libwww@perl.org>
Sent: 11/25/99 11:47
Subject: Re: Changes to HTML-Parser callbacks interface
> "Michael A. Chase" <mchase@ix.netcom.com> writes:
> > > Example:
> > >
> > > $p->callback(start => "self,tagname,line", sub { ... });
> >
>
> HTML::Parser->new(start_cb => ["self,tagname,line", sub { ... }]);
>
> > How about
> > $p->callback(start => [qw(self tagname line), sub { ... }]);
>
> Something like this will work. I still like the stringified attrspec
> better. Wrapping it up in an array also makes it easier to explain the
> $p->callback return value.
>
> > Any keywords found would set options or enable parameters to the callback, a
> > coderef would be saved as the callback, and an arrayref would be saved as
> > the accumulator array. Keywords or references could appear in any order,
> > but any values passed to the coderef or accumulator array would always be
> > in the same order when they are present.
>
> Hmm. Perhaps we need some more syntax to make it clear what is what.
> One possibility is to introduce a ":" and let everything before it be
> arguments and everything after be callback specific options:
>
> "tagname,attr: keep_case"
I still prefer having the arguments each be a separate parameter. Perl has
a perfectly good argument parser, so we might as well take advantage of it.
It would also make literals easier, for example:
$p -> callback( start => [ qw( literal S tagname tokens origtext ), \&handler ] );
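To make the "separate elements" idea concrete, here is a minimal sketch of how such an array spec could be taken apart with a plain loop. The function name parse_spec and the returned hash layout are hypothetical, not HTML::Parser API. It uses while/shift rather than foreach only because 'literal' consumes the element after it.

```perl
use strict;
use warnings;

# Hypothetical helper (not HTML::Parser API): split an array-form spec like
#   [ qw( literal S tagname tokens origtext ), \&handler ]
# into an ordered argument list plus a callback or an accumulator.
sub parse_spec {
    my ($spec) = @_;
    my %parsed = (args => []);
    my @items  = @$spec;
    while (@items) {
        my $item = shift @items;
        if (ref $item eq 'CODE') {          # code ref: the callback
            $parsed{callback} = $item;
        }
        elsif (ref $item eq 'ARRAY') {      # array ref: the accumulator
            $parsed{accum} = $item;
        }
        elsif ($item eq 'literal') {        # next element is passed verbatim
            die "literal needs a value\n" unless @items;
            push @{ $parsed{args} }, [ literal => shift @items ];
        }
        else {                              # any other word names an argument
            push @{ $parsed{args} }, [ arg => $item ];
        }
    }
    return \%parsed;
}
```

Because keywords and references are told apart by ref(), they can appear in any order, while the argument order is still the order in which the words were listed.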
> > Just some more ideas:
> > v2_compat: qw( self tagname attr_arrayref attr_hashref origtext ),
>
> My idea was that the stuff in &HTML::Parser::new that set up
> compatibility callbacks should just ask for exactly those arguments
> that used to be passed to the old method callbacks. Is this different
> or did you suggest that something like "v2_compat" was recognized
> directly?
I was thinking of older subclassed parsers, but I guess they'd be covered by
the default callbacks.
> We could probably also special case if we see "self" as the first
> argument and make a direct method call from XS? The code-ref argument
> should then be just a plain string if you want method resolution to
> take place.
Perhaps this:
$p -> callback( start => [ qw( method methname self tagname tokens origtext ) ] );
Maybe 'method' should imply 'self'.
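A minimal pure-Perl sketch of what a 'method' keyword could mean, assuming the spec is [ 'method', NAME, ARGS... ] and that 'self' is implied. The package My::Parser, its start_tag method, and the dispatch function are all hypothetical. It uses Perl's ordinary $obj->$name(...) dynamic method call, so normal method resolution (including subclass overrides) applies; a real version would presumably do the equivalent from XS.

```perl
use strict;
use warnings;

package My::Parser;    # hypothetical subclass, for illustration only

sub new {
    my $class = shift;
    return bless { seen => [] }, $class;
}

# An old-style method callback.
sub start_tag {
    my ($self, $tagname, $origtext) = @_;
    push @{ $self->{seen} }, "$tagname:$origtext";
}

package main;

# Hypothetical dispatcher: NAME is resolved as a method on the parser.
# 'self' is implied (it becomes the invocant), so an explicit 'self'
# among the argument names is simply skipped.
sub dispatch {
    my ($parser, $spec, %values) = @_;
    my ($kw, $methname, @argnames) = @$spec;
    die "this sketch only handles 'method' specs\n" unless $kw eq 'method';
    @argnames = grep { $_ ne 'self' } @argnames;
    return $parser->$methname(@values{@argnames});
}
```

With this reading, [ qw( method start_tag self tagname origtext ) ] and [ qw( method start_tag tagname origtext ) ] behave the same.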
> > keep_tag: don't force tag names to lowercase
> > keep_attr: don't force attribute names to lowercase
> > keep: qw( keep_tag keep_attr )
>
> Perhaps? Can you think of any reason anybody would want one and not
> the other?
I'm not sure either; I was just throwing out possibilities. Perhaps just
'lc' and 'no_lc'.
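If a single switch is enough, it could look something like this sketch, which assumes tag names and attribute names are always lowercased together unless a hypothetical no_lc option is set. The function normalize_names is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical sketch of one 'lc' / 'no_lc' switch: tag names and
# attribute names are lowercased together unless no_lc is set.
sub normalize_names {
    my ($tagname, $attrs, %opt) = @_;
    return ($tagname, $attrs) if $opt{no_lc};
    my %lc_attrs = map { lc($_) => $attrs->{$_} } keys %$attrs;
    return (lc($tagname), \%lc_attrs);
}
```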
> > If no keywords are given, it should be equivalent to qw( tagname
> > tokens_arrayref origtext ) or whatever is finally agreed on.
>
> I originally wanted to make empty callbacks the default. Otherwise we
> would have to have different rules for different stuff I think,
> i.e. more confusing documentation.
I'm not sure what you mean by an empty callback. Maybe something like one
of these?
$p->callback( 'text' );
$p->callback( text => []);
$p->callback( text => [ "", sub {} ] ); # Your syntax
$p->callback( text => [ (), sub {} ] ); # My syntax
> > $p->callback( declaration => [ qw(tagname tokens_arrayref), \@accum ] );
> >
> > This would allow some elements to be handled by callbacks and others by the
> > array.
>
> This is the way I think we will go. It should probably also be
> possible to put literals into argspec so we could easily add those
> "S", "E", "C" strings that we used to add to accum before.
>
> $p->callback( start => qw("S",tagname,attr_hash,origtext), \@accum);
This won't work as intended with qw(): it splits only on whitespace, so the
quotes and commas would end up inside a single word. I'd have phrased it as:
$p->callback( start => [ qw( literal S tagname attr_hash origtext ), \@accum ] );
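Here is a minimal sketch of how one spec could drive either kind of target, assuming a hypothetical fire() that receives the parsed event as name/value pairs. A code ref is called with the values; an array ref accumulator gets one array ref per event, so literals like "S" end up at the front of each entry, much as the old accum did.

```perl
use strict;
use warnings;

# Hypothetical sketch: evaluate an array-form spec against one parsed
# event and deliver the values to a code ref (called) or to an array
# ref (an accumulator that receives one array ref per event).
sub fire {
    my ($spec, %event) = @_;
    my @items = @$spec;
    my ($target, @values);
    while (@items) {
        my $item = shift @items;
        if (ref $item) {                # CODE or ARRAY reference
            $target = $item;
        }
        elsif ($item eq 'literal') {    # next element is passed verbatim
            push @values, shift @items;
        }
        else {                          # named value taken from the event
            push @values, $event{$item};
        }
    }
    if (ref $target eq 'CODE')  { return $target->(@values) }
    if (ref $target eq 'ARRAY') { push @$target, [@values]; return }
    die "spec needs a CODE or ARRAY reference\n";
}
```

For example, fire([ qw( literal S tagname origtext ), \@accum ], tagname => 'p', origtext => '<p>') pushes [ 'S', 'p', '<p>' ] onto @accum, while a spec ending in a code ref would call it with those same values instead.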
> This is probably also handy if you want to use the same callback
> procedure to handle multiple types of markup:
>
> $p->callback( start => qw("start",tagname), \&handler);
> $p->callback( end => qw("end",tagname), \&handler);
$p->callback( start => [ qw( literal start tagname ), \&handler ] );
$p->callback( end => [ qw( literal end tagname ), \&handler ] );
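The shared handler then only has to branch on its first argument. This sketch assumes the parser passes the literal ("start" or "end") first, as in the two specs above; the calls at the bottom simulate what a parser would do for "<p></p>".

```perl
use strict;
use warnings;

my @log;

# One handler for both start and end tags; the first argument is the
# literal from the spec, the second is the tag name.
sub handler {
    my ($kind, $tagname) = @_;
    push @log, $kind eq 'start' ? "<$tagname>" : "</$tagname>";
}

# Simulate the calls a parser would make for "<p></p>".
handler('start', 'p');
handler('end',   'p');
print join('', @log), "\n";    # prints "<p></p>"
```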
Summary

I think we agree on making the handler argument an array reference, as in
$p->callback( start => [...] ); or
my $p = HTML::Parser -> new( start_cb => [...], ...);
but we are still discussing its contents. I'd prefer each option to be a
separate element inside the array, while you are currently leaning toward
putting all the options inside a single string argument. I'm concerned about
having to write a parser for the options rather than just a foreach loop.
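For comparison, a minimal sketch of the single-string alternative, assuming the "tagname,attr: keep_case" form where names before ':' are arguments and comma-separated names after it are options. parse_string_spec and _trim_list are hypothetical. Even this small version has to deal with separators and whitespace, and it does not yet handle quoted literals such as "S", which would need more parsing still.

```perl
use strict;
use warnings;

# Split a comma-separated list of names, trimming surrounding
# whitespace and dropping empty entries.
sub _trim_list {
    my ($str) = @_;
    return () unless defined $str;
    my @out;
    for my $name (split /,/, $str) {
        $name =~ s/^\s+//;
        $name =~ s/\s+$//;
        push @out, $name if length $name;
    }
    return @out;
}

# Hypothetical parser for a string spec such as "tagname,attr: keep_case".
sub parse_string_spec {
    my ($spec) = @_;
    my ($arg_part, $opt_part) = split /:/, $spec, 2;
    return {
        args    => [ _trim_list($arg_part) ],
        options => [ _trim_list($opt_part) ],
    };
}
```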
I've put off further rewriting of the POD for now. I'll probably send you
what I've got soon.