Re: HTML::Parser line numbers

Brian Slesinsky (bslesins@best.com)
Fri, 23 Jan 1998 11:19:26 -0800 (PST)


Yeah, I was afraid of that.  I can't try it at the moment, but perhaps
it would help to replace calls to count with:

  $self->count(...) if($counting);

Anyway, my application is a special case so I can live without a merge.

Hmm, if performance matters that much, perhaps it would be good to rewrite
HTML::Parser in C?

----------------------------------------------------------------------
Brian Slesinsky


On 23 Jan 1998, Gisle Aas wrote:

> Using Randal's idea we could make a subclass like the one below.  You
> could extent it to count characters and offsets by calling
> SUPER::parse for each character.  If would not be very efficient
> though.  Perhaps splitting on /([<>\n])/ would do?
> 
> I tested your $parser->count callback patch, and noticed that it
> slowed down the generic parser about 6% for parsing of some random
> HTML code I had laying around.  I don't want this if I can avoid it.
> 
> Regards,
> Gisle
> 
> 
> -----------------------------------------------------------
> package HTML::LineParser;
> 
> require HTML::Parser;
> @ISA=qw(HTML::Parser);
> 
> sub new
> {
>     my $class = shift;
>     my $self = $class->SUPER::new(@_);
>     $self->lineno(1);
>     $self;
> }
> 
> sub parse
> {
>     my $self = shift;
>     return $self->SUPER::parse($_[0]) unless defined $_[0];
> 
>     my @lines = split(/(\n)/, $_[0]);
>     for (@lines) {
> 	$self->SUPER::parse($_);
> 	$self->{_lineno}++ if $_ eq "\n";
>     }
>     $self;
> }
> 
> sub lineno
> {
>     my $self = shift;
>     my $old = $self->{_lineno};
>     $self->{_lineno} = shift if @_;
>     $old;
> }
> 
> 1;
>