Re: HTML::Parser line numbers

Brian Slesinsky (bslesins@best.com)
Thu, 22 Jan 1998 22:08:30 -0800 (PST)


> On Thu, 22 Jan 1998, Brian Slesinsky wrote:
> >> Hi, I created a subclass of HTML::Parser that keeps track of which
> >> character and line it's at in the input file.

On 22 Jan 1998, Randal Schwartz wrote:
> I did this already by subclassing.  See column #14 in
> http://www.stonehenge.com/merlyn/WebTechniques/.  No need to hack
> the original parse.

Neat, I didn't think of doing it that way.  However, my version can give
you some additional position information:

 - the character offset of the HTML tag from the beginning of the file
   (so I can build an index and use seek() to get to the '<' character of
   a tag). 

 - both the line and column number of the tag (e.g. you could report that
   an <a> tag starts at line 20 column 3).

For my project I really only need the character offset from the beginning
of the file (I threw in the line counting because it was easy).  I don't
see any good way of getting that offset without hacking parse().
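
To make the bookkeeping concrete, here is a minimal sketch of the idea.
It is written in Python rather than Perl and is not the HTML::Parser
change itself: PositionTracker and tag_positions are hypothetical names,
and the scan for '<' is a naive stand-in for real tag recognition.  It
also assumes one byte per character, so that a character offset can be
handed straight to seek().

```python
import io


class PositionTracker:
    """Tracks offset, line and column across chunks (hypothetical helper)."""

    def __init__(self):
        self.offset = 0  # characters consumed so far (0-based)
        self.line = 1    # 1-based line number
        self.column = 1  # 1-based column of the next character

    def advance(self, text):
        """Consume text and update all three counters."""
        newlines = text.count("\n")
        if newlines:
            self.line += newlines
            # Column restarts after the last newline in this piece.
            self.column = len(text) - text.rfind("\n")
        else:
            self.column += len(text)
        self.offset += len(text)


def tag_positions(chunks):
    """Return (offset, line, column) for every '<' in the chunked input.

    Naive: any '<' counts as a tag start (comments and quoted attribute
    values are not handled).
    """
    tracker = PositionTracker()
    positions = []
    for chunk in chunks:
        start = 0
        while True:
            lt = chunk.find("<", start)
            if lt == -1:
                break
            tracker.advance(chunk[start:lt])
            positions.append((tracker.offset, tracker.line, tracker.column))
            tracker.advance("<")
            start = lt + 1
        tracker.advance(chunk[start:])
    return positions


html = "<html>\n  <a href='x'>link</a>\n</html>\n"
index = tag_positions([html])
print(index[1])  # the <a> tag: (9, 2, 3)

# With the offset in hand, seek() lands directly on the '<'.
handle = io.StringIO(html)
handle.seek(index[1][0])
print(handle.read(2))  # <a
```

Because the counters live outside the per-chunk loop, the result is the
same however the input is split into chunks, which is the property the
offset needs if the file is fed to the parser a piece at a time.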

----------------------------------------------------------------------
Brian Slesinsky