Weirdness in HTML::TreeBuilder ...

Charlie Stross (charles@fma.com)
Thu, 19 Sep 1996 13:23:38 +0100 (BST)


I suspect the HTML parser in LWP 5.00 (and 5.01) is getting a little
confused when it runs across a <SCRIPT> tag in the head of a document.

Reason: I'm hacking on a homebrew module that recursively descends a
web, building a parse tree of each document as it goes (via 
HTML::TreeBuilder), and spitting out tokens. 
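A minimal sketch of the token-spitting step, for illustration only. It assumes a hypothetical nested-hash tree: the `walk` function, the `tag`/`attrs`/`children` keys and the sample HEAD are all invented here, and this is not HTML::TreeBuilder's element API. It uses a plain Perl array as the buffer, filled with push() and drained with shift(), the same arrangement the post describes. Because push() appends to the tail and shift() removes from the head, the buffer is first-in, first-out, so tokens and attribute pairs come out in the order they went in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL tree format (not HTML::TreeBuilder's element objects):
# a node is either a plain string (text) or a hashref with
#   tag      => element name
#   attrs    => arrayref of name/value pairs, kept in source order
#   children => arrayref of child nodes
sub walk {
    my ($node, $queue) = @_;
    unless (ref $node) {                       # text node
        push @$queue, "TEXT:$node";
        return;
    }
    my @attr = @{ $node->{attrs} || [] };
    my @pairs;
    for (my $i = 0; $i < @attr; $i += 2) {     # walk name/value pairs in order
        push @pairs, qq{$attr[$i]="$attr[$i + 1]"};
    }
    push @$queue, join(' ', "START:$node->{tag}", @pairs);
    walk($_, $queue) for @{ $node->{children} || [] };   # depth-first, in order
    push @$queue, "END:$node->{tag}";
}

# A made-up HEAD section, used only for illustration.
my $head = {
    tag      => 'head',
    children => [
        { tag => 'title', children => ['some title'] },
        { tag => 'meta',  attrs => [ name => 'keywords', content => 'perl' ] },
    ],
};

my @queue;
walk($head, \@queue);                  # push() tokens onto the tail ...

my @out;
push @out, shift @queue while @queue;  # ... shift() them off the head: FIFO

print "$_\n" for @out;
```

Attributes are deliberately kept in an ordered list here. If a tree stores them in a structure with no defined order, such as a Perl hash, the order can already be lost before the queue ever sees the tokens.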

If I feed it a document like this ...

some title

When I do the traversal, what comes out is this:

some title

Note that (a) the attributes of the META tag are in reverse order, and (b) the SCRIPT tag has somehow been shunted down out of the HEAD section! I can live with (a), but (b) is somewhat disturbing. Am I missing something obvious, or is this a parser 'feature'?

(NB: I initially suspected that the reversed META attributes arose from the way I'd built a queue, used as a buffer, in my module. As far as I can tell, though, the queue is innocent: push() at one end, shift() at the other. I'm currently scratching my head ...)

Charlie Stross
fma Ltd
http://www.fma.com/