constantifying $HTML::TreeBuilder::Debug?

Sean M. Burke (sburke@spinn.net)
Sat, 16 Sep 2000 14:04:21 -0600


If the existence of the (undocumented) $HTML::TreeBuilder::Debug variable
is news to you, ignore this message.  If you know of it and use it, read on.

Those of you who have read the source of HTML::TreeBuilder since my rewrite
of it last December, and who survived the ordeal, will recall many bits of
code like:

        print $indent,
          " * ambilocal element \U$tag\E makes an implicit HEAD.\n"
         if $Debug > 1;
or
        print $indent, " * head element \U$tag\E found inside BODY!\n"
         if $Debug;
or
        print $indent,
          " * Text node under \U$ptag\E closes, implicates BODY.\n"
         if $Debug > 1;
or
        print $indent, " (Hit a $_; closing everything up to here.)\n"
         if $Debug > 2;

I added all sorts of lines like these so that when a chunk of !!DECENT!! HTML
parsed incorrectly, I could set $HTML::TreeBuilder::Debug to 1 or higher
(instead of its default, 0), reparse it, and read the resulting output from
all the print statements like the ones above.  That proved absolutely
indispensable when I was doing the rewrite of TreeBuilder in the fall of
1999, and it still occasionally proves helpful.

But:
testing $Debug isn't a terribly expensive operation, I'm sure, yet it
happens /rather/ a lot in the course of normal parsing and treebuilding.
So I'm thinking of changing how these checks work: $Debug's value would be
read only at compile time, to set a constant called, say, DEBUG, so that the
above could read:
        print $indent, " (Hit a $_; closing everything up to here.)\n"
         if DEBUG > 2;
The benefit to this is that since DEBUG is a constant at compile time,
"DEBUG > 2" will be constant-folded.  That is, if DEBUG > 2 is true, then
Perl will IN COMPILATION turn the above into:
        print $indent, " (Hit a $_; closing everything up to here.)\n";
Whereas if DEBUG > 2 is false, then it will be turned into:
        ;
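A minimal, self-contained sketch of the proposed scheme, assuming it is done with a string eval inside a BEGIN block as described further down. `My::Tree` and `close_up` are hypothetical stand-ins, not TreeBuilder's actual code. The empty `()` prototype plus a constant body is what lets Perl inline DEBUG and fold `DEBUG > 2`:

```perl
package My::Tree;    # hypothetical stand-in for HTML::TreeBuilder
use strict;
use warnings;

our $Debug;          # a user would set this in a BEGIN block before we compile

BEGIN {
  # Read $Debug once, at compile time, and bake its numeric value into an
  # empty-prototype sub.  Perl inlines such a sub as a constant.
  my $level = defined $Debug ? $Debug + 0 : 0;
  eval "sub DEBUG () { $level }; 1" or die $@;
}

sub close_up {
  my ($indent, $tag) = @_;
  # With DEBUG at 0 this whole statement is folded away during compilation.
  print $indent, " (Hit a $tag; closing everything up to here.)\n"
    if DEBUG > 2;
  return 1;
}

package main;
My::Tree::close_up('  ', 'p');    # prints nothing: DEBUG is 0
```

With nothing set, the print vanishes from `close_up`'s op tree entirely.  To get level 3, a user would have to assign `$My::Tree::Debug = 3` in a BEGIN block that runs before the module is compiled; assigning it later has no effect.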

Actually, it's not as if the source code gets filtered; what changes is the
op tree that Perl builds, which you can see with B::Deparse:

% perl -MO=Deparse
BEGIN {$x = 1; eval qq~ sub yo () {$x} ~}; if(yo){ print "lala" }
[ctrl-d]
- syntax OK
sub yo () {
    1;
}
print 'lala';;


versus:

% perl -MO=Deparse
BEGIN {$x = 0; eval qq~ sub yo () {$x} ~}; if(yo){ print "lala" }
[ctrl-d]
- syntax OK
sub yo () {
    ;
}
'???';  # Deparse's sign that something there was optimized away


versus:

% perl -MO=Deparse
BEGIN {$x = 0; sub yo () {$x} }; if(yo){ print "lala" }
# doesn't really produce a constant, as you'll see:
[ctrl-d]
- syntax OK
sub yo () {
    $x;
}
if (yo) {
    print 'lala';
}



The upshot: if I do this, TreeBuilder would be rather faster for all normal
uses.  But everyone who currently uses the $HTML::TreeBuilder::Debug
variable would have their code break, and would have to change things.  It
would probably mean that the debug level would be set at compile time and
not typically alterable after that, although I could use some special value
to signal that a constant should not be generated, so that the value could
be freely changed at runtime.  If I'm the only person using that variable,
then I'm quite happy, and I'll change this to my heart's content.
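One way the special value might work, as a sketch only: here a negative $Debug (a hypothetical convention, nothing TreeBuilder defines) means "don't make a constant".  The runtime branch relies on the same behavior as the third Deparse example above: a `()`-prototyped sub that returns a package variable is not inlined, so each check re-reads $Debug.  That costs a sub call per check, so it would likely be slower than today's plain variable test.  `My::Tree` and `close_up` are again hypothetical:

```perl
# Stands in for the user asking for runtime-alterable debugging before the
# module compiles (the -1 convention is hypothetical).
BEGIN { $My::Tree::Debug = -1 }

package My::Tree;
use strict;
use warnings;

our $Debug;

BEGIN {
  my $level = defined $Debug ? $Debug + 0 : 0;
  if ($level < 0) {
    # Escape hatch: a sub returning a package variable is NOT inlined,
    # so DEBUG looks at $Debug every time it is called.
    $Debug = 0;    # start with debugging off
    eval 'sub DEBUG () { $Debug }; 1' or die $@;
  }
  else {
    # Normal case: bake the level in as a foldable constant.
    eval "sub DEBUG () { $level }; 1" or die $@;
  }
}

sub close_up {
  my ($indent, $tag) = @_;
  print $indent, " (Hit a $tag; closing everything up to here.)\n"
    if DEBUG > 2;
}

package main;
My::Tree::close_up('', 'p');    # silent: $Debug is 0
$My::Tree::Debug = 3;           # changed at runtime...
My::Tree::close_up('', 'p');    # ...and now the message prints
```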

But if you use $HTML::TreeBuilder::Debug and WOULD mind this change, then
SPEAK NOW OR FOREVER HOLD YOUR PEACE!


Incidentally, my proposed change would also mean that there'd be a string
eval (in a BEGIN block) in TreeBuilder, the only eval in all of
TreeBuilder/Element.  I have no idea what effect this might have on people
using any of the (still very experimental?) Perl compilers.
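For comparison, a sketch of an alternative that avoids the string eval: the core `constant` pragma.  The argument list of a `use` statement is evaluated at compile time, so `use constant DEBUG => ...` also reads $Debug exactly once and produces an inlinable constant.  Whether this matters to the Perl compilers is a separate question; `My::Tree` and `trace` are hypothetical:

```perl
package My::Tree;    # hypothetical stand-in for HTML::TreeBuilder
use strict;
use warnings;

our $Debug;
# Evaluated while compiling this line, so $Debug is read once, with no eval.
use constant DEBUG => $Debug || 0;

sub trace {
  my ($msg) = @_;
  print "$msg\n" if DEBUG > 1;    # folded away when DEBUG <= 1
  return DEBUG;
}

package main;
My::Tree::trace('Text node implicates BODY');    # silent at level 0
```

Either way, the level is frozen once the module has been compiled.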


--
Sean M. Burke  sburke@cpan.org  http://www.spinn.net/~sburke/