Entities in HTML::Parse

Thomas Richter (richter@chemie.fu-berlin.de)
Wed, 01 Nov 1995 17:01:00 +0100


Hi,

I have introduced a new control variable for HTML::Parse.  It is called
$HTML::Parse::KEEP_ENTITIES.  Setting it disables the decoding of
entities while parsing, so that a parsed document can be written back
out more safely, for example:

	print parse_htmlfile("something.html")->asHTML;

What do you think?

Test:
----
#!/bin/perl

use HTML::Element;
use HTML::Parse;

# Sample text containing entities.
$text = <<EOT;

&lt;text in angle brackets&gt;

EOT

# Uncomment to leave entities undecoded while parsing:
# $HTML::Parse::KEEP_ENTITIES = 1;
print "original:\n$text\n";
print "as parsed and output by libwww:\n";
print parse_html($text)->asHTML;

Patch:
----
*** Parse.pm.org        Tue Oct 31 13:38:28 1995
--- Parse.pm    Wed Nov  1 16:51:58 1995
***************
*** 61,66 ****
--- 61,71 ----
  all you want is to examine the structure of the document.  Default is
  false.
  
+ =item $HTML::Parse::KEEP_ENTITIES
+ 
+ Setting this variable to true will disable the expansion of entities.
+ Default is false.
+ 
  =back
  
  =head1 SEE ALSO
***************
*** 96,101 ****
--- 101,107 ----
  $IMPLICIT_TAGS  = 1;
  $IGNORE_UNKNOWN = 1;
  $IGNORE_TEXT    = 0;
+ $KEEP_ENTITIES  = 0;
  
  
  # Elements that should only be present in the header
***************
*** 249,255 ****
                    die "This should not happen";
                  }
                # expand entities
!               HTML::Entities::decode($val);
            } else {
                # boolean attribute
                $val = $key;
--- 255,261 ----
                    die "This should not happen";
                  }
                # expand entities
!               HTML::Entities::decode($val) unless $KEEP_ENTITIES;
            } else {
                # boolean attribute
                $val = $key;
***************
*** 401,407 ****
      $pos = $html unless defined($pos);
  
      my @text = @_;
!     HTML::Entities::decode(@text) unless $IGNORE_TEXT;
  
      if ($pos->isInside(qw(pre xmp listing))) {
        return if $IGNORE_TEXT;
--- 407,413 ----
      $pos = $html unless defined($pos);
  
      my @text = @_;
!     HTML::Entities::decode(@text) unless ($IGNORE_TEXT || $KEEP_ENTITIES);
  
      if ($pos->isInside(qw(pre xmp listing))) {
        return if $IGNORE_TEXT;
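
The guard added by the patch can be sketched in isolation, without libwww installed. This is only an illustration under stated assumptions: `text_for_tree` and the four-entry `%ENTITY` table are hypothetical stand-ins for the text handler and for HTML::Entities, and the package variable `$KEEP_ENTITIES` here merely mimics $HTML::Parse::KEEP_ENTITIES.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $HTML::Parse::KEEP_ENTITIES (not the real module).
our $KEEP_ENTITIES = 0;

# Tiny illustrative entity table; HTML::Entities knows many more names.
my %ENTITY = (lt => '<', gt => '>', amp => '&', quot => '"');

# Hypothetical helper: decode known entities unless the flag is set,
# mirroring the "decode(...) unless $KEEP_ENTITIES" guard in the patch.
sub text_for_tree {
    my ($text) = @_;
    return $text if $KEEP_ENTITIES;
    # Unknown entity names are left untouched.
    $text =~ s/&(\w+);/exists $ENTITY{$1} ? $ENTITY{$1} : "&$1;"/ge;
    return $text;
}

my $source = '&lt;text in angle brackets&gt;';

# Default behaviour: entities are decoded before the text is stored.
my $decoded = text_for_tree($source);

# With the flag set, the text is stored exactly as it appeared in the source.
my $kept = do { local $KEEP_ENTITIES = 1; text_for_tree($source) };

print "decoded: $decoded\n";
print "kept:    $kept\n";
```

If the decoded form is printed back out without re-encoding, the angle brackets would be read as a tag by the next parser, which is the situation the flag is meant to avoid.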