Re: Entities in HTML::Parse

Gisle Aas (aas@oslonett.no)
Sun, 05 Nov 1995 11:17:16 +0100


> Hi,
> 
> I introduced a new control variable for HTML::Parse.  It is called
> $HTML::Parse::KEEP_ENTITIES.  Its purpose is to disable the decoding
> of entities while parsing, so that you can do this more safely:
> 
> 	print parse_htmlfile("something.html")->asHTML;
> 
> What do you think?

Perhaps it would be better to fix HTML::Element::asHTML() instead?
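
Something along these lines, as a rough sketch only. It assumes the
decode_entities/encode_entities functions exported by current versions of
HTML::Entities, and text_as_html is a made-up helper for illustration, not
an existing HTML::Element method. The idea is that the parser keeps
decoding entities on input and the output routine re-encodes the unsafe
characters:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper (not part of HTML::Element): escape only the
# characters that are unsafe inside HTML text content.
sub text_as_html {
    my ($text) = @_;
    return encode_entities($text, '<>&');
}

my $source  = '&lt;text in angle brackets&gt;';
my $decoded = decode_entities($source);   # what the parser would store
my $output  = text_as_html($decoded);     # what the output routine would print

print "$output\n";   # prints: &lt;text in angle brackets&gt;
```

That way the tree always holds plain text, and only the output side needs
to know about entities.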

Regards,
Gisle



> 
> Test:
> ----
> #!/bin/perl
> 
> use HTML::Element;
> use HTML::Parse;
> 
> $text = <<EOT;
> <HTML>
>   <BODY>
>     <P> &lt;text in angle brackets&gt;
>   </BODY>
> </HTML>
> EOT
> # $HTML::Parse::KEEP_ENTITIES = 1;
> print "original:\n$text\n";
> print "as parsed and output by libwww:\n";
> print parse_html($text)->asHTML;
> 
> Patch:
> ----
> *** Parse.pm.org        Tue Oct 31 13:38:28 1995
> --- Parse.pm    Wed Nov  1 16:51:58 1995
> ***************
> *** 61,66 ****
> --- 61,71 ----
>   all you want is to examine the structure of the document.  Default is
>   false.
>   
> + =item $HTML::Parse::KEEP_ENTITIES
> + 
> + Setting this variable to true will disable the expansion of entities.
> + Default is false.
> + 
>   =back
>   
>   =head1 SEE ALSO
> ***************
> *** 96,101 ****
> --- 101,107 ----
>   $IMPLICIT_TAGS  = 1;
>   $IGNORE_UNKNOWN = 1;
>   $IGNORE_TEXT    = 0;
> + $KEEP_ENTITIES  = 0;
>   
>   
>   # Elements that should only be present in the header
> ***************
> *** 249,255 ****
>                     die "This should not happen";
>                   }
>                 # expand entities
> !               HTML::Entities::decode($val);
>             } else {
>                 # boolean attribute
>                 $val = $key;
> --- 255,261 ----
>                     die "This should not happen";
>                   }
>                 # expand entities
> !               HTML::Entities::decode($val) unless $KEEP_ENTITIES;
>             } else {
>                 # boolean attribute
>                 $val = $key;
> ***************
> *** 401,407 ****
>       $pos = $html unless defined($pos);
>   
>       my @text = @_;
> !     HTML::Entities::decode(@text) unless $IGNORE_TEXT;
>   
>       if ($pos->isInside(qw(pre xmp listing))) {
>         return if $IGNORE_TEXT;
> --- 407,413 ----
>       $pos = $html unless defined($pos);
>   
>       my @text = @_;
> !     HTML::Entities::decode(@text) unless ($IGNORE_TEXT || $KEEP_ENTITIES);
>   
>       if ($pos->isInside(qw(pre xmp listing))) {
>         return if $IGNORE_TEXT;