Hi,
I have added a new control variable to HTML::Parse, called
$HTML::Parse::KEEP_ENTITIES. When set to true, it disables the decoding
of entities while parsing, so that you can do this more safely:
print parse_htmlfile("something.html")->asHTML;
What do you think?
Test:
----
#!/bin/perl
use HTML::Element;
use HTML::Parse;
$text = <<EOT;
&lt;text in angle brackets&gt;
EOT
# $HTML::Parse::KEEP_ENTITIES = 1;
print "original:\n$text\n";
print "as parsed and output by libwww:\n";
print parse_html($text)->asHTML;
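
For anyone without a patched Parse.pm at hand, here is a minimal standalone sketch of the behaviour the flag is meant to give. It does not use HTML::Parse or HTML::Entities: parse_text and %ENTITY are hypothetical stand-ins that handle only three entities, and $KEEP_ENTITIES here is a local copy of the proposed flag, not the real package variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's decode step; this is NOT the
# real HTML::Entities code and covers only three entities.
my %ENTITY = (lt => '<', gt => '>', amp => '&');

# Local copy of the proposed flag; false by default, as in the patch.
our $KEEP_ENTITIES = 0;

# Return the text as the parser would store it: entities are decoded
# unless $KEEP_ENTITIES is true.
sub parse_text {
    my ($text) = @_;
    return $text if $KEEP_ENTITIES;
    $text =~ s/&(lt|gt|amp);/$ENTITY{$1}/g;
    return $text;
}

my $source = "&lt;text in angle brackets&gt;";

$KEEP_ENTITIES = 0;
print "decoded: ", parse_text($source), "\n";   # <text in angle brackets>

$KEEP_ENTITIES = 1;
print "kept:    ", parse_text($source), "\n";   # &lt;text in angle brackets&gt;
```

With the flag off, the stored text contains real angle brackets, which would read as a tag if written back out unescaped. With the flag on, the text comes out exactly as it went in.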
Patch:
----
*** Parse.pm.org Tue Oct 31 13:38:28 1995
--- Parse.pm Wed Nov 1 16:51:58 1995
***************
*** 61,66 ****
--- 61,71 ----
all you want is to examine the structure of the document. Default is
false.
+ =item $HTML::Parse::KEEP_ENTITIES
+
+ Setting this variable to true will disable the expansion of entities.
+ Default is false.
+
=back
=head1 SEE ALSO
***************
*** 96,101 ****
--- 101,107 ----
$IMPLICIT_TAGS = 1;
$IGNORE_UNKNOWN = 1;
$IGNORE_TEXT = 0;
+ $KEEP_ENTITIES = 0;
# Elements that should only be present in the header
***************
*** 249,255 ****
die "This should not happen";
}
# expand entities
! HTML::Entities::decode($val);
} else {
# boolean attribute
$val = $key;
--- 255,261 ----
die "This should not happen";
}
# expand entities
! HTML::Entities::decode($val) unless $KEEP_ENTITIES;
} else {
# boolean attribute
$val = $key;
***************
*** 401,407 ****
$pos = $html unless defined($pos);
my @text = @_;
! HTML::Entities::decode(@text) unless $IGNORE_TEXT;
if ($pos->isInside(qw(pre xmp listing))) {
return if $IGNORE_TEXT;
--- 407,413 ----
$pos = $html unless defined($pos);
my @text = @_;
! HTML::Entities::decode(@text) unless ($IGNORE_TEXT || $KEEP_ENTITIES);
if ($pos->isInside(qw(pre xmp listing))) {
return if $IGNORE_TEXT;