Re: HTML::Entities

Gisle Aas (gisle@activestate.com)
11 Apr 2001 10:25:50 -0700


Gisle Aas <gisle@ActiveState.com> writes:

> Given this quick survey, I think it would be unwise to just add it to
> HTML::Entities unless we can make it so that it only affects decoding.
> It seems more correct to continue to encode ' as &#39;

FYI, I just checked in the following patch:

Index: lib/HTML/Entities.pm
===================================================================
RCS file: /cvsroot/libwww-perl/html-parser/lib/HTML/Entities.pm,v
retrieving revision 1.21
retrieving revision 1.22
diff -u -p -u -r1.21 -r1.22
--- lib/HTML/Entities.pm	2001/02/23 07:07:01	1.21
+++ lib/HTML/Entities.pm	2001/04/11 17:22:45	1.22
@@ -85,6 +85,7 @@ require HTML::Parser;  # for fast XS imp
 'gt'    => '>',  # greater than
 'lt'    => '<',  # less than
  quot   => '"',  # double quote
+ apos   => "'",  # single quote
 
  # PUBLIC ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML
  AElig	=> 'Æ',  # capital AE diphthong (ligature)
@@ -349,6 +350,7 @@ require HTML::Parser;  # for fast XS imp
 while (my($entity, $char) = each(%entity2char)) {
     $char2entity{$char} = "&$entity;";
 }
+delete $char2entity{"'"};  # only one-way decoding
 
 # Fill inn missing entities
 for (0 .. 255) {
Index: t/entities.t
===================================================================
RCS file: /cvsroot/libwww-perl/html-parser/t/entities.t,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -p -u -r1.3 -r1.4
--- t/entities.t	1997/09/05 09:00:06	1.3
+++ t/entities.t	2001/04/11 17:22:46	1.4
@@ -1,6 +1,6 @@
 use HTML::Entities qw(decode_entities encode_entities);
 
-print "1..8\n";
+print "1..9\n";
 
 $a = "V&aring;re norske tegn b&oslash;r &#230res";
 
@@ -65,6 +65,10 @@ print "not " unless decode_entities("abc
                                     "abc&def&ghi&abc;&def;";
 print "ok 8\n";
 
+# Decoding of &apos;
+print "not " unless decode_entities("&apos;") eq "'" &&
+                    encode_entities("'", "'") eq "&#39;";
+print "ok 9\n";
 
 
 __END__
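
For anyone reading the patch without the module at hand, the one-way idea can be sketched standalone. This is only an illustration under assumptions: the table below is a hypothetical, cut-down stand-in for %entity2char, and encode_char() and decode_name() are made-up helper names, not functions exported by HTML::Entities.

```perl
use strict;
use warnings;

# Hypothetical, cut-down entity table; the real %entity2char is much larger.
my %entity2char = (
    amp  => '&',
    gt   => '>',
    lt   => '<',
    quot => '"',
    apos => "'",
);

# Build the reverse map, then drop the apostrophe so that it is
# decoded from &apos; but never encoded as &apos;.
my %char2entity;
while (my ($entity, $char) = each %entity2char) {
    $char2entity{$char} = "&$entity;";
}
delete $char2entity{"'"};

# Hypothetical helper: encode one character, falling back to a
# numeric reference when no named entity is registered for it.
sub encode_char {
    my ($char) = @_;
    return $char2entity{$char} // sprintf("&#%d;", ord $char);
}

# Hypothetical helper: decode one entity name, leaving unknown names as-is.
sub decode_name {
    my ($name) = @_;
    return exists $entity2char{$name} ? $entity2char{$name} : "&$name;";
}

print decode_name("apos"), "\n";   # '
print encode_char("'"), "\n";      # &#39;
print encode_char("<"), "\n";      # &lt;
```

Deleting the single reverse-map entry keeps the decoding table complete while leaving the encoder's existing output for ' unchanged.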