Re: Unregistered charset values in HTTP 1.1, the ISO-8859-* values

Erik van der Poel (erik@netscape.com)
Fri, 12 Jul 1996 11:04:48 -0700


Olle Jarnefors wrote:
> 
> b) It's unclear what charset registration the preferred
>    MIME name "GB2312" shall designate: the ISO-registered
>    character set GB_2312-80 (MIBenum: 57) or the only
>    incompletely described GB2312 (MIBenum: 2025), which
>    of course already has the proposed preferred MIME name
>    as its principal name. It is also possible that these
>    two registrations actually refer to exactly the same
>    character set, and should be merged in the IANA registry.

No, they should not be merged. I agree that the name "gb2312" is lousy
("euc-cn" would have been better), but the two IANA registrations
describe different things. The descriptions provided for gb2312 and
gb_2312-80 are:

Name: GB_2312-80                                        [RFC1345,KXS2]
MIBenum: 57
Source: ECMA registry
Alias: iso-ir-58
Alias: chinese
Alias: csISO58GB231280

Name: GB2312
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte, 
        two byte set: 
          20-7E = one byte ASCII 
          A1-FE = two byte PRC Kanji 
        See GB 2312-80 
        PCL Symbol Set Id: 18C
Alias: csGB2312

Since the GB_2312-80 entry mentions iso-ir-58, it is clear that this is
a single character set in the ISO 2022 sense. I.e. this charset does not
include the single-byte ASCII characters.

The gb2312 entry explicitly mentions single-byte ASCII, so it is clear
that this is referring to the charset actually used on the net, whose
name might more properly be "euc-cn" (to follow euc-kr's and euc-jp's
lead).
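
To make the difference concrete, here is a minimal Python sketch of the
byte layout the gb2312 entry describes. The function names
(split_euc_cn, to_iso_ir_58) are made up for illustration. Two things
are assumptions that go beyond the registry text: control bytes below
0x20 are accepted as single bytes (the entry lists only 20-7E), and the
mapping to the bare GB_2312-80 set is done by clearing the high bit of
each byte, which is the usual EUC convention.

```python
from typing import List


def split_euc_cn(data: bytes) -> List[bytes]:
    """Split bytes into one-byte ASCII units and two-byte units.

    Follows the layout in the IANA gb2312 entry: A1-FE starts a
    two-byte character. Assumption: any byte below 0x80 (not only
    20-7E) is treated as a single-byte unit so that line breaks pass.
    """
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            units.append(data[i:i + 1])
            i += 1
        elif 0xA1 <= b <= 0xFE:
            # A lead byte must be followed by a trail byte in A1-FE.
            if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
                raise ValueError("truncated or invalid two-byte unit at %d" % i)
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError("byte 0x%02X is outside the described ranges" % b)
    return units


def to_iso_ir_58(pair: bytes) -> bytes:
    """Map a two-byte unit (A1-FE) to 7-bit form (21-7E).

    Assumption: the 8-bit form is the 7-bit GB_2312-80 code with the
    high bit set on both bytes, as in other EUC encodings.
    """
    if len(pair) != 2 or not all(0xA1 <= b <= 0xFE for b in pair):
        raise ValueError("expected a two-byte unit in A1-FE")
    return bytes(b & 0x7F for b in pair)
```

The point of the sketch is that ASCII only exists in the first function:
once a two-byte unit is mapped down to 7 bits it collides with the ASCII
range, so the bare GB_2312-80 set cannot carry mixed text without some
ISO 2022 style designation around it.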

By the way, there is an RFC (1922) that proposes even more names for
some of the same things:

  ftp://ds.internic.net/rfc/rfc1922.txt

This document mentions "cn-gb" and "cn-big5", which appear to be the
same as "gb2312" and "big5" respectively, but they have not been added
to the IANA registry yet.
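
Until the registry catches up, software that receives these labels has
to carry its own alias table. A minimal sketch in Python follows. The
table and the function name canonical_charset are hypothetical; the
GB_2312-80 and GB2312 aliases come from the registry entries quoted
above, while treating "cn-gb" and "cn-big5" as equivalent to "gb2312"
and "big5" is only my reading of RFC 1922, not something the registry
confirms.

```python
from typing import Optional

# Lower-cased label -> name used internally. GB_2312-80 and GB2312 are
# deliberately kept apart, since they are separate registrations.
CHARSET_ALIASES = {
    "gb_2312-80": "GB_2312-80",
    "iso-ir-58": "GB_2312-80",
    "chinese": "GB_2312-80",
    "csiso58gb231280": "GB_2312-80",
    "gb2312": "GB2312",
    "csgb2312": "GB2312",
    "big5": "Big5",
    # Assumption: RFC 1922 names, not in the IANA registry at this time.
    "cn-gb": "GB2312",
    "cn-big5": "Big5",
}


def canonical_charset(label: str) -> Optional[str]:
    """Return the internal name for a charset label, or None if unknown."""
    return CHARSET_ALIASES.get(label.strip().lower())
```

Labels that are not in the table, such as "euc-cn", come back as None
rather than being guessed at.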

Anarchy and chaos rule.


Erik