Re: Unregistered charset values in HTTP 1.1, the ISO-8859-* values
Erik van der Poel (erik@netscape.com)
Fri, 12 Jul 1996 11:04:48 -0700
Olle Jarnefors wrote:
>
> b) It's unclear what charset registration the preferred
> MIME name "GB2312" shall designate: the ISO-registered
> character set GB_2312-80 (MIBenum: 57) or the only
> incompletely described GB2312 (MIBenum: 2025), which
> of course already has the proposed preferred MIME name
> as its principal name. It is also possible that these
> two registrations actually refer to exactly the same
> character set, and should be merged in the IANA registry.
No, they are not the same. While I agree that the name "gb2312" is lousy
("euc-cn" would have been better), the descriptions provided with the
IANA registrations for gb_2312-80 and gb2312 are:
Name: GB_2312-80 [RFC1345,KXS2]
MIBenum: 57
Source: ECMA registry
Alias: iso-ir-58
Alias: chinese
Alias: csISO58GB231280

Name: GB2312
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
        two byte set:
          20-7E = one byte ASCII
          A1-FE = two byte PRC Kanji
        See GB 2312-80
        PCL Symbol Set Id: 18C
Alias: csGB2312
Since the GB_2312-80 entry mentions iso-ir-58, it is clear that this is
a single character set in the ISO 2022 sense, i.e. this charset does not
include the single-byte ASCII characters.
The gb2312 entry explicitly mentions single-byte ASCII, so it is clear
that this is referring to the charset actually used on the net, whose
name might more properly be "euc-cn" (to follow euc-kr's and euc-jp's
lead).
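
To make the distinction concrete, here is a minimal Python sketch, not a definitive implementation. It splits a byte string using the byte ranges quoted in the GB2312 (MIBenum 2025) entry, then clears the high bit of a two-byte character to get the 7-bit code that a bare GB_2312-80 / iso-ir-58 set would use. The function names split_gb2312 and to_7bit_pair are hypothetical. The sketch assumes that Python's "gb2312" codec implements the mixed one-byte/two-byte encoding, and it passes control characters below 0x20 through as single bytes, which the registry entry does not mention.

```python
from typing import List


def split_gb2312(data: bytes) -> List[bytes]:
    """Split bytes into characters using the ranges from the registry entry."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0xA1 <= b <= 0xFE:
            # Lead byte of a two-byte character; the trail byte
            # must fall in the same A1-FE range.
            if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
                raise ValueError(f"bad two-byte sequence at offset {i}")
            chars.append(data[i:i + 2])
            i += 2
        elif b <= 0x7F:
            # One-byte ASCII. The registry lists 20-7E; accepting
            # 00-7F here is an assumption of this sketch.
            chars.append(data[i:i + 1])
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} at offset {i} is outside both ranges")
    return chars


def to_7bit_pair(pair: bytes) -> bytes:
    """Clear the high bit of each byte, giving the 7-bit (21-7E) form."""
    return bytes(b & 0x7F for b in pair)


encoded = "A中".encode("gb2312")      # ASCII letter followed by one Chinese character
print(split_gb2312(encoded))
print(to_7bit_pair(encoded[1:]))
```

The ASCII letter stays a single byte, while the Chinese character occupies two bytes with the high bit set. With the high bit cleared, the same pair falls inside the ASCII range, which is why the bare two-byte set cannot be mixed with ASCII unless something like ISO 2022 escape sequences does the switching.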
By the way, there is an RFC (1922) that proposes even more names for
some of the same things:
ftp://ds.internic.net/rfc/rfc1922.txt
This document mentions "cn-gb" and "cn-big5", which appear to be the
same as "gb2312" and "big5" respectively, but they have not been added
to the IANA registry yet.
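
Until the registry settles, software that receives these labels may want to fold them onto one name per charset. The following is only a sketch of that idea: the table ALIASES and the function canonical_charset are hypothetical, and treating "cn-gb" and "cn-big5" as equivalent to "gb2312" and "big5" is an assumption based on the reading of RFC 1922 given above, not on any registry entry.

```python
# Hypothetical alias table. Keys are lowercased labels, values are the
# names this sketch treats as canonical.
ALIASES = {
    # Mixed one-byte/two-byte encoding (MIBenum 2025)
    "gb2312": "gb2312",
    "csgb2312": "gb2312",
    "cn-gb": "gb2312",        # RFC 1922 name, assumed equivalent
    # Bare two-byte set (MIBenum 57), kept separate on purpose
    "gb_2312-80": "gb_2312-80",
    "iso-ir-58": "gb_2312-80",
    "chinese": "gb_2312-80",
    "csiso58gb231280": "gb_2312-80",
    # Big5
    "big5": "big5",
    "cn-big5": "big5",        # RFC 1922 name, assumed equivalent
}


def canonical_charset(label):
    """Return the canonical name for a charset label, or None if unknown."""
    return ALIASES.get(label.strip().lower())


print(canonical_charset("CN-GB"))
print(canonical_charset("iso-ir-58"))
```

Keeping the two GB registrations under different canonical names reflects the point made earlier: one includes ASCII and the other does not.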
Anarchy and chaos rule.
Erik