Re: proposed HTTP changes for charset
Francois Yergeau (yergeau@alis.ca)
Sun, 7 Jul 1996 21:32:42 -0500
> Date: Fri, 05 Jul 1996 17:25:38 -0700
> From: "Roy T. Fielding" <fielding@liege.ICS.UCI.EDU>
>
> I have already covered these questions ad-nauseum.
>
> 1) HTTP has *always* used a default charset value of ISO-8859-1.
This is wrong. Repeating a falsehood ad nauseam doesn't make it any
truer. The default stopped being ISO 8859-1 the moment the very
first non-Latin-1 document was transmitted unlabelled.
Servers have never sent charset, and thus have always "defaulted" to
whatever was in the document being sent, Latin-1 or not.
Clients have always defaulted to whatever was the (generally) unique
encoding they could deal with, Latin-1 or not.
Proxies have never paid any attention, and have never defaulted to
anything.
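To make the behaviour described above concrete, here is a minimal
sketch of how a client ends up choosing an encoding. The function name
effective_charset and the client_default argument are hypothetical; the
sketch only assumes that the client honours a charset parameter when one
is present and otherwise uses whatever it was configured with.

```python
from email.message import Message


def effective_charset(content_type, client_default):
    """Return the charset a client would actually use for a response.

    content_type   -- value of the Content-Type header, or None if absent
    client_default -- whatever encoding this client is configured for
                      (hypothetical parameter; Latin-1 or not)
    """
    if content_type is None:
        return client_default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header carries no charset at all.
    labelled = msg.get_content_charset()
    return labelled if labelled is not None else client_default


if __name__ == "__main__":
    print(effective_charset("text/html; charset=ISO-8859-1", "shift_jis"))
    print(effective_charset("text/html", "shift_jis"))
```

With an unlabelled response the result is simply the client's own
setting, which is the point: the "default" is whatever the client
happens to be set to, not something the protocol text decides.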
> All implementations to the contrary had KNOWN failure conditions
> and did not work as intended except within locally controlled
> environments.
All implementations that assume Latin-1 have KNOWN failure conditions
and do not work as intended when presented with unlabelled
non-Latin-1 documents, that is outside of locally controlled Latin-1
environments. ISO 8859-1 is just a locally useful charset, whose
relative importance is shrinking daily.
> 2) The HTTP version defines the communication capability of the
> immediately adjacent client or server -- it NEVER indicates that
> the feature capabilities of the user agent.
Who said anything else? The version number indicates what protocol
features can be used. A server receiving 1.0 knows it can reply with
MIME-like headers followed by a blank line and an entity, which it
cannot do with 0.9.
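As a rough illustration of what the version number tells a server, the
sketch below answers a 0.9-style request with the bare entity and a
1.0-style request with a status line, MIME-like headers, a blank line
and the entity. The function name build_response and the choice of
headers are assumptions made for the example, not a complete server.

```python
def build_response(request_line: str, body: bytes, content_type: str) -> bytes:
    """Build a reply whose shape depends on the request's HTTP version."""
    parts = request_line.split()
    if len(parts) == 2:
        # "GET /path" with no version token is an HTTP/0.9 simple request:
        # the reply is the entity alone, with no status line or headers.
        return body
    # Any request carrying a version token gets a full response:
    # status line, headers, blank line, then the entity.
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


if __name__ == "__main__":
    page = b"<TITLE>hello</TITLE>"
    print(build_response("GET /index.html", page, "text/html"))
    print(build_response("GET /index.html HTTP/1.0", page, "text/html"))
```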
My point was that a client sending 1.1 should guarantee that it can
deal gracefully with a charset parameter (behave no worse than
without charset), something that is not true today with 1.0, despite
the language in RFC 1945.
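One way to read "behave no worse than without charset" is sketched
below: if the label names an encoding the client does not know, it
falls back to exactly what it would have done with no label. The name
decode_body and the fallback parameter are hypothetical, and using
errors="replace" is just one possible policy for undecodable bytes.

```python
import codecs


def decode_body(body: bytes, charset, fallback: str) -> str:
    """Decode an entity body, tolerating a missing or unknown charset label.

    charset  -- the charset parameter from Content-Type, or None
    fallback -- what this client would use for unlabelled documents
    """
    if charset is not None:
        try:
            codecs.lookup(charset)  # raises LookupError if unknown
            return body.decode(charset, errors="replace")
        except LookupError:
            pass  # unknown label: act as if there were no label
    return body.decode(fallback, errors="replace")


if __name__ == "__main__":
    print(decode_body(b"caf\xe9", "iso-8859-1", "utf-8"))
    print(decode_body(b"caf\xe9", "x-no-such-charset", "iso-8859-1"))
```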
> 3) None of the issues you have raised involve a technical problem
> with the HTTP/1.1 protocol -- they are POLITICAL problems that
> are an artifact of historical reality, a reality which the IETF
> is not capable of changing.
By attempting to enforce a default charset with no justification in
current practice or on technical grounds, the IETF would be making a
political statement: the western hemisphere wants a free ride, let the
rest of the world deal with the charset labelling problem. You are right
that Latin-1 as a default is an historical artefact; it *was* the
default when it was the only encoding, but it has not been since.
> 4) Labelling the charset with its real value if it is different than
> iso-8859-1 *always* works, both in old and new practice,...
You're seeing only one side of the coin. The world does not revolve
around ISO 8859-1. Today, lots of people *need* to set their browser
to a default other than Latin-1, because most of the stuff they read
is in languages not representable in Latin-1 and is unlabelled. When
they get a document - unlabelled - in Latin-1 (or anything else but
their current default), they see garbage. No, it doesn't always
work.
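The effect is easy to reproduce. The sketch below decodes the same
unlabelled bytes under two different browser defaults; the sample
strings and the helper name view_unlabelled are made up for the
illustration, with KOI8-R standing in for any non-Latin-1 default.

```python
def view_unlabelled(raw: bytes, browser_default: str) -> str:
    """Decode unlabelled bytes the way a browser would: with its default."""
    return raw.decode(browser_default, errors="replace")


# Two unlabelled documents, each stored in its author's local encoding.
russian = "Привет".encode("koi8_r")
french = "café".encode("iso-8859-1")

if __name__ == "__main__":
    for default in ("koi8_r", "iso-8859-1"):
        print(default, "->", view_unlabelled(russian, default),
              "|", view_unlabelled(french, default))
```

Whichever default the reader picks, one of the two documents comes out
as garbage; only a label on the document itself lets both display
correctly.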
> 5) Whether or not a client is capable of understanding the charset
> parameter is NOT a function of the protocol version -- ALL HTTP/1.0
> clients MUST understand charset, even if HTTP/1.0 is not a "standard",
> because that is part of the HTTP/1.0 definition (see RFC 1945).
See above. True in theory, but not (yet) in practice. Hence a
server cannot count on the 1.0 label to mean that the client will
grok charset; I know, I've tried, got burned and had to hack
something based on User-Agent:.
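For illustration only, a User-Agent based workaround of the kind
mentioned above might look like the sketch below. This is not the
actual hack; the list KNOWN_BROKEN_PREFIXES, the browser name in it and
the function content_type_header are all hypothetical.

```python
# Hypothetical list of User-Agent prefixes for clients observed to
# mishandle a charset parameter. The entry is a placeholder, not a
# claim about any real browser.
KNOWN_BROKEN_PREFIXES = ("ExampleBrowser/1.",)


def content_type_header(user_agent, media_type: str, charset: str) -> str:
    """Return the Content-Type value to send to this particular client."""
    ua = user_agent or ""
    if ua.startswith(KNOWN_BROKEN_PREFIXES):
        # Omit the label for clients that choke on it.
        return media_type
    return f"{media_type}; charset={charset}"


if __name__ == "__main__":
    print(content_type_header("ExampleBrowser/1.2", "text/html", "iso-8859-5"))
    print(content_type_header("OtherBrowser/2.0", "text/html", "iso-8859-5"))
```

The fragility is obvious: the list has to be maintained by hand, which
is why a guarantee tied to the protocol version would be preferable.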
> I see no point in continuing this discussion unless you can demonstrate
> a real problem that needs to be solved and can be solved within the
> constraints of HTTP/1.1.
I have described the problem above. It lies outside of a
Latin-1-centric world, so perhaps you have not encountered it before,
but it nevertheless exists. It really makes interoperability a
pipe dream in places where Latin-1 is irrelevant, and even more so
where multiple encodings are used for one language.
As for the constraints of HTTP/1.1, there has been a lot of loose
talk about backward compatibility, but no definite problem case
cited, except for a potential one with proxies. The latter is, I
think, fixed if only origin servers are required to send charset. If
that is not the case, please say so. I'd rather hear this than
"Let's not discuss it".
Finally, why is HTTP/1.1 constrained to mandate English and ISO
8859-1 as defaults in the new Warning: header? I have not seen any
justification, so let's please fix that. Has anyone noticed that
mandating a default language amounts to telling those who do not
speak that language that they cannot write complete 1.1 servers unless
they hire a translator?
Regards,
--
Francois Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montreal
Tel : +1 (514) 747-2547
Fax : +1 (514) 747-2561