Re: html, http, urls and internationalisation
Mike_Spreitzer.PARC@xerox.com
Tue, 30 Jan 1996 22:06:19 PST
I don't understand something about coping with URLs printed in newspapers,
business cards, etc. In Unicode, there are multiple ways to code a given
character. For example, Unicode includes Latin-1, which includes O-umlaut.
Unicode also has a combining umlaut (diaeresis) mark, so the same character can
be coded as the two-code sequence "O, umlaut". Do people who enter URLs have to be careful
to do so in a certain canonical way? Does a server have to canonicalize URLs
it receives? What about the other parts of a URL (e.g., FQDN --- does the DNS
have to canonicalize lookups)? What about characters that appear similar
enough that the printing quality --- and the expertise of the reader --- might
not be enough to make the distinction? What about distinctions --- such as
that between the Greek letter pi and the math symbol pi --- that are not
manifest in a printed glyph?
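
To make the O-umlaut case concrete, here is a minimal sketch in Python using the standard unicodedata module. It assumes that canonicalizing means applying Unicode Normalization Form C (NFC); that is one possible choice, not something any URL or DNS specification is claimed here to require. The helper name canonical() is made up for illustration.

```python
import unicodedata

# Two codings of what a reader sees as the same printed character.
precomposed = "\u00d6"   # LATIN CAPITAL LETTER O WITH DIAERESIS (from Latin-1)
decomposed = "O\u0308"   # "O" followed by COMBINING DIAERESIS

def canonical(s):
    """Hypothetical helper: map a string to one canonical coding (NFC assumed)."""
    return unicodedata.normalize("NFC", s)

# Compared code by code, the two strings differ ...
assert precomposed != decomposed

# ... but they agree once both sides are normalized the same way.
assert canonical(precomposed) == canonical(decomposed)

# Normalization does not help with look-alikes: Latin "O" and
# GREEK CAPITAL LETTER OMICRON remain distinct characters.
assert canonical("O") != canonical("\u039f")
```

If the server (or the DNS) applied something like canonical() to whatever it receives, people typing in a printed URL would not need to care which coding their keyboard produces. The look-alike problem in the last assertion is left open either way.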