Network Working Group D. Connolly
Request for Comments: nnnn World Wide Web Consortium (W3C)
Obsoletes: 1866, 2070, 1980, 1867, 1942 L. Masinter
Category: Informational Xerox Corporation
draft-connolly-text-html-02.txt November 16, 1999
The 'text/html' Media Type
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026.
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as ``work in
progress''.
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved.
Abstract
This document summarizes the history of HTML development, and
defines the "text/html" MIME type by pointing to the relevant W3C
recommendations; it is intended to obsolete the previous IETF
documents defining HTML, including RFC 1866, RFC 1867, RFC 1980,
RFC 1942 and RFC 2070, and to remove HTML from IETF Standards
Track.
This document was prepared at the request of the W3C HTML working
group. Please send comments to www-html@w3.org, a public mailing
list with archive at
.
1. Introduction and background
HTML has been in use in the World Wide Web information
infrastructure since 1990, and specified in various informal
documents. The text/html media type was first officially defined
by the IETF HTML working group in 1995 in [HTML20]. Extensions to
HTML were proposed in [HTML30], [UPLOAD], [TABLES], [CLIMAPS], and
[I18N].
The HTML working group closed Sep 1996, and work on defining HTML
moved to the World Wide Web Consortium (W3C). The proposed
extensions were incorporated to some extent in [HTML32], and to a
larger extent in [HTML40]. The definition of multipart/form-data
from [UPLOAD] was described in [FORMDATA]. In addition, a
reformulation of HTML 4.0 in XML 1.0 is being developed [XHTML1].
[HTML32] notes "This specification defines HTML version 3.2. HTML
3.2 aims to capture recommended practice as of early '96 and as
such to be used as a replacement for HTML 2.0 (RFC 1866)."
Subsequent specifications for HTML describe the differences in each
version.
In addition to the development of standards, a wide variety of
additional extensions, restrictions, and modifications to HTML were
popularized by NCSA's Mosaic system and subsequently by the
competitive implementations of Netscape Navigator and Microsoft
Internet Explorer; these extensions are documented in numerous
books and online guides.
2. Registration of MIME media type text/html
MIME media type name: text
MIME subtype name: html
Required parameters: none
Optional parameters:
charset
The optional parameter "charset" refers to the character
encoding used to represent the HTML document as a sequence of
bytes. Any registered IANA charset may be used, but UTF-8 is
preferred. Although this parameter is optional, it is strongly
recommended that it always be present. See Section 6 below
for a discussion of charset default rules.
Note that [HTML20] included an optional "level" parameter; in
practice, this parameter was never used and has been removed from
this specification. [HTML30] also suggested a "version"
parameter; in practice, this parameter also was never used and
has been removed from this specification.
Encoding considerations:
See Section 4 of this document.
Security considerations:
See Section 7 of this document.
Interoperability considerations:
HTML is designed to be interoperable across the widest possible
range of platforms and devices of varying capabilities. However,
there are contexts (platforms of limited display capability, for
example) where not all of the capabilities of the full HTML
definition are feasible. There is ongoing work to develop both a
modularization of HTML and a set of profiling capabilities to
identify and negotiate restricted (and extended) capabilities.
Due to the long and distributed development of HTML, current
practice on the Internet includes a wide variety of HTML
variants. Implementors of text/html interpreters must be prepared
to be "bug-compatible" with popular browsers in order to work
with many HTML documents available the Internet.
Typically, different versions are distinguishable by the DOCTYPE
declaration contained within them, although the DOCTYPE
declaration itself is sometimes omitted or incorrect.
Published specification:
The text/html media type is now defined by W3C Recommendations;
the latest published version is [HTML40]. As of this writing, a
revision, HTML 4.01 [HTML401], is being developed as a revision.
In addition, [XHTML1], also a work in progress, defines a profile
of use of XHTML which is compatible with HTML 4.0 and which may
also be labeled as text/html.
Applications which use this media type:
The first and most common application of HTML is the World Wide
Web; commonly, HTML documents contain URI references [URI] to
other documents and media to be retrieved using the HTTP protocol
[HTTP]. Many gateway applications provide HTML-based interfaces
to other underlying complex services. Numerous other applications
now also use HTML as a convenient platform-independent multimedia
document representation.
Additional information:
Magic number:
There is no single initial string that is always present for
HTML files. However, Section 5 below gives some guidelines
for recognizing HTML files.
File extension:
The file extensions 'html' or 'htm' are commonly used, but
other extensions denoting file formats for preprocessing are
also common.
Macintosh File Type code: TEXT
Person & email address to contact for further information:
Dan Connolly
Larry Masinter
Intended usage: COMMON
Author/Change controller:
The HTML specification is a work product of the World Wide Web
Consortium's HTML Working Group. The W3C has change control over
the HTML specification.
Further information:
HTML has a means of including, by reference via URI, additional
resources (image, video clip, applet) within the base
document. In order to transfer a complete HTML object and the
included resources in a single MIME object, the mechanisms of
[MHTML] may be used.
3. Fragment Identifiers
The URI specification [URI] notes that the semantics of a fragment
identifier (part of a URI after a "#") is a property of the data
resulting from a retrieval action, and that the format and
interpretation of fragment identifiers is dependent on the media
type of the retrieval result.
For documents labeled as text/html, the fragment identifier
designates the correspondingly named element; any element may be
named with the "id" attribute, and A, APPLET, FRAME, IFRAME, IMG
and MAP elements may be named with a "name" attribute. This is
described in detail in [HTML40] section 12.
4. Encoding considerations
Because of the availability within HTML itself for using character
entity references, documents that use a wide repertoire of
characters may still be represented using the US-ASCII charset and
transported without encoding. However, transport of text/html
using a charset other than US-ASCII may require base64 or
quoted-printable encoding for 7-bit channels.
As with all MIME text subtypes, the canonical form of "text/html"
MUST always represent a line break as a sequence of a CR byte value
(0x0D) followed by an LF (0x0A) byte value. Similarly, any
occurrence of such a CRLF sequence in "text/html" MUST represent a
line break. Use of CR byte values and LF byte values outside of
line break sequences is also forbidden. This rule applies
regardless of the character encoding ('charset') involved.
Note, however, that the HTTP protocol allows the transport of data
not in canonical form, and, in particular, with other end-of-line
conventions; see [HTTP] section 3.7.1. This exception is commonly
used for HTML.
HTML sent via email is still subject to the MIME restrictions; this
is discussed fully in [MHTML] Section 10.
5. Recognizing HTML files
Almost all HTML files have the string ".
[HTML30] Raggett, D. "HyperText Markup Language Specification Version
3.0" (Available at ).
September 1995.
[HTML32] Raggett, D., "HTML 3.2 Reference Specification", W3C
Recomendation, January 1997. Available at
.
[HTML40] Raggett, D., et al., "HTML 4.0 Specification", W3C
Recommendation, December 1997. Available at
.
[HTML401] Raggett, D., et al., "HTML 4.01 Specification", W3C Proposed
Recommendation (work in progress), August 1999. Available at
.
[HTTP] Gettys, J., et al., "Hypertext Transfer Protocol -- HTTP/1.1",
RFC 2616, June 1999.
[I18N] Yergeau, F., et al. "Internationalization of the Hypertext
Markup Language", RFC 2070, January 1997.
[MHTML] Palme, J., et al., "MIME Encapsulation of Aggregate Documents,
such as HTML (MHTML)", RFC 2557, March 1999.
[MIME] Freed, N., and Borenstein, N., "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046, November
1996.
[TABLES] Raggett, D., "HTML Tables", RFC 1942, May 1996.
[UPLOAD] Nebel, E., and Masinter, L. "Form-based File Upload in HTML",
RFC 1867, November 1995.
[URI] Berners-Lee, T., Fielding, R., and Masinter, L., "Uniform
Resource Identifiers (URI): Generic Syntax", RFC 2396, August
1998.
[XHTML1] "XHTML 1.0: The Extensible HyperText Markup Language: A
Reformulation of HTML 4.0 in XML 1.0", W3C Proposed
Recommendation (work in progress). August 1999. Available at
.