libwww-ada95 library design notes
=================================

The purpose of a WWW protocol library is to simplify the task of
performing a request on a resource (as identified by a URI) and
processing the response. This is done by abstracting all of the
possible resource access mechanisms (filesystem, FTP, HTTP, NNTP, WAIS,
z39.50, ...) into a single request interface that looks much like HTTP,
implementing a client-side network interface for each access mechanism
that converts the underlying protocol into our HTTP-like interface, and
providing utility functions for handling the interface's response. Our
initial library will only implement the file and HTTP access mechanisms,
but will do so in a way that will make it easy to load other mechanisms
as they become available.

The interface provided by libwww is a parsed (or tokenized) form of
HTTP, since acting as a gateway to other protocols is one of HTTP's
design goals. Our internal library processing works in much the same
way as a traditional web browser might use a network proxy: the
application gives the interface a request to make on its behalf, in the
form of an HTTP method, a URI, a list of request options/metadata, and
an optional payload (the request message body). The interface returns
a status code, a list of response options/metadata, and an optional
payload (the response message body).

A basic library can be implemented simply by storing the method and URI
as strings and the options/metadata as a table, opening a network
connection, writing the combined request to the connection, and reading
the response from the connection while parsing it into status,
options/metadata, and body.
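
To make the basic approach concrete, here is a minimal Ada 95 sketch of
its request half: the method and URI are plain strings, the
options/metadata are a small table, and the combined request is built
as one string. The names Header_Field, Header_Table, and Combine are
hypothetical, and the HTTP/1.0 request line is an assumption made for
the example; none of this is the library's actual interface.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Naive_Request_Demo is

   --  Hypothetical types: one name/value pair and a table of them.
   type Header_Field is record
      Name  : Unbounded_String;
      Value : Unbounded_String;
   end record;

   type Header_Table is array (Positive range <>) of Header_Field;

   CRLF : constant String :=
     Ada.Characters.Latin_1.CR & Ada.Characters.Latin_1.LF;

   --  Combine method, URI, and options into the text of a request head.
   --  The "HTTP/1.0" version token is an assumption for illustration.
   function Combine
     (Method  : String;
      URI     : String;
      Options : Header_Table) return String
   is
      Result : Unbounded_String :=
        To_Unbounded_String (Method & " " & URI & " HTTP/1.0" & CRLF);
   begin
      for I in Options'Range loop
         Append (Result, Options (I).Name);
         Append (Result, ": ");
         Append (Result, Options (I).Value);
         Append (Result, CRLF);
      end loop;
      Append (Result, CRLF);   --  blank line ends the header section
      return To_String (Result);
   end Combine;

   Options : constant Header_Table :=
     (1 => (To_Unbounded_String ("Accept"),
            To_Unbounded_String ("text/html")));

begin
   --  A real client would write this text to a network connection.
   Ada.Text_IO.Put (Combine ("GET", "/index.html", Options));
end Naive_Request_Demo;
```

Note that the whole request head is assembled and copied in memory
before any of it can be written to a connection.
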
However, such a library would be extremely inefficient due to the
amount of data copying between applications, the duplication of parsing
that would take place every time the application reads or sets a
message header field, the inability to use a single connection for
multiple requests, and the amount of memory and time needed to read the
entire body before passing control to the caller.

An efficient library needs to manage connections for use by multiple
requests, include data structures that incorporate both parsed and
unparsed data for as-needed and only-once parsing, pass each message
body as a stream object that can be read/written on-the-fly, and do all
of the above such that multiple requests can be processed concurrently.
And, let's not forget, the resulting interface needs to remain simple,
since simplicity is the key enabling feature of the Web.


Primary Library Components
==========================

A (somewhat misleading) uses diagram, excluding things like hash
tables, lists, priority queues, etc.

                  Application
                       |
         .-------------+-------------------.
         |             |                   |
         |             |                   |
.---------------.      | .----------------------------------.
| URI Reference |      | |         Satisfy[Request]         |
`---------------'      | `----------------------------------'
  |                    |        |                      |
  |                .---+--------+--.         .---------+----------.
  |                |               |         |         |          |
  |           .---------.    .----------. .------. .------.   .-------.
  |           | Request |    | Response | | file | | http |   | Cache |
  |           `---------'    `----------' `------' `------'   `-------'
  |                |               |         |         |          |
  | .--------+-----+-----.    .----+-----.   `---------+----------'
  | |        |           |    |          |             |
 .-----. .--------.    .---------.  .--------.         |
 | URI | | Method |    | Message |  | Status |         |
 `-----' `--------'    `---------'  `--------'         |
                            |                          |
                    .-------+-------.   .--------------+
                    |               |   |
            .--------------. .-----------------.
            | Header_Field | | Onions.Xstreams |
            `--------------' `-----------------'
                                      |
           .-----------------+--------+-----------.
           |                 |                    |
    .------------.    .-------------.  .--------------------.
    | Dir_Stream |    | File_Stream |  | Connection_Manager |
    `------------'    `-------------'  `--------------------'
                                                  |
                                         .----------------.
                                         | Channel_Stream |
                                         `----------------'
                                                  |
                                             .---------.
                                             | Sockets |
                                             `---------'
                                                  |
                                              Internet

The library might also include a dispatcher for associating viewers
with retrieved media types, but that would be independent of the basic
request interface above.


Uniform Resource Identifiers (URI)
==================================

Uniform Resource Identifiers (URIs) provide a simple and extensible
means for identifying a resource. A full definition of what that means
is provided in the specification under

    http://www.ics.uci.edu/~fielding/url/

The uniform syntax is composed of five main components

    <scheme>://<site><path>?<query>#<fragment>

and the "//" and "?" parts are optional for some schemes.

From libwww's perspective, the important components of the URI are the
scheme, which is used to select the access mechanism, and the site,
which is used to identify the network connection needed if a network
access is required to satisfy the request. The path and query
components are used only by the protocol translation part of the access
mechanism. The fragment component is used only by viewer applications
to select or jump to some portion of the retrieval response.

The URI object consists of

    uri_string
    uri_parsed
        scheme
        site
            host
            port
        path
        query


URI Reference
=============

A URI Reference is what you might see embedded in an HTML anchor href.
It might be a relative reference, so we need to associate it with a
base URI for resolution. It consists of

    ref_string:    UB_String;
    absolute_form: URI_object;
    base_URI:      access URI_object;


Yet to be described ...
=======================

    Method
    Status
    Message
    Header_Field
    Request
    Response
    Satisfy (Request)
    Cache
    file
    http
    ftp
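
As a rough illustration of the URI object outlined in the "Uniform
Resource Identifiers (URI)" section, the following Ada 95 sketch splits
a URI string into scheme, site (host and port), path, query, and
fragment. It is a simplification that assumes an absolute URI with a
scheme and does no validation or relative-reference resolution. The
record layout, the Parse function, and the example URI are hypothetical
and are not taken from the library.

```ada
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;           use Ada.Text_IO;

procedure URI_Split_Demo is

   --  Hypothetical record; the field names follow the outline above.
   type URI_Object is record
      URI_String : Unbounded_String;   --  unparsed form
      Scheme     : Unbounded_String;
      Host       : Unbounded_String;   --  site = host [":" port]
      Port       : Unbounded_String;
      Path       : Unbounded_String;
      Query      : Unbounded_String;
      Fragment   : Unbounded_String;
   end record;

   --  Assumes Source is an absolute URI; performs no validation.
   function Parse (Source : String) return URI_Object is
      Result : URI_Object;
      First  : Positive := Source'First;
      Last   : Natural  := Source'Last;
      Pos    : Natural;
   begin
      Result.URI_String := To_Unbounded_String (Source);

      --  Fragment: everything after the first '#'.
      Pos := Ada.Strings.Fixed.Index (Source (First .. Last), "#");
      if Pos /= 0 then
         Result.Fragment := To_Unbounded_String (Source (Pos + 1 .. Last));
         Last := Pos - 1;
      end if;

      --  Query: everything after the first '?' in what remains.
      Pos := Ada.Strings.Fixed.Index (Source (First .. Last), "?");
      if Pos /= 0 then
         Result.Query := To_Unbounded_String (Source (Pos + 1 .. Last));
         Last := Pos - 1;
      end if;

      --  Scheme: everything before the first ':'.
      Pos := Ada.Strings.Fixed.Index (Source (First .. Last), ":");
      if Pos /= 0 then
         Result.Scheme := To_Unbounded_String (Source (First .. Pos - 1));
         First := Pos + 1;
      end if;

      --  Site: present only when the remainder starts with "//".
      if Last - First >= 1 and then Source (First .. First + 1) = "//" then
         First := First + 2;
         Pos := Ada.Strings.Fixed.Index (Source (First .. Last), "/");
         if Pos = 0 then
            Pos := Last + 1;   --  no path follows the site
         end if;
         declare
            Site  : constant String  := Source (First .. Pos - 1);
            Colon : constant Natural := Ada.Strings.Fixed.Index (Site, ":");
         begin
            if Colon = 0 then
               Result.Host := To_Unbounded_String (Site);
            else
               Result.Host :=
                 To_Unbounded_String (Site (Site'First .. Colon - 1));
               Result.Port :=
                 To_Unbounded_String (Site (Colon + 1 .. Site'Last));
            end if;
         end;
         First := Pos;
      end if;

      --  Whatever is left is the path.
      Result.Path := To_Unbounded_String (Source (First .. Last));
      return Result;
   end Parse;

   --  Made-up example URI, used only for this demonstration.
   U : constant URI_Object :=
     Parse ("http://www.example.org:8080/docs/index.html?lang=en#intro");

begin
   Put_Line ("scheme   = " & To_String (U.Scheme));
   Put_Line ("host     = " & To_String (U.Host));
   Put_Line ("port     = " & To_String (U.Port));
   Put_Line ("path     = " & To_String (U.Path));
   Put_Line ("query    = " & To_String (U.Query));
   Put_Line ("fragment = " & To_String (U.Fragment));
end URI_Split_Demo;
```

Keeping both the unparsed string and the parsed fields in one record
mirrors the uri_string/uri_parsed split in the outline; a fuller
version would presumably defer parsing until a component is first
requested.
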