Roy T. Fielding,
Maintaining Distributed Hypertext Infostructures: Welcome to MOMspider's Web.


3. Limitations of Existing Maintenance Methods

Maintenance of World-Wide Web infostructures currently relies upon the diligence of each document owner and a small stream of maintenance information. The primary source of that information is users, commonly in the form of complaints from those who have encountered broken links or malformed documents. However, information providers cannot rely on user complaints. Most users of the Web are very tolerant of the errors they encounter -- after all, since the information is often provided by volunteers, many users feel that it would be impolite or disrespectful to point out any problems. Furthermore, many documents do not explicitly indicate to whom a complaint should be sent. Even when a complaint is received, it is often directed to the wrong person (i.e., the one maintaining the source of a link instead of its destination).

A second source of maintenance information is provided by the logfiles of each server. The server records (or attempts to record) each document request and, if an error occurred, the nature of that error. Such information can be extremely useful for identifying requests for documents that have moved and requests that contain misspelled URLs. However, only the server managers have access to that information. The error is often never relayed to the document maintainer, either because it is not recognized as a document error (users frequently mistype document URLs when accessing them via an "Open..." dialog) or because the origin of the error is not apparent from the error message. Although this situation will improve as WWW clients begin using the Referer header in requests [HTTP], log information will never be sufficient to cover maintenance needs. Logs cannot reveal failed requests that never made it to the server, nor can they support preventive maintenance or expose problems caused by changed document content.
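As a rough illustration of how log information becomes more useful once the Referer header is recorded, the following Python sketch groups "not found" errors by the page that linked to the missing document. The log layout (the later "combined" format with a quoted referer field), the function name not_found_by_referer, and the sample entries are all assumptions made for this example; they are not taken from any particular server described here.

```python
import re
from collections import defaultdict

# Assumed layout: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+(?: "(?P<referer>[^"]*)")?'
)


def not_found_by_referer(lines):
    """Map each path that returned 404 to the set of referring pages.

    A referer of None means the client sent no Referer header, so the
    origin of the bad link cannot be determined from the log alone.
    """
    report = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("status") != "404":
            continue
        referer = match.group("referer")
        if not referer or referer == "-":
            referer = None
        report[match.group("path")].add(referer)
    return dict(report)


# Hypothetical log entries, invented for this example.
sample = [
    'host1 - - [15/Jun/1994:06:23:52 -0700] "GET /docs/old.html HTTP/1.0" '
    '404 0 "http://example.org/index.html" "agent"',
    'host2 - - [15/Jun/1994:06:24:10 -0700] "GET /docs/old.html HTTP/1.0" '
    '404 0 "-" "agent"',
    'host3 - - [15/Jun/1994:06:25:00 -0700] "GET /docs/new.html HTTP/1.0" '
    '200 512 "-" "agent"',
]

for path, referers in not_found_by_referer(sample).items():
    print(path, sorted(r or "(no referer)" for r in referers))
```

Even in this small case the limitation remains visible: the second failed request carries no referer, so the log cannot say which document, if any, contains the broken link.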

Although rarely applied, static analyzers of document infostructures can be a third source of maintenance information. One such tool is the html_analyzer [Pitkow93]. It can examine a local infostructure and validate the document links for accessibility, completeness, and consistency. This type of information could be very useful for infostructure maintenance, but such tools tend to be applied more as a means of one-time verification than as a regular maintenance process. They also fail to provide adequate support across distributed infostructures and for situations in which the document contents are outside the control of the program user.
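To make the idea of static link validation concrete, here is a minimal Python sketch that walks a local directory tree and reports anchors whose target file does not exist. It is not the html_analyzer's implementation; the names LinkCollector and broken_local_links are hypothetical, and the sketch assumes that documents are stored as .html or .htm files under a single root directory.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_local_links(root):
    """Return (source_file, href) pairs whose local target file is missing."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.lower().endswith((".html", ".htm")):
                continue
            source = os.path.join(dirpath, filename)
            collector = LinkCollector()
            with open(source, encoding="utf-8", errors="replace") as handle:
                collector.feed(handle.read())
            for href in collector.links:
                parsed = urlparse(urldefrag(href)[0])
                # Remote links and same-document fragments are skipped:
                # a purely local check has no way to test them.
                if parsed.scheme or parsed.netloc or not parsed.path:
                    continue
                path = unquote(parsed.path)
                base = root if path.startswith("/") else dirpath
                target = os.path.normpath(os.path.join(base, path.lstrip("/")))
                if not os.path.exists(target):
                    broken.append((source, href))
    return broken
```

The branch that skips remote links marks exactly where a local analysis stops: links that leave the local infostructure are never tested.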

Given these limitations, the only existing method for performing adequate maintenance of WWW infostructures is the brute-force one of periodic manual traversals, by each owner, of all the webs for which they are responsible. Such traversals are repetitive, time-consuming, and boring -- a guaranteed recipe for human inattentiveness. They also require a great deal of duplication of effort for overlapping infostructures, as each owner retests the same link destinations. The result is that maintenance is rarely or inconsistently performed and the infostructure eventually becomes corrupted. What is needed is a means for automating this traversal process such that a human maintainer (owner) of an infostructure need only investigate documents that are likely to require maintenance effort -- those that are known to have changed or expired, or that contain broken links.
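The selection criteria named above (changed, expired, or containing broken links) can be expressed as a simple filter over per-document information. The Python sketch below is only an illustration of those criteria; the DocumentStatus record, its field names, and the reasons_for_attention function are hypothetical and do not describe any existing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DocumentStatus:
    """Hypothetical record of what a traversal learned about one document."""
    url: str
    last_modified: Optional[datetime] = None
    expires: Optional[datetime] = None
    broken_links: List[str] = field(default_factory=list)


def reasons_for_attention(doc, last_checked, now):
    """Return the reasons, if any, why the owner should look at this document."""
    reasons = []
    if doc.last_modified is not None and doc.last_modified > last_checked:
        reasons.append("changed")
    if doc.expires is not None and doc.expires <= now:
        reasons.append("expired")
    if doc.broken_links:
        reasons.append("broken links")
    return reasons
```

A document for which the function returns an empty list needs no human attention on that pass, which is the saving that automation is meant to provide.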



Roy Fielding <fielding@ics.uci.edu>
Department of Information and Computer Science
University of California, Irvine, CA 92717-3425
Last modified: Wed Jun 15 06:23:52 1994