Roy T. Fielding,
Maintaining Distributed Hypertext Infostructures: Welcome to MOMspider's Web.


This paper was presented at the First International World-Wide Web Conference (WWW94) in Geneva, Switzerland, May 25-27, 1994. This preliminary version is also available in A4 PostScript and US 8.5x11 PostScript.

Contents

  1. Introduction
  2. The Maintenance Problem
  3. Limitations of Existing Maintenance Methods
  4. Automated Traversal as a Maintenance Solution
  5. MOMspider Design
    1. Functionality
    2. Efficient Use of Network Resources
    3. Being Friendly to Service Providers
  6. The Need for Visible Metainformation
  7. Conclusions and Future Research
Acknowledgements
References
About the author
Appendix A: An example instruction file
Appendix B: An example generated index
Appendix C: An example avoid file
Appendix D: An example sites file

Abstract

Most documents made available on the World-Wide Web can be considered part of an infostructure -- an information resource database with a specifically designed structure. Infostructures often contain a wide variety of information sources, in the form of interlinked documents at distributed sites, which are maintained by a number of different document owners (usually, but not necessarily, the original document authors). Individual documents may also be shared by multiple infostructures. Because an infostructure is rarely static, its content is likely to change over time and may drift from the intended structure. Documents may be moved or deleted, referenced information may change, and hypertext links may be broken.

As it grows, an infostructure becomes complex and difficult to maintain. Such maintenance currently relies upon the error logs of each server (seldom relayed to the document owners), the complaints of users (often not seen by the actual document maintainers), and periodic manual traversals by each owner of all the webs for which they are responsible. Because thorough manual traversal of a web is time-consuming and boring, maintenance is performed rarely or inconsistently, and the infostructure eventually becomes corrupted. What is needed is an automated means of traversing a web of documents and checking for changes that may require the attention of the human maintainers (owners) of that web.

The Multi-Owner Maintenance spider (MOMspider) has been developed to at least partially solve this maintenance problem. MOMspider can periodically traverse a list of webs (by owner, site, or document tree), check each web for any changes that may require its owner's attention, and build a special index document that lists the attributes and connections of the web in a form that can itself be traversed as a hypertext document. This paper describes the design of MOMspider and how it was influenced by the nature of distributed hypertext maintenance and by the requirements for good behavior of any web-traversing robot. It also discusses the efficiency requirements for maintaining world-wide webs and proposes changes to HTML and HTTP to support distributed maintenance. The paper concludes with a short description of MOMspider's future and pointers to its freeware distribution site.


Roy Fielding <fielding@ics.uci.edu>
Department of Information and Computer Science
University of California, Irvine, CA 92717-3425
Last modified: Thu Jun 16 05:46:56 1994