libwww, a different approach

Graham Barr (bodg@tiuk.ti.com)
Wed, 15 Mar 95 09:41:17 GMT


Those of you who are also on the perl5-porters mailing list will probably 
have noticed that I have been doing a fair bit of work on network based 
packages. The reason for this is that for quite a while I have been working 
on my own perl5 WWW library. But as this version of the WWW library is 
becoming very popular I intend to scrap what I have done, although I would 
like to pass on ideas about what I did.

My class hierarchy was completely different and I feel it has several 
advantages over the hierarchy adopted by libwww-perl, the main one 
being that protocols are loaded on demand and nothing about any protocol
is 'hard-coded' into any package. This made adding a protocol as 
simple as writing it and placing the file in the correct place.

My other goal was to write as few specific packages as possible and use those 
from perl5-porters (eg IPC::Chat, Net::FTP etc), in whose development some 
people will know that I have shown a great interest.

Here is a description of the main WWW classes/packages I used:

WWW::

  Top level package. Provides a function for registering callback routines
  used by various schemes (eg 'mailto' calls a callback to invoke an editor
  or do something else)
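
  As a rough illustration of the callback idea, here is a minimal sketch.
  The package name Sketch::WWW and the functions register_callback and
  invoke_callback are hypothetical names chosen for this example; they are
  not taken from the library described here.

```perl
use strict;
use warnings;

package Sketch::WWW;

# Hypothetical registry: scheme name => code reference.
my %callback;

# Register a callback for a scheme; returns the previous one, if any.
sub register_callback {
    my ($scheme, $code) = @_;
    my $old = $callback{lc $scheme};
    $callback{lc $scheme} = $code;
    return $old;
}

# Invoke the callback registered for a scheme, or return undef if none.
sub invoke_callback {
    my ($scheme, @args) = @_;
    my $code = $callback{lc $scheme} or return undef;
    return $code->(@args);
}

package main;

# A 'mailto' scheme could ask the application what to do with an address.
Sketch::WWW::register_callback(mailto => sub {
    my ($address) = @_;
    return "would start an editor for $address";
});

print Sketch::WWW::invoke_callback('mailto', 'bodg@tiuk.ti.com'), "\n";
```

  Keeping the registry in the top level package means a scheme package does
  not need to know anything about the application that uses it.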

WWW::Response
  
  Provides support for the WWW response codes. Each code is represented as
  a blessed object. Uses OVERLOAD to allow easy access to both the code
  number and string.

  Provides one function (find) which allows the user to locate a code or
  create a new one.
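
  A minimal sketch of such an object follows. It assumes the 'overload'
  pragma of current perls rather than whatever mechanism the original code
  used, and the package name Sketch::Response, the table of known codes and
  the argument list of find are all assumptions made for the example.

```perl
use strict;
use warnings;

package Sketch::Response;

use overload
    '""'     => sub { $_[0]{text} },   # string context gives the text
    '0+'     => sub { $_[0]{code} },   # numeric context gives the number
    fallback => 1;                     # let ==, eq, + etc use the above

# A small, illustrative subset of codes.
my %known = (200 => 'OK', 404 => 'Not Found');
my %cache;

# Locate the object for a code, creating (and remembering) it if needed.
sub find {
    my ($class, $code, $text) = @_;
    return $cache{$code} if exists $cache{$code};
    $text = $known{$code} unless defined $text;
    $text = 'Unknown'     unless defined $text;
    return $cache{$code} = bless { code => $code, text => $text }, $class;
}

package main;

my $status = Sketch::Response->find(404);
print $status + 0, " ", "$status", "\n";
print "not found\n" if $status == 404;
```

  The same object can then be compared numerically or printed as text
  without the caller needing separate accessor methods.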

WWW::Scheme

  This is the base package for all the schemes.
  The package performs the following tasks:

   - Locates and AUTOLOADs schemes when required
   - Provides a method (SchemeClass) for scheme registration, all inherited
      schemes should call this. This is similar to nTk::Widget
   - Keeps track of all loaded schemes
   - Provides user interface for all schemes to the WWW requests get,put etc.
   - Provides base functions for schemes (GET,PUT etc) which all call fail
   - Provides a fail method for schemes to inherit
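
  The last two points could look roughly like the sketch below. The package
  names, the 'error' slot and the return value of fail are assumptions for
  illustration only.

```perl
use strict;
use warnings;

package Sketch::Scheme;

# Inherited failure handler: record why the request failed, return undef.
sub fail {
    my ($self, $method) = @_;
    $self->{error} = ref($self) . " does not support $method";
    return undef;
}

# Base request methods: each one fails until a scheme overrides it.
sub GET { my ($self) = @_; return $self->fail('GET') }
sub PUT { my ($self) = @_; return $self->fail('PUT') }

package Sketch::file;
our @ISA = ('Sketch::Scheme');

# Hypothetical override: this scheme supports GET but leaves PUT alone.
sub GET {
    my ($self) = @_;
    return "contents of $self->{path}";
}

package main;

my $file = bless { path => '/etc/motd' }, 'Sketch::file';
print $file->GET, "\n";

my $result = $file->PUT;
print "PUT failed: $file->{error}\n" unless defined $result;
```

  A scheme therefore only has to implement the requests that make sense
  for it; everything else fails in a uniform way.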

WWW::URL

  This package is the base URL object which inherits from  WWW::Scheme
  and adds the following
    - URL Creation via WWW::URL->new(...);
    - useful functions and variables to aid parsing
    - url_encode & url_decode functions
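
  For reference, encode/decode functions of this kind can be sketched as
  below. The set of characters left unencoded is an assumption (a
  conservative choice), and the functions are byte oriented; the original
  functions may have differed.

```perl
use strict;
use warnings;

# Replace every character outside a conservative safe set with %XX.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Turn each %XX sequence back into the character it stands for.
sub url_decode {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $text;
}

print url_encode('a b/c'), "\n";
print url_decode('a%20b%2Fc'), "\n";
```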

WWW::Proxy

  Provides a method to perform proxy-ing
    proxy() Given a url, returns a reference to a WWW::URL object which
            represents the server to be connected to. Could be just the URL
            passed in

WWW::http WWW::ftp WWW::mailto WWW::telnet WWW::url WWW::file

  Implementations of specific WWW protocol schemes. All inherit from WWW::URL
  and some from WWW::Proxy

Current Inheritance tree
========================

                    WWW::Scheme
                        |
                    WWW::URL     
                        |        
      +----------+------+----+   
      |          |           |   
  WWW::file  WWW::mailto     |  WWW::Proxy
                             |      |
           +---------+-------+--+---+-----+
           |         |          |         |
       WWW::url  WWW::http  WWW::ftp  WWW::telnet  

The interface required by each scheme is

 ->parse($url)  parse the URL into segments
 ->stringify    reconstruct the URL
 ->GET($object, ...)
 ->PUT($object, ...)
   etc          The methods are called on the server object and the requested
                object passed as the first parameter. They will be the same if
                proxy-ing is not being performed.

How it works
============

A new URL is created by

  WWW::URL->new('ftp://ftp.ti.com/pub/');

WWW::URL->new extracts the scheme from this url and calls the WWW::Scheme
method ->scheme to set the scheme. This setting involves locating the package
WWW::<scheme> (eg WWW::http), blessing the object into this package and
calling the method parse.
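
The steps above can be sketched as follows. This is only an illustration
under stated assumptions: the Sketch:: package names, the hash slots and the
parsing regexps are invented for the example, and the scheme packages are
defined inline so that the example runs on its own (the 'require' branch
shows where a package would be loaded from a file on demand).

```perl
use strict;
use warnings;

package Sketch::URL;

sub new {
    my ($class, $string) = @_;
    my $self = bless { string => $string }, $class;
    my ($scheme) = $string =~ /^([A-Za-z][A-Za-z0-9+.\-]*):/
        or die "no scheme in '$string'\n";
    return $self->scheme(lc $scheme);
}

# Locate the package for the scheme (loading it if it is not yet present),
# rebless the object into it and let that package parse the URL.
sub scheme {
    my ($self, $scheme) = @_;
    my $class = "Sketch::$scheme";
    unless ($class->can('parse')) {
        (my $file = "$class.pm") =~ s{::}{/}g;
        eval { require $file; 1 }
            or die "no package for scheme '$scheme'\n";
    }
    bless $self, $class;
    $self->parse($self->{string});
    return $self;
}

package Sketch::ftp;
our @ISA = ('Sketch::URL');

sub parse {
    my ($self, $string) = @_;
    @$self{qw(host path)} = $string =~ m{^ftp://([^/]+)(/.*)?$}i;
    $self->{path} = '/' unless defined $self->{path};
}

sub stringify {
    my ($self) = @_;
    return "ftp://$self->{host}$self->{path}";
}

package Sketch::mailto;
our @ISA = ('Sketch::URL');

sub parse {
    my ($self, $string) = @_;
    ($self->{address}) = $string =~ /^mailto:(.*)$/i;
}

sub stringify {
    my ($self) = @_;
    return "mailto:$self->{address}";
}

package main;

my $url = Sketch::URL->new('ftp://ftp.ti.com/pub/');
print ref($url), " ", $url->stringify, "\n";
```

Note that nothing in Sketch::URL names a particular scheme; the package
name is derived from the URL itself.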

Each package is responsible for parsing its own url as there is no generic
format for a url (eg mailto:bodg@tiuk.ti.com and http://....). Each package,
for the same reason, is also responsible for stringifying the url.

The user then calls methods on this object to retrieve/send the object.

The user interface functions (get,put etc) call the method proxy which returns
a WWW::URL object to represent the server to connect to and scheme to use.
This could be the same URL as the object. The appropriate method (GET,PUT etc)
is then called on the server object, passing the object as the first argument.

The main advantage I found is that if an unknown scheme is specified it
is blessed into the package WWW::url. This package does not define any methods
except parse and stringify but it does allow for any currently unknown scheme 
to be proxied via a different scheme (eg http)
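
A rough sketch of that flow is given below. It is simplified (the proxy
method lives directly in the URL packages rather than in a separate Proxy
package), no network connection is made, and the Sketch:: names, the lookup
table and the proxy address proxy.example.com are all assumptions invented
for the example.

```perl
use strict;
use warnings;

package Sketch::URL;

my %class_for = (http => 'Sketch::http');   # schemes this sketch knows

sub new {
    my ($class, $string) = @_;
    my ($scheme) = $string =~ /^([^:]+):/
        or die "no scheme in '$string'\n";
    # Unknown schemes fall back to the generic package.
    my $package = $class_for{lc $scheme} || 'Sketch::url';
    return bless { scheme => lc $scheme, string => $string }, $package;
}

sub stringify { my ($self) = @_; return $self->{string} }

# User level request: ask for the server object, then call GET on it
# with the requested object as the first argument.
sub get {
    my ($self) = @_;
    my $server = $self->proxy;
    return $server->GET($self);
}

# Default: no proxy-ing, so the server is the object itself.
sub proxy { my ($self) = @_; return $self }

package Sketch::http;
our @ISA = ('Sketch::URL');

sub GET {
    my ($server, $object) = @_;
    # A real implementation would talk to $server here.
    return "GET " . $object->stringify . " via " . $server->stringify;
}

package Sketch::url;
our @ISA = ('Sketch::URL');

# Hypothetical proxy server used for every unknown scheme.
our $proxy_url = 'http://proxy.example.com:8080/';

sub proxy { return Sketch::URL->new($proxy_url) }

package main;

my $wais = Sketch::URL->new('wais://host.example.com/db');
print ref($wais), "\n";
print $wais->get, "\n";
```

Because the generic package only supplies a proxy, a scheme that nobody has
written a package for can still be fetched as long as the proxy server
understands it.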

The methods GET, PUT etc create a MIME message and set the content and
response via the methods ->content and ->status. They then return the
contents or undef.

If people are interested in seeing what I have done then I will put it on an 
ftp site.

Regards,
Graham.

--
        .-----------------------------------------------------------.  
  ////  | Graham Barr                Email: bodg@tiuk.ti.com        |  \\\\ 
 |  00  | VLSI Cell Designer            or: bodg@ti.com             |  00  |
 O   ^  | MOS Design                TI MSG: BODG                    |  ^   O
  \ ~/  | Texas Instruments Ltd      Phone: +44 (0)1234 22 3419     |  \~ / 
        | ENGLAND                      Fax: +44 (0)1234 22 3331     |
        `-----------------------------------------------------------'