Note on wwwbot.pl in libwww-perl 0.20

bcutter@pdn.paradyne.com
Thu, 21 Jul 94 14:16:37 EDT


Was just looking through the code in the latest v0.20 and wanted to
mention that the wwwbot.pl code that I submitted does not take full
advantage of the global setting of HTTP headers, like User-Agent...

While the wwwbot routines will use &www'request (which propagates the
headers), currently &wwwbot'allowed requires as its second argument
the program's User-Agent value.  This routine should be modified so
that if no second argument is passed, and User-Agent has been set
globally, it uses the global value.

Here's the top of the &wwwbot'allowed routine..

sub allowed
{
    local($url, $user_agent) = @_;
 ...
}

and it should be like:

sub allowed
{
    local($url) = shift(@_);
    local($user_agent);
    if (@_) 
    { 
        $user_agent = shift(@_);
    } 
    else
    {
        $user_agent = &www'get_def_header('http','User-Agent');
        unless($user_agent)
        {
           warn "wwwbot'allowed: requires 2nd argument of User-Agent header";
           return;
        }
    }
 ...
}

...Which would require the routine &www'get_def_header()
(rather than accessing the global variables directly)

sub get_def_header
{
  local($scheme,$name) = @_;
  local($pos); 
  undef $pos;
  for ($[ .. $#DefaultHeaders)
  {
       $pos = $_ if (($name   eq $DefaultHeaders[$_]) &&
                    ($scheme eq $DefHeaderSchemes[$_]));
  }
  if (defined($pos))
  {
       return($DefaultValues[$pos]);
  }
  return;
}
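
For what it's worth, here is a rough standalone sketch of how that
lookup would behave.  It is written in Perl 5 syntax (my, our, ::
as the package separator), and the sample header values and the way
the three parallel arrays get filled are made up for illustration;
only the array names come from the routine above.  Like the loop
above, the last matching entry wins.

```perl
use strict;
use warnings;

package www;

# Assumed layout: three parallel arrays, one slot per default header.
# The values here are invented sample data.
our @DefaultHeaders   = ('User-Agent',  'From');
our @DefHeaderSchemes = ('http',        'http');
our @DefaultValues    = ('MyRobot/0.1', 'me@example.com');

sub get_def_header {
    my ($scheme, $name) = @_;
    # Walk backwards so the most recently added match is returned,
    # which matches the "last match wins" behaviour of the loop above.
    for my $i (reverse 0 .. $#DefaultHeaders) {
        return $DefaultValues[$i]
            if $name eq $DefaultHeaders[$i]
            && $scheme eq $DefHeaderSchemes[$i];
    }
    return;    # nothing set for this scheme/header pair
}

package main;

my $ua = www::get_def_header('http', 'User-Agent');
print defined $ua ? "User-Agent: $ua\n" : "no default User-Agent set\n";
```

A caller can tell "not set" from a real value with defined(), since
the routine returns nothing when there is no match.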

...looking at the warn "" statement in wwwbot'allowed, it brings up
the question of how to handle "standard" errors...

For example, it would be nice if there was a standard parameter passed
back with a standard error message, like those found in errno.h ...

With the above example, I'd like to return an error value like
'Incorrect number of arguments' ... following errno.h, this would
be 'EINVAL' for "Invalid argument"
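
As a rough sketch of the errno.h idea (Perl 5 syntax, using the
EINVAL constant that the POSIX module exports): allowed_sketch() is
a hypothetical stand-in for &wwwbot'allowed, not the real routine,
and its "return 1" is only a placeholder for the actual robots.txt
check.

```perl
use strict;
use warnings;
use POSIX qw(EINVAL);

# Hypothetical stand-in for wwwbot'allowed: it only checks its arguments.
sub allowed_sketch {
    my ($url, $user_agent) = @_;
    unless (defined $user_agent && length $user_agent) {
        $! = EINVAL;    # report "Invalid argument" the errno.h way
        return;         # undef in scalar context signals failure
    }
    return 1;           # placeholder for the real robots.txt check
}

my $ok = allowed_sketch('http://example.com/');
unless (defined $ok) {
    # $! is numeric in numeric context and a message in string context.
    print "allowed_sketch failed: $!\n" if $! + 0 == EINVAL;
}
```

The caller has to look at $! right after the failed call, since any
later system call may overwrite it.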

...The alternative might be to set a global error handling routine,
like onexit(), that would be called with the details of the error
whenever one occurs.  The routine would have the option of
shutting down the program nicely (closing/syncing open files) or
returning some value that tells the www library to ignore the
error (if possible) and keep on going...
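
Something along these lines, as a rough sketch in Perl 5 syntax (it
needs code references).  The names set_error_handler, raise_error
and $error_handler are all hypothetical; nothing like them exists
in the library today.

```perl
use strict;
use warnings;

package www;

# Hypothetical: holds the caller-supplied handler, if any.
our $error_handler;

sub set_error_handler {
    my ($handler) = @_;
    $error_handler = $handler;
    return;
}

# Hypothetical: library routines would call this when something fails.
# Returns true if the handler asked the library to carry on.
sub raise_error {
    my ($code, $details) = @_;
    return $error_handler->($code, $details) if $error_handler;
    warn "www error $code: $details\n";    # default when no handler is set
    return 0;
}

package main;

my @seen;    # errors the handler has been told about

www::set_error_handler(sub {
    my ($code, $details) = @_;
    push @seen, "$code: $details";
    # A stricter handler could close/sync open files here and exit instead.
    return 1;    # tell the library to ignore the error and keep going
});

if (www::raise_error('EINVAL', "wwwbot'allowed: missing User-Agent")) {
    print "continuing after: $seen[-1]\n";
}
```

With no handler installed, this sketch falls back to a plain warn
and reports that the error was not ignored.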

Thoughts?

-Brooks