Note on wwwbot.pl in libwww-perl 0.20
bcutter@pdn.paradyne.com
Thu, 21 Jul 94 14:16:37 EDT
Was just looking through the code in the latest v0.20 and wanted to
mention that the wwwbot.pl code that I submitted does not take full
advantage of the global setting of HTTP headers, like User-Agent.
While the wwwbot routines use &www'request (which propagates the
headers), &wwwbot'allowed currently requires the program's User-Agent
value as its second argument. This routine should be modified so that
if no second argument is passed and User-Agent has been set globally,
it uses the global value.
Here's the top of the &wwwbot'allowed routine as it stands:
    sub allowed
    {
        local($url, $user_agent) = @_;
        ...
    }
It should look like this instead:
    sub allowed
    {
        local($url) = shift(@_);
        local($user_agent);

        if (@_)
        {
            # Caller supplied a User-Agent explicitly.
            $user_agent = shift(@_);
        }
        else
        {
            # Fall back to the globally set User-Agent header, if any.
            $user_agent = &www'get_def_header('http','User-Agent');
            unless($user_agent)
            {
                warn "wwwbot'allowed: requires 2nd argument of User-Agent header";
                return;
            }
        }
        ...
    }
This would require a new routine, &www'get_def_header(), so that
wwwbot does not have to access the www global variables directly:
    sub get_def_header
    {
        local($scheme, $name) = @_;
        local($pos);    # starts out undefined

        # Scan the parallel header arrays; the last matching entry wins.
        for ($[ .. $#DefaultHeaders)
        {
            $pos = $_ if (($name eq $DefaultHeaders[$_]) &&
                          ($scheme eq $DefHeaderSchemes[$_]));
        }
        if (defined($pos))
        {
            return($DefaultValues[$pos]);
        }
        return;    # no default set for this scheme/header pair
    }
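To illustrate the lookup, here is a small self-contained sketch in the same Perl 4 style. It assumes the three parallel arrays are filled in as shown; the sample values (including the agent string 'MyBot/0.1') are made up for the example, and the arrays and the routine live in the main package here rather than in www.pl.

```perl
# Hypothetical sample data: three parallel arrays, one slot per header.
@DefHeaderSchemes = ('http',       'http',   'http');
@DefaultHeaders   = ('User-Agent', 'Accept', 'User-Agent');
@DefaultValues    = ('OldBot/0.0', '*/*',    'MyBot/0.1');

sub get_def_header
{
    local($scheme, $name) = @_;
    local($pos);

    # Walk every slot; the last match wins, so later settings override.
    for (0 .. $#DefaultHeaders)
    {
        $pos = $_ if (($name eq $DefaultHeaders[$_]) &&
                      ($scheme eq $DefHeaderSchemes[$_]));
    }
    return defined($pos) ? $DefaultValues[$pos] : undef;
}

$agent = &get_def_header('http', 'User-Agent');   # the later 'MyBot/0.1' entry
$none  = &get_def_header('ftp',  'User-Agent');   # undef: nothing set for ftp

print "agent: $agent\n";
print "ftp agent is unset\n" unless defined($none);
```

With this in place, &wwwbot'allowed only needs to test the returned value before deciding whether to warn.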
Looking at the warn statement in wwwbot'allowed brings up the
question of how to handle "standard" errors.
For example, it would be nice if a standard error code were passed
back along with a standard error message, like those found in errno.h.
With the above example, I'd like to return an error value like
'Incorrect number of arguments'; following errno.h, this would
be 'EINVAL' for "Invalid argument".
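As a rough sketch of what that might look like (nothing like this exists in libwww-perl; %ErrMsg, $www_errno, &set_error, &error_string and &allowed_sketch are all hypothetical names invented for the example), a routine could record a symbolic code in a global and return false:

```perl
# Hypothetical error table, modeled on errno.h names and messages.
%ErrMsg = (
    'EINVAL', 'Invalid argument',
    'ENOENT', 'No such file or directory',
);

$www_errno = '';    # hypothetical global: code of the most recent error

# Record an error code and return false, so callers can write
# "return &set_error('EINVAL');".
sub set_error
{
    local($code) = @_;
    $www_errno = $code;
    return 0;
}

# Map a code to its message, in the spirit of strerror() in C.
sub error_string
{
    local($code) = @_;
    return defined($ErrMsg{$code}) ? $ErrMsg{$code} : "Unknown error $code";
}

# Stand-in for wwwbot'allowed: fails with EINVAL if no User-Agent is given.
sub allowed_sketch
{
    local($url, $user_agent) = @_;
    return &set_error('EINVAL') unless $user_agent;
    $www_errno = '';
    return 1;
}

unless (&allowed_sketch('http://example.com/'))
{
    print "allowed failed: $www_errno (", &error_string($www_errno), ")\n";
}
```

The caller decides what to do with the code, much as C code checks errno after a failed call.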
The alternative might be to set a global error-handling routine,
like onexit(), that would be called with the details of the error
whenever one occurs. The routine would have the option of
shutting down the program nicely (closing/syncing open files) or
returning some value that tells the www library to ignore the
error (if possible) and keep on going.
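A minimal sketch of the handler idea, again with made-up names ($ErrorHandler, &set_error_handler, &raise_error and &my_handler are assumptions, not part of the library). The handler is installed by subroutine name and tells the library whether to carry on by returning 'ignore':

```perl
$ErrorHandler = '';    # hypothetical global: name of the installed handler

# Install a handler by subroutine name; returns the previous one.
sub set_error_handler
{
    local($new) = @_;
    local($old) = $ErrorHandler;
    $ErrorHandler = $new;
    return $old;
}

# Called by library code when something goes wrong.  Returns true if
# the handler asked the library to ignore the error and carry on.
sub raise_error
{
    local($code, $detail) = @_;
    if ($ErrorHandler)
    {
        return &$ErrorHandler($code, $detail) eq 'ignore';
    }
    warn "$code: $detail\n";    # no handler installed: just warn
    return 0;
}

# Example handler: tolerate bad arguments, shut down on anything else.
sub my_handler
{
    local($code, $detail) = @_;
    push(@Seen, "$code: $detail");
    return 'ignore' if $code eq 'EINVAL';
    # A real handler would close/sync open files here before exiting.
    exit(1);
}

&set_error_handler('my_handler');
$carry_on = &raise_error('EINVAL', 'allowed: missing User-Agent');
print "continuing after error\n" if $carry_on;
```

The two approaches are not exclusive: the handler could be handed the same errno-style codes that the routines return.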
Thoughts?
-Brooks