wwwstat
HTTPd Logfile Analysis Software
The wwwstat program will process a sequence of
HTTPd common logfile format (CLF) access_log files and output a log
summary in HTML format suitable for publishing
on a website.
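To make the input format concrete, here is a minimal Python sketch of reading CLF entries and tallying them by status code. It is an illustration only, not wwwstat's own Perl code: the regular expression, the function names (`parse_clf_line`, `summarize`), and the sample hostnames are all assumptions made for this example.

```python
import re
from collections import Counter

# Common logfile format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}|-) (?P<size>\d+|-)$'
)

def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # A "-" size means no byte count was logged; count it as zero.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

def summarize(lines):
    """Count requests and total bytes per status code (hypothetical helper)."""
    requests = Counter()
    bytes_sent = Counter()
    for line in lines:
        entry = parse_clf_line(line)
        if entry is None:
            continue  # skip malformed lines
        requests[entry["status"]] += 1
        bytes_sent[entry["status"]] += entry["size"]
    return requests, bytes_sent

# Made-up sample entries for illustration.
sample = [
    'host.example.com - - [04/Nov/1996:10:15:32 -0800] "GET /index.html HTTP/1.0" 200 2326',
    'host.example.com - - [04/Nov/1996:10:15:40 -0800] "GET /missing.html HTTP/1.0" 404 -',
]
requests, bytes_sent = summarize(sample)
```

The real program produces far richer reports (by hour, day, domain, and archive section); this sketch only shows the shape of the data it starts from.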
The splitlog program will process a sequence of CLF
(or CLF with a prefix) access_log files and split the entries into
separate files according to the requested URL and/or vhost prefix.
Both programs are written in Perl and, once customized for
your site, should work on any UNIX-based system with Perl 4.036, 5.002,
or better.
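The kind of splitting that splitlog performs can be pictured with a short Python sketch that groups log entries by the prefix of the requested URL. This is a rough illustration under stated assumptions, not splitlog's actual logic or configuration syntax: the prefix table, the output names, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical prefix-to-output mapping; the first matching prefix wins.
PREFIX_FILES = [
    ("/docs/", "docs_log"),
    ("/~", "users_log"),
]
DEFAULT_FILE = "other_log"

def requested_url(line):
    """Extract the URL from the quoted request field of a CLF line."""
    try:
        request = line.split('"')[1]  # e.g. 'GET /docs/a.html HTTP/1.0'
        return request.split()[1]
    except IndexError:
        return None  # no quoted request, or no URL inside it

def split_entries(lines):
    """Group lines by output name according to the requested URL prefix."""
    buckets = defaultdict(list)
    for line in lines:
        url = requested_url(line)
        target = DEFAULT_FILE
        if url is not None:
            for prefix, name in PREFIX_FILES:
                if url.startswith(prefix):
                    target = name
                    break
        buckets[target].append(line)
    return buckets
```

A real run would write each group to its own file; collecting them in memory keeps the sketch short and easy to test.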
Qiegang Long, formerly at UMass, has released a program called
gwstat
that takes the output from wwwstat and generates a set of graphs to
illustrate your httpd server traffic by hour, day, week or calling
country/domain.
A mailing list, now shut down,
was created for discussion and support of wwwstat
development.
The wwwstat package is available as a
gzip'd tar file via both
HTTP and
FTP.
The distribution consists of the following files:
- wwwstat manual
-- available in HTML,
PostScript, and
nroff formats;
- splitlog manual
-- available in HTML,
PostScript, and
nroff formats;
- README
-- general intro and pointers to more information;
- Changes
-- the complete list of changes and version information;
- FAQ
-- frequently asked questions;
- INSTALL
-- installation instructions;
- LICENSE
-- licensing and redistribution information;
- example.html
-- an example wwwstat output;
- wwwstat.pl
-- the main perl script for analyzing the access_log;
- wwwstat.rc
-- an example system/user configuration file;
- domains.pl
-- a mapping of Internet domains to country/org names (the old
country-codes file is still
available for users of older stats programs);
- oldlog2new.pl
-- a program for converting NCSA httpd 1.0 and 1.1 access_log files to the
common logfile format;
- splitlog.pl
-- the main perl script for splitting the access_log;
- splitlog.rc
-- an example user configuration file for splitlog;
- wwwerrs.pl
-- an example perl script for reading the error_log;
- monthly.pl
-- an example perl script for monthly log rotation;
- Makefile
-- the makefile (for creating executables).
One of the nicest things about wwwstat is that it does not make any
changes to or write any files in the server directories. Thus, this
program can be safely run by any user with read access to the httpd
server's access_log. This allows people to do
specialized summaries of just the things they are interested in.
The package provides a plethora of options for creating customized reports
and for making server maintenance easier for webmasters. See
the wwwstat manual and
splitlog manual for further description of
the options and examples of their use.
wwwstat can be used with any server that supports the common logfile
format, and on any system with a working perl 4.036 or 5.002+ interpreter.
However, I prefer the Apache server
and have only tested wwwstat on SunOS and Solaris-based systems.
Version History
- Patch 2.01
--- November 12, 1996
- Fixed bug in parsing -daily and -nodaily options. [Jesus Calle Vaquero]
- Version 2.0
--- November 4, 1996
- Added splitlog script for splitting logfile by virtual host or URL path.
Added manual for splitlog in all three formats.
Changed manpage.* to wwwstat.html and wwwstat.ps.
Changed wwwstatrc to wwwstat.rc to be more PC-friendly (yuck).
Changed mechanism for finding the configuration files on @INC.
Made timestamp parser slightly more lenient (for phttpd brokenness).
Removed unnecessary split on address.
- Version 2.0b1
--- October 7, 1996
- Added user and system config files [suggested by everyone].
Replaced old getopts with hand-coded function, which means
that multiple search options are allowed (they get OR'd together).
Added ability to read in any number of old summary files.
Rewrote inclusion mechanism to parse by section heading.
Added options to enable/disable output of CGI headers.
Added options to enable/disable output by section.
Added options to change sort ordering function for each section.
Added options to display only top N for each section.
Added option to display sections for both sorted top N and all entries.
Added options to enable/disable creating link to each archive entry.
Added options to truncate archive URL by level and/or filename.
Added -R option for displaying daily stats in reverse order [Reinier Post].
Added -m|M options for selection based on the HTTP method.
Added option to look up DNS (with cached results) on unresolved addresses.
Added option to disable escaping of "+" and "." in -aAnN regexps.
Added config ability to exclude or replace any URL matching
pattern with a special string in the archive listing.
Added config ability to exclude or replace any subdomain match with
a special string in the domain listing (overrides country-code).
Removed parsing of srm.conf and -s option for greater portability.
Removed -i include option (now we just look at first line of file).
Added "--" as last option indicator (to avoid treating files as options).
Added "-" as filename to indicate standard input.
Added "+" as filename to indicate the default logfile.
Added summary and estimates for all HTTP/1.1 status codes.
Added %Y pattern for placing year in link to last summary.
Added -X option for setting last summary URL on command-line.
Added -H option for setting HTML title and heading text.
Made the DirectoryIndex a perl regex so that it can match multiple
forms of index/overview/... file or script names.
Replaced country-codes file and initialization with the %DomainMap
table in domains.pl, which will make it easier to override names.
Now displays empty tables rather than error on no matching data.
Added workaround for perl4 bug of overflowing %12d in printf.
Stopped reversing of already-reversed unresolved subdomains.
Forced parsing of timestamp to be more discriminating [Bob Kieronski].
Improved efficiency of matches containing variable patterns [Dan Klein].
Added example perl script for monthly log rotation.
Added wwwerrs perl script for analyzing the error_log.
Changed distribution URL for dual http/ftp access to my site.
Added a new man page.
Added the ability to exclude today via the -D 'today' option
(or include only today via -d 'today' option). This vastly
simplifies nightly runs to generate the previous day's summary.
Removed NULLs from the logfile entry before processing [Terry West].
Assumed 200 response if "-" (unknown) is in logfile.
Replaced any %7E with the original tilde "~" in the archive section.
Fixed dumb browsers' inability to parse relative URLs (on 200 status).
Added example for globbing "hidden" directories.
- Version 1.01
--- April 24, 1994
- the old version, just in case you need it.
See the file Changes for more version information.
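Three of the input-handling changes listed under Version 2.0b1 (removing NULLs, assuming a 200 response for "-", and restoring the tilde from %7E) are simple enough to picture in code. The Python sketch below is an illustration only, not the Perl used in wwwstat; the function names are hypothetical, and matching a lowercase `%7e` as well is an assumption of this example.

```python
import re

def clean_line(line):
    """Drop NUL characters from a raw logfile line before parsing."""
    return line.replace("\x00", "")

def effective_status(status):
    """Treat an unknown ("-") status field as a 200 response."""
    return "200" if status == "-" else status

def display_url(url):
    """Show an escaped tilde as "~" (matching %7e too is an assumption)."""
    return re.sub(r"%7e", "~", url, flags=re.IGNORECASE)
```

For example, `display_url("/%7Eroy/index.html")` gives `/~roy/index.html`, so both spellings of the same page are counted under one archive entry.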
Credits
This software has been developed by Roy Fielding as
part of the Arcadia
and WebSoft projects at the University of California, Irvine.
wwwstat was originally based on a multi-server statistics program called
fwgstat-0.035 by Jonathan Magid which, in turn,
was heavily based on xferstats (packaged with version 17 of the
wuarchive FTP daemon) by Chris Myers.
See the license information for complete details
on use and redistribution of the wwwstat package.
This work was sponsored in part by the Defense Advanced Research Projects
Agency under Grant Numbers MDA972-91-J-1010 and F30602-94-C-0218.
This software does not necessarily reflect the position or policy of the
U.S. Government and no official endorsement should be inferred.
Roy Fielding
Department of Information and Computer Science,
University of California, Irvine,
CA 92697-3425
Last modified: 15 Feb 2001