
wwwstat
HTTPd Logfile Analysis Software

The wwwstat program will process a sequence of HTTPd common logfile format (CLF) access_log files and output a log summary in HTML format suitable for publishing on a website.
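To illustrate the kind of processing described above, here is a minimal Python sketch of parsing CLF entries and tallying requests and bytes per URL. It is not wwwstat's own code (wwwstat is written in Perl); the names `CLF_PATTERN`, `parse_clf_line`, and `summarize` are hypothetical, and the regular expression assumes the usual seven-field CLF layout of host, ident, user, timestamp, request, status, and bytes.

```python
import re
from collections import Counter

# Assumed CLF layout: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}|-) (?P<bytes>\d+|-)$'
)

def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # The request field is normally "METHOD URL PROTOCOL";
    # the protocol may be absent for very old clients.
    parts = entry["request"].split()
    entry["method"] = parts[0] if parts else ""
    entry["url"] = parts[1] if len(parts) > 1 else ""
    # A "-" in the bytes field means no body was sent; count it as zero.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

def summarize(lines):
    """Tally request counts and bytes sent per URL, skipping unparsable lines."""
    requests = Counter()
    bytes_sent = Counter()
    for line in lines:
        entry = parse_clf_line(line)
        if entry is None:
            continue
        requests[entry["url"]] += 1
        bytes_sent[entry["url"]] += entry["bytes"]
    return requests, bytes_sent
```

A real summary would also group by date, hour, and client domain; this sketch covers only the per-URL totals.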

The splitlog program will process a sequence of CLF (or CLF with a prefix) access_log files and split the entries into separate files according to the requested URL and/or vhost prefix.
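The idea of splitting entries by requested URL can be sketched in Python as follows. This is only an illustration under stated assumptions, not splitlog's actual logic or configuration: the helper names are hypothetical, and the bucketing rule (first path segment of the URL, with a catch-all `root` bucket) is invented for the example.

```python
from collections import defaultdict

def request_url(line):
    """Extract the requested URL from the quoted request field of a CLF line."""
    try:
        request = line.split('"')[1]   # text between the first pair of quotes
        return request.split()[1]      # METHOD URL [PROTOCOL]
    except IndexError:
        return None

def bucket_for(url):
    """Hypothetical rule: /docs/a.html -> 'docs'; top-level files -> 'root'."""
    path = url.split("?")[0]           # ignore any query string
    if not path.startswith("/"):
        return "root"
    parts = path.split("/")            # '/docs/a.html' -> ['', 'docs', 'a.html']
    if len(parts) > 2 and parts[1]:
        return parts[1]
    return "root"

def split_entries(lines):
    """Group log lines into buckets keyed by the rule in bucket_for()."""
    buckets = defaultdict(list)
    for line in lines:
        url = request_url(line)
        key = bucket_for(url) if url else "unparsed"
        buckets[key].append(line)
    return buckets
```

Each bucket could then be written to its own file, for example one output file per key. Handling a vhost prefix would require an additional parsing step that is not shown here.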

Both programs are written in Perl and, once customized for your site, should work on any UNIX-based system with Perl 4.036, 5.002, or better.

Qiegang Long, formerly at UMass, has released a program called gwstat that takes the output from wwwstat and generates a set of graphs to illustrate your httpd server traffic by hour, day, week or calling country/domain.

A mailing list, now shut down, was created for discussion and support of wwwstat development.

The wwwstat package is available as a gzip'd tar file via both HTTP and FTP. The distribution consists of the following files:

One of the nicest things about wwwstat is that it does not make any changes to or write any files in the server directories. Thus, this program can be safely run by any user with read access to the httpd server's access_log. This allows people to do specialized summaries of just the things they are interested in.

wwwstat provides many options for creating customized reports and for making server maintenance easier for webmasters. See the wwwstat manual and splitlog manual for descriptions of the options and examples of their use.

wwwstat can be used with any server that supports the common logfile format, and on any system with a working Perl 4.036 or 5.002+ interpreter. However, I prefer the Apache server and have only tested wwwstat on SunOS and Solaris-based systems.

Version History

Patch 2.01 --- November 12, 1996
Fixed bug in parsing -daily and -nodaily options. [Jesus Calle Vaquero]

Version 2.0 --- November 4, 1996
Added splitlog script for splitting logfile by virtual host or URL path.
Added manual for splitlog in all three formats.
Changed manpage.* to wwwstat.html and wwwstat.ps.
Changed wwwstatrc to wwwstat.rc to be more PC-friendly (yuck).
Changed mechanism for finding the configuration files on @INC.
Made timestamp parser slightly more lenient (for phttpd brokenness).
Removed unnecessary split on address.

Version 2.0b1 --- October 7, 1996
Added user and system config files [suggested by everyone].
Replaced old getopts with hand-coded function, which means that multiple search options are allowed (they get OR'd together).
Added ability to read in any number of old summary files.
Rewrote inclusion mechanism to parse by section heading.
Added options to enable/disable output of CGI headers.
Added options to enable/disable output by section.
Added options to change sort ordering function for each section.
Added options to display only top N for each section.
Added option to display sections for both sorted top N and all entries.
Added options to enable/disable creating link to each archive entry.
Added options to truncate archive URL by level and/or filename.
Added -R option for displaying daily stats in reverse order [Reinier Post].
Added -m|M options for selection based on the HTTP method.
Added option to look up DNS (with cached results) on unresolved addresses.
Added option to disable escaping of "+" and "." in -aAnN regexps.
Added config ability to exclude or replace any URL matching pattern with a special string in the archive listing.
Added config ability to exclude or replace any subdomain match with a special string in the domain listing (overrides country-code).
Removed parsing of srm.conf and -s option for greater portability.
Removed -i include option (now we just look at first line of file).
Added "--" as last option indicator (to avoid treating files as options).
Added "-" as filename to indicate standard input.
Added "+" as filename to indicate the default logfile.
Added summary and estimates for all HTTP/1.1 status codes.
Added %Y pattern for placing year in link to last summary.
Added -X option for setting last summary URL on command-line.
Added -H option for setting HTML title and heading text.
Made the DirectoryIndex a perl regex so that it can match multiple forms of index/overview/... file or script names.
Replaced country-codes file and initialization with the %DomainMap table in domains.pl, which will make it easier to override names.
Now displays empty tables rather than error on no matching data.
Added workaround for perl4 bug of overflowing %12d in printf.
Stopped reversing of already-reversed unresolved subdomains.
Forced parsing of timestamp to be more discriminating [Bob Kieronski].
Improved efficiency of matches containing variable patterns [Dan Klein].
Added example perl script for monthly log rotation.
Added wwwerrs perl script for analyzing the error_log.
Changed distribution URL for dual http/ftp access to my site.
Added a new man page.

Added the ability to exclude today via the -D 'today' option (or include only today via -d 'today' option). This vastly simplifies nightly runs to generate the previous day's summary.
Removed NULLs from the logfile entry before processing [Terry West].
Assumed 200 response if "-" (unknown) is in logfile.
Replaced any %7E with the original tilde "~" in the archive section.
Fixed dumb browsers' inability to parse relative URLs (on 200 status).
Added example for globbing "hidden" directories.

Version 1.01 --- April 24, 1994
The old version, just in case you need it.
See the file Changes for more version information.

Credits

This software has been developed by Roy Fielding as part of the Arcadia and WebSoft projects at the University of California, Irvine. wwwstat was originally based on a multi-server statistics program called fwgstat-0.035 by Jonathan Magid which, in turn, was heavily based on xferstats (packaged with version 17 of the wuarchive FTP daemon) by Chris Myers.

See the license information for complete details on use and redistribution of the wwwstat package.

This work was sponsored in part by the Defense Advanced Research Projects Agency under Grant Numbers MDA972-91-J-1010 and F30602-94-C-0218. This software does not necessarily reflect the position or policy of the U.S. Government and no official endorsement should be inferred.


Roy Fielding
Department of Information and Computer Science,
University of California, Irvine, CA 92697-3425
Last modified: 15 Feb 2001