Re: Using Extended Common Log Format
Mark Avnet (mavnet@banta-im.com)
Wed, 30 Jun 1999 16:19:18 -0400
Joi Ellis wrote:
>
> On Wed, 30 Jun 1999, Mark Avnet wrote:
>
> > Hi. I am in the process of modifying wwwstat 2.0 to accept extended
> > common log format. To the original information, I am adding referrer,
> > browser, and platform. Right now, I am trying to get referrer on
> > there. I have added everything necessary to the code, but I think there
> > is a problem in the way I am parsing the log file, as I am now getting
> > 0s for all of the entries. Whereas the original line of code for common
> > logs is:
> >
> > ($host, $rfc931, $authuser, $timestamp, $request, $status, $bytes) =
> > /^(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^"]*)\" (\S+) (\S+)/;
> >
> > My new line for parsing the extended log format is:
> >
> > ($host, $rfc931, $authuser, $timestamp, $request, $status, $bytes,
> > $ref,
> > $null1, $null2, $browser, $platform) =
> > /^(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^"]*)\" (\S+) (\S+)
> > \"([^"]*)\ (\S+) (\S+) (\S+) (\S+)/;
> >
> > An example of the logs that I am parsing is:
> >
> > 1Cust105.tnt17.dfw5.da.uu.net - - [20/Jun/1999:11:31:11 -0400]
> > "GET /home.htm HTTP/1.1" 200 3455
> > "http://www.dickinson.com/"
> > "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)"
>
> Oh, you didn't look for the " characters around your browser info.
>
> I don't think you can safely split the last quoted thing into
> null, null, browser, platform. That string is provided by the
> browser itself, and you can't trust them all to provide it in that
> one format.
>
> Try:
> ($host, $rfc931, $authuser, $timestamp, $request, $status, $bytes,
> $ref, $browser) =
> /^(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^"]*)\" (\S+) (\S+) \"([^"]*)\" \"([^"]*)\"/;
>
> Once you get $browser out of there you can try parsing out the platform
> and browser identifier separately.
>
> > There is doubtless a mistake in how I have done this. Any ideas as to
> > what the problem could be?
> >
> > Thanks a lot.
> >
> > Mark S. Avnet
> > Banta Integrated Media
> > Cambridge, MA
> >
>
> --
> Joi Ellis joi.ellis@cdc.com
> Outsourcing Solutions
> Control Data Systems gyles19@visi.com, http://www.visi.com/~gyles19/
>
> No matter what we think of Linux versus FreeBSD, etc., the one thing I
> really like about Linux is that it has Microsoft worried. Anything
> that kicks a monopoly in the pants has got to be good for something.
> - Chris Johnson
Thank you. It is properly parsing now. For some reason my totals are
coming out different than before, and the referrer stats are not showing
up right, but that's probably just some simple debugging. At least I'm
getting the referrer now from each log.
Mark
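
For anyone following the thread, here is a minimal standalone sketch of the approach discussed above: match the whole quoted browser string first, then try to split it afterwards. It is not wwwstat code. The helper names parse_line and split_agent are hypothetical, and the sample line is the one quoted above, joined with single spaces on the assumption that the mail client wrapped it. The split assumes the "Product/version (details)" shape of that one sample, which, as noted in the thread, not every browser follows. The matched/skipped counters are only a guess at one way totals could shift: a line that fails the longer pattern yields no fields at all.

```perl
use strict;
use warnings;

# Pattern for the extended log line: the seven common-log fields,
# then the quoted referrer and the quoted browser string.
my $combined = qr/^(\S+) (\S+) (\S+) \[([^\]]*)\] "([^"]*)" (\S+) (\S+) "([^"]*)" "([^"]*)"/;

# Hypothetical helper (not part of wwwstat): returns a hash reference
# of fields, or undef when the line does not match the pattern.
sub parse_line {
    my ($line) = @_;
    my @f = $line =~ $combined or return;
    my %rec;
    @rec{qw(host rfc931 authuser timestamp request status bytes ref browser)} = @f;
    return \%rec;
}

# Hypothetical helper: assumes "Product/version (a; b; platform)".
# Strings in any other shape come back with an undefined platform.
sub split_agent {
    my ($agent) = @_;
    my ($product, $details) = $agent =~ /^(\S+)(?:\s+\(([^)]*)\))?/;
    my @parts = defined $details ? split(/;\s*/, $details) : ();
    # In the MSIE sample the platform is the last semicolon-separated part.
    return ($product, @parts ? $parts[-1] : undef);
}

# Sample entry from the thread, rejoined onto one line (assumed wrapping).
my $sample = '1Cust105.tnt17.dfw5.da.uu.net - - [20/Jun/1999:11:31:11 -0400] '
           . '"GET /home.htm HTTP/1.1" 200 3455 '
           . '"http://www.dickinson.com/" '
           . '"Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)"';

# Count lines that match and lines that do not, so skipped entries are visible.
my ($matched, $skipped) = (0, 0);
for my $line ($sample, 'not a log line') {
    my $rec = parse_line($line);
    if (!$rec) { $skipped++; next; }
    $matched++;
    my ($product, $platform) = split_agent($rec->{browser});
    printf "ref=%s product=%s platform=%s\n",
        $rec->{ref}, $product // 'unknown', $platform // 'unknown';
}
print "matched=$matched skipped=$skipped\n";
```

Keeping the raw browser string in the record means nothing is lost when a browser sends something split_agent does not recognize; only the platform guess is missing.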