Re: Visitors/visits vs Requests (hits)
Mike Whitaker (mike@cricket.org)
Sat, 19 Jul 97 09:16:38 +0100
On 18/07/1997 10:54 pm, David Welton said:
>To avoid duplication of effort, I have a few questions regarding current
>development of this program.
Time for wwwstat-dev@ics, and the SSH/CVS tree?
>The big advantage that the nasty proprietary software we are currently
>using (hitlist for windoze) has is its ability to count visitors and
>visits. Visitors is pretty simple - the total number of distinct IP
>addresses. Visits is not so simple - "a visit is a collection of requests
>that represent all the pages and graphics seen by a particular visitor at
>one time". I.e., they are using some kind of timeout - if no more requests
>from a particular user are received within say 15 minutes, then that
>'visit' is over.
Look at www.ipro.com for the stuff the 'pro's expect.
>This shouldn't be a tremendous amount of work - I think
>a hash with the time in it will work.
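A rough sketch of the "hash with the time in it" idea, in Python rather
than Perl purely for illustration. It assumes the log has already been
parsed into (ip, time) pairs in chronological order; the function name and
the 15-minute default are assumptions taken from the "say 15 minutes"
above, not anything wwwstat or hitlist actually does.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=15)  # assumed cut-off between visits


def count_visitors_and_visits(requests, timeout=VISIT_TIMEOUT):
    """requests: iterable of (ip, datetime) pairs, oldest first."""
    last_seen = {}  # the hash: ip -> time of that address's latest request
    visits = 0
    for ip, when in requests:
        previous = last_seen.get(ip)
        # A new visit starts on the first request from an address, or
        # when the address has been silent for longer than the timeout.
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[ip] = when
    return len(last_seen), visits


t0 = datetime(1997, 7, 18, 22, 54)
sample = [
    ("10.0.0.1", t0),
    ("10.0.0.1", t0 + timedelta(minutes=5)),
    ("10.0.0.2", t0 + timedelta(minutes=6)),
    ("10.0.0.1", t0 + timedelta(minutes=30)),
]
print(count_visitors_and_visits(sample))  # prints (2, 3)
```

Memory grows with the number of distinct addresses, not the number of
requests, so a single pass over the log should be enough.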
Methinks an early step perhaps needs to be a little Perl library for
doing maths on times in a logfile (including timezone stuff, and concepts
like 'yesterday', 'last Monday', 'last month', 'a day ago'). I'm working
on cleaning up mine.
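For a flavour of what such a library might offer, here is a small Python
sketch (not the Perl library mentioned above, which is not shown here). It
assumes Common Log Format timestamps such as "19/Jul/1997:09:16:38 +0100"
and English month names; all function names are made up for illustration,
and only a few of the concepts are covered.

```python
from datetime import datetime, timedelta

CLF_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # assumed logfile timestamp layout


def parse_clf_time(stamp):
    """Parse a logfile timestamp into a timezone-aware datetime."""
    return datetime.strptime(stamp, CLF_FORMAT)


def start_of_day(moment):
    """Midnight at the start of `moment`'s day, in its own timezone."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def yesterday(now):
    """Half-open range [start, end) for the calendar day before `now`."""
    end = start_of_day(now)
    return end - timedelta(days=1), end


def last_monday(now):
    """Midnight of the most recent Monday strictly before today."""
    days_back = now.weekday() or 7  # weekday(): Monday == 0
    return start_of_day(now) - timedelta(days=days_back)


def a_day_ago(now):
    """Exactly 24 hours before `now`."""
    return now - timedelta(days=1)


now = parse_clf_time("19/Jul/1997:09:16:38 +0100")
start, end = yesterday(now)
hit = parse_clf_time("18/Jul/1997:22:54:00 +0100")
print(start <= hit < end)
```

Because the parsed times carry their offset, hits logged in different
timezones compare correctly against the same range.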
>Optimization might be trickier (but
>the yucky windoze program takes more than an *hour*, so anything under
>that is good - currently wwwstat takes about 10 minutes for our logs, and
>analog takes around 27 *seconds*).
wwwstat on a day's logs from CricInfo (around 20Mb) runs in around half
an hour on a fairly loaded P100 (logs from 4 different sites). I suspect
it'll be nearer an hour on a day with a live game.
>Other issues, of lesser importance, include some sort of database or text
>file for storing old information so that old logs can be dispensed with,
wwwstat will read its own reports for that, as long as you make sure you
do any filtering by archive name or any other criterion *when* you
generate the report. What we do at CI is run splitlog (with a *big*
user_path_map() function) on a day's logs, dividing it into the different
areas we report on, then wwwstat on each different area, with the default
options so it generates a full (but largely useless for *human*
consumption) report. We then generate daily human reports off these.
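To illustrate the splitting step only: the Python sketch below divides
log lines into areas by URL prefix. It is not splitlog and does not
reproduce user_path_map(); the prefix table, the area names and the
function names are all hypothetical, and it assumes Common Log Format
lines with the request in double quotes.

```python
from collections import defaultdict

# Hypothetical prefix -> area table; a real mapping would be much bigger.
AREA_PREFIXES = [
    ("/area-a/", "area-a"),
    ("/area-b/", "area-b"),
]


def area_for_path(path, default="other"):
    """Return the reporting area for a URL path (first matching prefix)."""
    for prefix, area in AREA_PREFIXES:
        if path.startswith(prefix):
            return area
    return default


def request_path(line):
    """Pull the URL path out of a log line, or None if it will not parse."""
    try:
        request = line.split('"')[1]  # e.g. 'GET /area-a/x.html HTTP/1.0'
        return request.split()[1]
    except IndexError:
        return None


def split_log(lines):
    """Group raw log lines by area, keeping unparseable lines separate."""
    areas = defaultdict(list)
    for line in lines:
        path = request_path(line)
        areas[area_for_path(path) if path else "unparsed"].append(line)
    return areas
```

Each resulting group could then be written to its own file and fed to
the report generator separately.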
As an aside, it would be nice if, for example, -n/-N worked IF you were
working from a summary file AND the report you were generating was ONLY
-archive.
>tracking how many pages each visit includes, which pages are the final
>ones seen during a visit, and similar things...
I'd love this!
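One way this might look, sketched in Python under the same timeout
assumption as the visits definition quoted earlier. The input is assumed
to be (ip, time, path) tuples in chronological order; the function name,
the 15-minute default and the suffix test for what counts as a "page" are
all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

PAGE_SUFFIXES = (".html", ".htm", "/")  # assumed definition of a "page"


def visit_page_stats(requests, timeout=timedelta(minutes=15)):
    """Return (pages per visit, Counter of the final page of each visit)."""
    open_visits = {}  # ip -> (time of last request, pages seen this visit)
    finished = []     # page lists of visits that have timed out
    for ip, when, path in requests:
        last, pages = open_visits.get(ip, (None, None))
        if last is None or when - last > timeout:
            if pages is not None:
                finished.append(pages)  # the previous visit is over
            pages = []
        if path.endswith(PAGE_SUFFIXES):  # skip graphics and the like
            pages.append(path)
        open_visits[ip] = (when, pages)
    # Visits still open at the end of the log count as finished too.
    finished.extend(pages for _, pages in open_visits.values())
    pages_per_visit = [len(pages) for pages in finished]
    exit_pages = Counter(pages[-1] for pages in finished if pages)
    return pages_per_visit, exit_pages


t0 = datetime(1997, 7, 18, 22, 54)
sample = [
    ("10.0.0.1", t0, "/index.html"),
    ("10.0.0.1", t0 + timedelta(minutes=1), "/logo.gif"),
    ("10.0.0.1", t0 + timedelta(minutes=2), "/scores.html"),
    ("10.0.0.2", t0 + timedelta(minutes=3), "/"),
    ("10.0.0.1", t0 + timedelta(minutes=60), "/index.html"),
]
counts, exits = visit_page_stats(sample)
print(sorted(counts), exits["/scores.html"])  # prints [1, 1, 2] 1
```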
--
Mike Whitaker, Technical Manager, CricInfo Ltd
Phone: +44 1733 766619 (work/fax) +44 1733 894928 (home)
+44 976 271866 (mobile)
Email: mike@cricket.org (work) mike@altrion.org (home)