Re: Pipelining and compression effect on HTTP/1.1 proxies
Benjamin Franz (snowhare@netimages.com)
Wed, 23 Apr 1997 10:04:58 -0700 (PDT)
On Wed, 23 Apr 1997, Henrik Frystyk Nielsen wrote:
> At 03:44 PM 4/22/97 -0700, Benjamin Franz wrote:
>
> >That is an *exceptionally large* HTML document - about 10 times the size
> >of the average HTML document based on the results from our webcrawling
> >robot here (N ~= 5,000 HTML documents found by webcrawling). Very few web
> >designers would put that much on a single page because they are aiming for
> >a target of 30-50K TOTAL for a page - including graphics.
>
> It would be interesting to get a better impression of what the
> distribution of web page sizes is. A sample of 5000 is not big
> enough to put *'s around your conclusions. I know that there are many cache
> maintainers and maybe even indexers on this mailing list. Benjamin, what if
> you tried getting these people to take a snapshot of their caches and get
> the sizes of the HTML pages? It would be very useful information to a lot
> of us!
Dammit, you are just begging for me to create a Squid store.log analysis
tool. :)
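As a rough idea of what such a tool might report, here is a minimal sketch that summarizes text/html sizes once (content_type, size) pairs have already been pulled out of a cache log. The record format, the function name html_size_summary and the sample records are hypothetical; parsing the actual store.log columns is deliberately not shown, since that layout is not given here.

```python
import math
from statistics import median


def html_size_summary(records):
    """Summarize text/html object sizes from (content_type, size) records.

    The records are assumed to have been extracted from a cache log already
    (hypothetical input). Negative sizes, treated here as unknown, are dropped.
    """
    records = [(ctype, size) for ctype, size in records if size >= 0]
    total_bytes = sum(size for _, size in records)
    # Ignore parameters such as "; charset=..." when matching the MIME type.
    sizes = sorted(size for ctype, size in records
                   if ctype.split(";")[0].strip().lower() == "text/html")
    if not sizes:
        return None

    def percentile(p):
        # Nearest-rank percentile over the sorted sizes.
        rank = math.ceil(p / 100 * len(sizes))
        return sizes[max(0, min(len(sizes), rank) - 1)]

    return {
        "count": len(sizes),
        "mean": sum(sizes) / len(sizes),
        "median": median(sizes),
        "p90": percentile(90),
        "max": sizes[-1],
        # Share of all bytes that were text/html (compare the 13% figure).
        "html_byte_share": sum(sizes) / total_bytes if total_bytes else 0.0,
    }


records = [("text/html", 3000), ("image/gif", 12000),
           ("text/html; charset=iso-8859-1", 5000), ("text/html", 4000)]
print(html_size_summary(records))
```

Reporting the median and a high percentile alongside the mean matters here, because a few very large pages can pull the mean well above the typical document size.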
> >As noted: deflate and other compression schemes do much better on large
> >text/* documents than small ones. Using an overly large document gives a
> >misleading comparison against the short window compression that modems
> >perform by basically allowing deflate a 'running start'. You should do the
> >comparison using 3-4K HTML documents: The whole test document should be
> >only 3-5K uncompressed and 1-2K compressed.
>
> I tried to do this with the page
>
> http://www.w3.org/pub/WWW/Protocols/HTTP/Performance/Compression/PPP.html
>
> which is 4312 bytes uncompressed and 1759 bytes compressed. It still gives a
> 30% increase in speed and a 35% reduction in packets. Below that size, the
> number of TCP packets begins to be the same and therefore little difference is to be
> expected.
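The two size effects discussed above (deflate doing relatively better on larger documents, and packet counts converging for small ones) can be poked at with a sketch like the following. It is not the PPP.html test: the documents are synthetic HTML built from a small, hypothetical word list, which is far more repetitive than real pages, so only the trend is meaningful, not the absolute ratios. The 1460-byte MSS is also an assumption (it is a parameter), and segment counts ignore TCP/IP and HTTP headers.

```python
import math
import random
import zlib

# Hypothetical vocabulary for English-like filler text (synthetic, not real pages).
WORDS = ("the web cache proxy server page document style sheet image "
         "graphics modem packet request response header client browser "
         "compression pipelining connection bandwidth latency").split()


def synthetic_html(n_bytes, seed=1997):
    """Return roughly n_bytes of HTML built from randomly chosen words."""
    rng = random.Random(seed)
    parts = ["<html><head><title>Test</title></head><body>\n"]
    length = len(parts[0])
    while length < n_bytes:
        para = "<p>" + " ".join(rng.choice(WORDS) for _ in range(12)) + "</p>\n"
        parts.append(para)
        length += len(para)
    parts.append("</body></html>\n")
    return "".join(parts).encode("ascii")


def segments(n_bytes, mss=1460):
    """Segments needed for a payload at a fixed MSS, ignoring all headers."""
    return max(1, math.ceil(n_bytes / mss))


def sweep(target_sizes, mss=1460):
    """Deflate synthetic documents of several sizes; report ratio and segments."""
    rows = []
    for target in target_sizes:
        raw = synthetic_html(target)
        packed = zlib.compress(raw, 6)
        rows.append((len(raw), len(packed), len(packed) / len(raw),
                     segments(len(raw), mss), segments(len(packed), mss)))
    return rows


for raw, packed, ratio, seg_raw, seg_packed in sweep([1000, 4000, 40000]):
    print(f"{raw:6d} -> {packed:6d} bytes (ratio {ratio:.2f}), "
          f"segments {seg_raw} -> {seg_packed}")
```

Under the same 1460-byte MSS assumption, the 4312-byte and 1759-byte sizes quoted above come to 3 and 2 payload segments, and any document that compresses from under one MSS to under one MSS saves no segments at all.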
But remember this is _only_ on the text/* document portion of the traffic
- which is itself only around 13% of the total traffic. So, basically you
save 30% in time on 13% of the traffic - or about a net 3.9% savings. By
your own figures this is even worse than the figures I gave - which
estimated a net 7.5% savings (based on a much higher compression estimate
of 57% for text/*).
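The net figures above come from weighting the per-document saving by the share of traffic it applies to, the same weighting that underlies Amdahl's law. A minimal sketch, assuming the remaining traffic is unaffected and that savings scale linearly with traffic share (the function name is illustrative):

```python
def net_savings(per_class_saving, traffic_share):
    """Overall fractional saving when only one class of traffic benefits.

    per_class_saving: fraction saved on the affected traffic (0.30 = 30%)
    traffic_share:    that class's fraction of total traffic (0.13 = 13%)
    Assumes the rest of the traffic is unchanged and that savings scale
    linearly with traffic share.
    """
    return per_class_saving * traffic_share


for label, saving in [("measured 30% speedup", 0.30),
                      ("57% compression estimate", 0.57)]:
    print(f"{label}: {net_savings(saving, 0.13):.1%} net")
```

This gives about 3.9% for the measured 30% and about 7.4% for the 57% estimate, close to the 7.5% figure quoted.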
> Note, this is using default compression _including_ the dictionary.
> Intelligent tricks can be played by making a pre-defined HTML-aware
> dictionary in which case the win will be bigger.
Even if it doubles the compression efficiency, you would not crack 8% net
savings.
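For anyone wanting to try the pre-defined dictionary idea, Python's zlib module exposes deflate's preset dictionary through the zdict argument (Python 3.3 and later). This is only a sketch: HTML_DICT and the sample page are hypothetical, and a real scheme would require both ends to hold the identical dictionary, which nothing in HTTP/1.1 arranges by itself.

```python
import zlib

# Hypothetical HTML-aware preset dictionary: fragments expected to recur in
# small pages. zlib's documentation suggests putting the most common
# strings toward the end of the dictionary.
HTML_DICT = (b'<a href="http://www.</a><p></p>'
             b'<html><head><title></title></head><body></body></html>')


def deflate(data, zdict=None):
    """zlib-compress data, optionally primed with a preset dictionary."""
    if zdict is None:
        c = zlib.compressobj(9)
    else:
        c = zlib.compressobj(9, zdict=zdict)
    return c.compress(data) + c.flush()


def inflate(data, zdict=None):
    """Decompress; the receiver must hold the identical dictionary."""
    if zdict is None:
        d = zlib.decompressobj()
    else:
        d = zlib.decompressobj(zdict=zdict)
    return d.decompress(data) + d.flush()


page = (b'<html><head><title>Test</title></head><body>'
        b'<p>Hello <a href="http://www.w3.org/">W3C</a></p></body></html>')
plain = deflate(page)
primed = deflate(page, HTML_DICT)
print(len(page), len(plain), len(primed))
```

The gain is largest for tiny, markup-heavy documents like this one; it says nothing by itself about the share of total traffic, which is what bounds the net saving.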
> >Again, the document used was around 10 times the size of the typical HTML
> >document. This should be re-done with more typical test documents. In
> >fact, it would probably be a good idea to test multiple sizes of documents
> >as well as realistic mixes of text/* and image/* to understand how
> >document size and mix affect the results of compression and pipelining.
>
> My point here was that the size may not be that bad after all - considering
> the effect of style sheets. As style sheets may be included in the HTML
> document, this may cause the overall size of HTML documents to increase.
> Likewise, it will make a lot of graphics go away, as they get replaced by
> style sheets.
I doubt it. Ultimately, the graphics load is not determined by anything
stylesheets will affect: designers would put in more and higher quality
graphics than they do today if doing so wouldn't slow the load to
unacceptable levels. Byte-hungry designers will implement *external*
stylesheets and scripting to get the cache win. They will then use the
freed bytes to add *more* graphics and multimedia to do things
stylesheets still can't.