cache-busting document
Martin Hamilton (martin@mrrl.lut.ac.uk)
Sat, 07 Jun 1997 16:57:00 +0100
Hi,
This is some work which has been going on in TERENA's task force on
WWW caching (see <URL:http://www.terena.nl/projects/choc/>) to
document how HTTP servers may be friendly towards caches. We're
looking for comments and feedback, and were also wondering whether
people thought it might be a good idea to put this out as an RFC.
Cheerio,
Martin
Cache-busting - cause and prevention
Martin Hamilton,
Loughborough University
Andrew Daviel,
Vancouver Webpages
$Revision: 1.4 $
Abstract
Cache-busting is the sometimes deliberate, sometimes inadvertent,
practice of defeating caching. This document explains the nature of
the problem with respect to proxy caches for the World-Wide Web's
HTTP protocol, and outlines some simple measures which may be taken to
make a WWW service more cache friendly.
1. The rationale for caching
A large number of Internet sites have elected to run proxy HTTP [1,2]
servers. These act as intermediaries between end users' World-Wide
Web browsers and the (predominantly) HTTP servers they connect to.
Proxies are typically set up in order that :-
users behind firewalls can have access to WWW services
and/or
commonly requested objects can be cached
Proxy caches offer additional functionality above and beyond the WWW
browser's own built-in cache, since cached objects may be shared with
the entire population of users and with cooperating proxy cache
servers. By contrast, browser caches are typically private to the
individual, or can only be shared with those browsers which have
access to the filesystem on which the cached objects are found.
A cache's effectiveness is usually measured in terms of its "hit rate"
- the ratio of requests which may be satisfied using cached objects.
The goal of the cache administrator is to make this figure as high as
possible, whilst simultaneously maintaining a large cache of objects.
Cache hit rates of 40% to 50% for WWW related traffic are common, for
example. Caching also helps to make more effective use of the
available bandwidth by allowing TCP congestion control algorithms to
work properly - conventional HTTP traffic takes the form of a very
large number of short-lived TCP connections, which often defeat TCP
"slow-start" [3] on busy lines.
It follows that proxy caching is highly attractive to Internet Service
Providers and organisations which buy connectivity from them, on a
cost/benefit basis. Cache hits are typically delivered an order of
magnitude faster than cache misses, for leaf node caches at least.
This means that a site which encourages caching will provide the end
user with a much higher perceived quality of service.
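To make the hit-rate figure and its effect on perceived speed concrete, the following minimal Python sketch computes the hit rate as hits divided by requests, and the expected per-request latency as h * t_hit + (1 - h) * t_miss. The function names and the example figures (450 hits in 1000 requests, a miss assumed to take ten times as long as a hit) are hypothetical, chosen only to fall within the ranges quoted above.

```python
def hit_rate(hits: int, requests: int) -> float:
    """Proportion of requests satisfied from cached objects."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return hits / requests


def mean_latency(h: float, t_hit: float, t_miss: float) -> float:
    """Expected per-request latency when a fraction h of requests are hits."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit rate must lie between 0 and 1")
    return h * t_hit + (1.0 - h) * t_miss


if __name__ == "__main__":
    # Hypothetical figures: 450 hits in 1000 requests, and a miss assumed
    # to take ten times as long as a hit (t_hit = 1 time unit).
    h = hit_rate(450, 1000)
    print(f"hit rate: {h:.0%}")                                     # hit rate: 45%
    print(f"mean latency: {mean_latency(h, 1.0, 10.0):.2f} units")  # mean latency: 5.95 units
```

Under those assumed figures the mean latency falls from 10 units (no cache) to about 5.95 units, a reduction of roughly 40%.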
2. The cache-busting problem
Support in the HTTP protocol and its implementations for proxies and
caching is something which has essentially been retro-fitted. As a
result, there are many common practices which are incompatible with
it, and either defeat caching completely or reduce the benefits which
derive from it. This is primarily an educational issue involving
developers of WWW services and implementors of HTTP.
It is also the case that caching at the HTTP level can cause problems
for services which make heavy use of usage statistics - e.g. to
provide "hit counts" for advertisers. Users of cached copies of an
object are effectively invisible to the provider of the original
service. This may provide a strong motivation to defeat caching.
3. How to avoid cache-busting
There are a number of positive steps which developers of HTTP based
WWW services may take to be cache-friendly :-
3.1 Things to try
Use a server which supports HTTP/1.1 - this version of the protocol
has a number of additional features to support caching.
Use the Expires header on documents and images where feasible
- this will help caches to decide when your objects are stale.
Use an HTTP server which supports the GET method with the
If-Modified-Since header - this will help browsers and proxy
caches to figure out whether their cached copy of a file is
out of date.
Make CGI programs cacheable where practical :-
Use GET instead of POST for simple queries, since POST results
aren't cached.
Use the path component of the URL to pass information instead of
QUERY_STRING - caches may treat objects with a ? in their URL
as uncacheable.
Use a directory name other than "cgi-bin", since caches can be
expected to treat URLs containing this as uncacheable.
Generate valid Last-Modified and Expires headers.
Handle If-Modified-Since requests.
Ensure that the time is set correctly on the server machine, e.g.
via NTP [4], so that the timestamp information carried in the
HTTP headers makes sense.
Encourage the sharing of links to common graphics and applets, so
that only one URL is used for a given object.
Use client-side imagemaps (USEMAP - [5]) where feasible, since
server-side imagemaps generate HTTP redirects, which are typically
uncacheable.
Use client-side technologies such as JavaScript or Java applets
instead of CGI for form validation, where feasible.
Use trailing slashes (/) for directory names to avoid extra
redirects.
If you use cookies, try to restrict them to the portions of your
server where they're essential, since objects returned with a
Set-Cookie header are uncacheable. Be aware that cookies may
not interact well with proxy cache servers.
Try to use a single name for a server in the hostname part of the
URL - both in the anchors of your HTML and when using your
browser.
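Several of the suggestions above (Expires, Last-Modified, If-Modified-Since, and a correctly set clock) come together in a CGI program's response. The Python sketch below is one possible way to do this, assuming a hypothetical `respond()` helper that receives the CGI environment, the modification time of the underlying data and the response body; the helper names and the one-hour lifetime are arbitrary assumptions, not recommendations from this document.

```python
import email.utils
import time
from typing import Optional


def cacheable_headers(last_modified: float, max_age: int, now: float) -> dict:
    """Build Date, Last-Modified and Expires headers (RFC 1123 dates in GMT)."""
    return {
        "Date": email.utils.formatdate(now, usegmt=True),
        "Last-Modified": email.utils.formatdate(last_modified, usegmt=True),
        "Expires": email.utils.formatdate(now + max_age, usegmt=True),
    }


def not_modified(if_modified_since: Optional[str], last_modified: float) -> bool:
    """True if the client's cached copy is still current, so 304 may be sent."""
    if not if_modified_since:
        return False
    try:
        since = email.utils.parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError, IndexError):
        # Exception type varies between Python versions; an unparseable
        # date is ignored and a full 200 response is sent instead.
        return False
    # HTTP dates have one-second resolution, so compare whole seconds.
    return int(last_modified) <= int(since.timestamp())


def respond(environ: dict, mtime: float, body: bytes, max_age: int = 3600) -> tuple:
    """Hypothetical CGI helper returning (status, headers, body)."""
    now = time.time()  # only meaningful if the server clock is correct (NTP)
    headers = cacheable_headers(mtime, max_age, now)
    # CGI passes the If-Modified-Since request header as HTTP_IF_MODIFIED_SINCE.
    if not_modified(environ.get("HTTP_IF_MODIFIED_SINCE"), mtime):
        return "304 Not Modified", headers, b""
    headers["Content-Length"] = str(len(body))
    return "200 OK", headers, body
```

A cache revalidating its copy sends If-Modified-Since; if the data has not changed since that time, the sketch answers 304 Not Modified with an empty body instead of regenerating the page.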
3.2 Things to avoid, except where strictly necessary
Don't use CGI programs which generate uncacheable results.
Don't parse the User-Agent header (HTTP_USER_AGENT in CGI) to switch
on browser capabilities, since the cached HTML will be browser
specific. Use features like <NOFRAMES> instead.
Don't use server-side includes unless your server can send the
Last-Modified HTTP header with them.
Don't use redirects, since their results are uncacheable.
Don't use secure servers to serve images and other non-sensitive
objects, since these will be uncacheable and may not be passed
through a cache hierarchy.
Don't rename files to age them - give them unique names in the
first place and update the links which point to them.
Don't set the objects your server returns to expire immediately, or
at some time in the recent past, unless you want to be held up to
public ridicule!
Don't use content-negotiation until HTTP/1.1 is more widely
deployed, since in HTTP/1.0 it interacts badly with proxy caches.
Avoid specifying port 80 in the URL, e.g. when generating URLs
programmatically.
Don't use the numeric IP address of a server in URLs if you have a
choice.
Don't use server modules or scripts to convert documents' character
sets on the server side. Leave it to the client.
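Some of the practices listed in sections 3.1 and 3.2 can be spotted mechanically. The following Python sketch, built around a hypothetical `cache_busting_warnings()` function, checks a URL and its response headers for a handful of them; it is a rough heuristic covering only a subset of the advice, and an empty result does not by itself mean an object will be cached.

```python
import email.utils
import ipaddress
from urllib.parse import urlsplit


def _http_date(value):
    """Parse an HTTP date header, returning None if it cannot be parsed."""
    if not value:
        return None
    try:
        return email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):  # varies by Python version
        return None


def cache_busting_warnings(url: str, headers: dict) -> list:
    """List cache-unfriendly practices visible in a URL and its response headers."""
    warnings = []
    parts = urlsplit(url)

    # URL checks
    if parts.scheme == "http" and parts.port == 80:
        warnings.append("URL names the default port 80")
    if parts.hostname:
        try:
            ipaddress.ip_address(parts.hostname)
            warnings.append("URL uses a numeric server address")
        except ValueError:
            pass  # an ordinary host name
    if parts.query:
        warnings.append("URL has a query string")
    if "/cgi-bin/" in parts.path:
        warnings.append("URL contains cgi-bin")

    # Header checks (header names are case-insensitive)
    h = {name.lower(): value for name, value in headers.items()}
    if "set-cookie" in h:
        warnings.append("response sets a cookie")
    if "last-modified" not in h:
        warnings.append("no Last-Modified header")
    if "expires" in h:
        expires = _http_date(h["expires"])
        date = _http_date(h.get("date", ""))
        if expires is None:
            # HTTP/1.1 caches treat invalid Expires values such as "0"
            # as already in the past.
            warnings.append("Expires is unparseable, so treated as already expired")
        elif date is not None and expires.timestamp() <= date.timestamp():
            warnings.append("Expires is not later than Date")
    return warnings
```

For example, `http://www.example.org:80/cgi-bin/search?q=x` returned with a Set-Cookie header and an Expires time earlier than its Date triggers several warnings at once.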
4. Security considerations
Cache-busting is clearly justified in those cases where the use of
caching has, in itself, security and privacy implications.
Proxy servers tend to subvert firewalls and access controls based
on IP addresses and/or domain names, since the origin server sees
the address of the proxy rather than that of the end user.
5. Acknowledgements
Thanks to Duane Wessels, Vinod Valloppilli, George Michaelson, Donald
Neal, Ernst Heiri and Wojtek Sylwestrzak for their contributions and
comments on previous versions of this document.
6. References
[1] A. Luotonen and K. Altis, "World-Wide Web proxies", In WWW94
Conference Proceedings (Elsevier), 1994.
[2] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee,
"Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068 (Proposed
Standard), 01/03/1997.
[3] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit,
and Fast Recovery Algorithms", RFC 2001 (Proposed Standard),
01/24/1997.
[4] D. Mills, "Network Time Protocol (v3)", RFC 1305 (Proposed Standard),
04/09/1992.
[5] J. Seidman, "A Proposed Extension to HTML: Client-Side Image Maps",
RFC 1980 (Informational), 08/14/1996.