Advogato: HTTP Caching for Personalized Content

The latest version of this article should always be available on my site.

HTTP caching saves the browser from having to download content again if the page hasn't changed, and saves the server from having to generate the page. Mark Nottingham's excellent guide to HTTP caching is a good summary if you don't feel like reading the RFC.

A short introduction to the specific kind of caching I'm discussing here is in order. A web resource may set a Last-Modified header (or an ETag header, which carries opaque validation data rather than a date). The client then caches the page, and when requesting it again sends an If-Modified-Since header (or, for ETag, an If-None-Match header) with the validation info it got. If the server sees that it matches, i.e. the page hasn't changed, it sends back a 304 Not Modified response with an empty body, telling the client to use its cached version of the page. If the client's version is out of date, the server renders the page as usual.
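The client's half of this exchange can be sketched as below. This is only an illustration, not code from any particular browser: the function names and the plain-dict representation of headers are assumptions made for the example.

```python
def conditional_headers(cached_response_headers):
    """Build the validation headers for re-requesting a page we have cached.

    cached_response_headers: headers from the response we stored earlier.
    """
    headers = {}
    last_modified = cached_response_headers.get("Last-Modified")
    if last_modified is not None:
        # Echo the date back so the server can compare it.
        headers["If-Modified-Since"] = last_modified
    etag = cached_response_headers.get("ETag")
    if etag is not None:
        # Echo the opaque validator back unchanged.
        headers["If-None-Match"] = etag
    return headers


def resolve(status, body, cached_body):
    """Pick the body to display: a 304 means the cached copy is still good."""
    return cached_body if status == 304 else body
```

A client would call conditional_headers() before re-requesting a page and resolve() once the response arrives.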

Thus, to make a specific resource cacheable, with validation based on modification time, we do something like this:

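lastModified = self.getLastModified()
request.setHeader("last-modified", formatDate(lastModified))
ifModifiedSince = request.getHeader("if-modified-since")
# The header is absent on a first visit, so check before parsing it.
if ifModifiedSince is not None and parseDate(ifModifiedSince) == lastModified:
    request.setResponseCode(NOT_MODIFIED)
    request.finish()
else:
    # render the page as normal...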
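The snippet above leaves formatDate and parseDate undefined. One possible way to write them with the Python standard library is sketched here; the helper isNotModified and the choice of seconds since the epoch as the time representation are assumptions of this sketch, not something the article specifies.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def formatDate(timestamp):
    """Format seconds since the epoch as an HTTP date for Last-Modified."""
    return formatdate(timestamp, usegmt=True)


def parseDate(value):
    """Parse an HTTP date into whole seconds since the epoch.

    Returns None when the header is missing or cannot be parsed.
    """
    if value is None:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # No usable zone information: treat the date as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def isNotModified(ifModifiedSince, lastModified):
    """True if the client's copy is current.

    HTTP dates have one-second resolution, so lastModified is truncated
    to whole seconds before comparing.
    """
    return parseDate(ifModifiedSince) == int(lastModified)
```

Truncating to whole seconds matters when the modification time comes from a source with sub-second precision, since the date the client echoes back can never carry fractions of a second.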

Unfortunately, caching content that is customized is more difficult. By customized I mean rendering a page that is different for each "user", a user being identified by HTTP basic authentication or by a session cookie. For example, if you're logged in to my.yahoo.com you will see a version of the page that is customized for you and you alone. The problem is that caching HTTP proxies might cache one user's content and display it for another, if the last modified times happen to match.
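To make the failure concrete, here is a toy simulation of it. Everything in it is hypothetical: SharedProxy stands in for a misbehaving caching proxy, origin for a server that validates on modification time only, and the modification time is passed in directly to model two users whose pages happen to share it.

```python
NOT_MODIFIED, OK = 304, 200


def origin(user, if_modified_since, last_modified):
    """Origin server that validates on modification time only."""
    if if_modified_since == last_modified:
        return NOT_MODIFIED, None
    return OK, "page for " + user


class SharedProxy:
    """Hypothetical proxy that keeps one cached copy for all users."""

    def __init__(self):
        self.cached = None  # (last_modified, body) once a page is stored

    def fetch(self, user, last_modified):
        # Revalidate with whatever validator the cached copy carries.
        validator = self.cached[0] if self.cached else None
        status, body = origin(user, validator, last_modified)
        if status == OK:
            self.cached = (last_modified, body)
        # On a 304 the proxy reuses its copy, whoever it was generated for.
        return self.cached[1]
```

With this model, a request from "alice" followed by a request from "bob" at the same modification time hands bob the page that was rendered for alice.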

One solution is to use the Cache-Control HTTP header with a value of "private", which tells shared caches such as proxies not to store the response. However, this header is new in HTTP/1.1, so a proxy that doesn't understand it may still serve one user's page to another.

The solution then is to include the user's session id as part of the validation info for the content. The server can then check both that the content hasn't changed for a specific user, and that the caching validation is being done on behalf of the user for whom the page was originally generated. We will do this using the ETag header, since it is designed to contain opaque data. Since this is an ETag, we can use any data about the page content for validation of changes, but for simplicity's sake this example also uses modification time:

lastModified = self.getLastModified()
# Entity tags must be quoted strings, so wrap the value in double quotes.
etag = '"%s,%s"' % (lastModified, request.getSessionId())
request.setHeader("cache-control", "private")
request.setHeader("etag", etag)
# Clients send a cached ETag back in If-None-Match, not If-Match.
if request.getHeader("if-none-match") == etag:
    request.setResponseCode(NOT_MODIFIED)
    request.finish()
else:
    # render the page as normal...
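The same check can be written as a standalone function that is easy to test outside a web framework. This is a sketch under a few assumptions: the names makeETag and validate are made up for the example, the validator is read from If-None-Match (the header clients send for ETag validation), and the tag is a SHA-256 hash of the time and session id rather than the plain concatenation used above, so that the raw session id does not appear in a response header.

```python
import hashlib

NOT_MODIFIED, OK = 304, 200


def makeETag(lastModified, sessionId):
    """Quoted entity tag derived from modification time and session id."""
    raw = "%s,%s" % (lastModified, sessionId)
    return '"%s"' % hashlib.sha256(raw.encode("utf-8")).hexdigest()


def validate(ifNoneMatch, lastModified, sessionId):
    """Return (status, headers) for a request carrying this If-None-Match.

    The status is 304 only if the validator was issued for this session
    and for this version of the page.
    """
    etag = makeETag(lastModified, sessionId)
    headers = {"Cache-Control": "private", "ETag": etag}
    # Exact comparison only; a list of tags or "*" counts as a mismatch.
    status = NOT_MODIFIED if ifNoneMatch == etag else OK
    return status, headers
```

A tag issued for one session yields a 304 only when it is presented again by that same session at the same modification time; any other combination falls through to a normal render.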

If the user's session id is different from the one encoded in the ETag, the page will be re-rendered; otherwise the browser will get a Not Modified response and load the page from its cache. I have verified with Mozilla 1.3.1 on Debian GNU/Linux that this technique does indeed work, and I'm pretty certain that it will work with any HTTP caching proxy, even those that don't support the Cache-Control header.

Posted 16 Jun 2003 at 14:17 UTC by itamar
