HTTP Caching for Personalized Content

Posted 16 Jun 2003 at 14:17 UTC by itamar

This article explains a simple technique that allows browsers to cache dynamically generated pages, even when the page is specific to a cookie-based session or to a user logged in with HTTP basic authentication. That is, it allows personalized pages to be cached safely even when caching proxies sit between the browser and the server.

The latest version of this article should always be on my site.

HTTP caching allows us to save the browser from having to download a page again if it hasn't changed, and the server from having to generate it. Mark Nottingham's excellent guide to HTTP caching is a good summary if you don't feel like reading the RFC (RFC 2616).

A short introduction to the specific kind of caching I'm discussing here is in order. A web resource may send a Last-Modified header (or an ETag header, which carries opaque validation data rather than a date). The client caches the page, and when it requests the page again it sends an If-Modified-Since header (or, for an ETag, an If-None-Match header) containing the validation info it received. If the server sees that it matches, i.e. the page hasn't changed, it sends back a 304 Not Modified response with an empty body, telling the client to use its cached version of the page. If the client's version is out of date, the server renders the page as usual.
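The client half of this exchange can be sketched in a few lines of Python. This is a toy model, not how any particular browser implements its cache: `fetch` is a hypothetical stand-in for the network that takes a URL and a dict of lower-case request headers and returns `(status, headers, body)`, and `ConditionalCache` is likewise an illustrative name.

```python
from typing import Callable, Dict, Tuple

# (status code, lower-case response headers, body)
Response = Tuple[int, Dict[str, str], str]


class ConditionalCache:
    """Toy client cache that revalidates stored pages with conditional GETs."""

    def __init__(self, fetch: Callable[[str, Dict[str, str]], Response]):
        self.fetch = fetch  # hypothetical network function
        self.entries: Dict[str, Tuple[Dict[str, str], str]] = {}

    def get(self, url: str) -> str:
        request_headers: Dict[str, str] = {}
        cached = self.entries.get(url)
        if cached is not None:
            validators, _ = cached
            # Echo back whatever validators the server sent last time.
            if "etag" in validators:
                request_headers["if-none-match"] = validators["etag"]
            if "last-modified" in validators:
                request_headers["if-modified-since"] = validators["last-modified"]
        status, headers, body = self.fetch(url, request_headers)
        if status == 304 and cached is not None:
            # Not Modified: the response body is empty, so reuse our copy.
            return cached[1]
        # Full response: store the body together with its validators.
        validators = {k: v for k, v in headers.items()
                      if k in ("etag", "last-modified")}
        self.entries[url] = (validators, body)
        return body
```

With a fake `fetch` that answers 304 whenever it sees `If-None-Match: "v1"`, the second `get("/page")` sends that header and returns the stored body without downloading it again.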

Thus, to make a specific resource cacheable, with validation based on modification time, we do something like this:

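from twisted.web.http import NOT_MODIFIED  # 304

# formatDate/parseDate convert between seconds since the epoch and HTTP-date
# strings (Twisted provides datetimeToString/stringToDatetime for this).
lastModified = int(self.getLastModified())  # HTTP dates have 1-second resolution
request.setHeader("last-modified", formatDate(lastModified))
ims = request.getHeader("if-modified-since")  # None if the client sent none
if ims is not None and parseDate(ims) == lastModified:
    request.setResponseCode(NOT_MODIFIED)
    request.finish()
else:
    pass  # render the page as normal...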
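One detail this comparison depends on: HTTP dates carry whole seconds only, so a modification time with a fractional part never compares equal after a round trip through Last-Modified and If-Modified-Since. The sketch below uses Python's standard `email.utils` date functions; `http_date`, `parse_http_date` and `not_modified` are illustrative names. It keeps the article's exact-match test, although HTTP also allows answering 304 whenever the resource is no newer than the supplied date.

```python
import email.utils


def http_date(timestamp: float) -> str:
    """Format seconds since the epoch as an HTTP date, e.g. for Last-Modified."""
    # usegmt=True produces the "GMT" zone name HTTP expects; the format
    # itself has no room for fractions of a second.
    return email.utils.formatdate(timestamp, usegmt=True)


def parse_http_date(value: str) -> int:
    """Parse an HTTP date back into whole seconds since the epoch (UTC)."""
    parsed = email.utils.parsedate_tz(value)
    if parsed is None:
        raise ValueError("not an HTTP date: %r" % (value,))
    return email.utils.mktime_tz(parsed)


def not_modified(last_modified: float, if_modified_since) -> bool:
    """True if the client's If-Modified-Since matches our modification time."""
    if if_modified_since is None:
        return False  # unconditional request: render the page
    try:
        client_time = parse_http_date(if_modified_since)
    except ValueError:
        return False  # malformed header: ignore it and render the page
    # Truncate our own timestamp to whole seconds before comparing.
    return client_time == int(last_modified)


stamp = 1000000000.75
header = http_date(stamp)
print(header)                            # Sun, 09 Sep 2001 01:46:40 GMT
print(not_modified(stamp, header))       # True
print(parse_http_date(header) == stamp)  # False: the .75 s was lost
```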

Unfortunately, caching customized content is more difficult. By customized I mean a page that is rendered differently for each "user", where a user is identified by HTTP basic authentication or by a session cookie. For example, if you're logged in to my.yahoo.com you will see a version of the page that is customized for you and you alone. The problem is that a caching HTTP proxy might store one user's page and serve it to another user, for example when it revalidates its cached copy and the last-modified times happen to match.
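A toy model makes the failure concrete. Everything here is hypothetical (`origin`, `SharedProxy`, the fixed date); it simply assumes a shared proxy that keeps one copy per URL and revalidates it with If-Modified-Since while forwarding whichever user is making the current request, and an origin server that answers on the date alone.

```python
LAST_MODIFIED = "Mon, 16 Jun 2003 14:17:00 GMT"  # same for every user


def render_for(user):
    return "Welcome back, %s!" % user


def origin(user, if_modified_since=None):
    """Return (status, headers, body) for a personalized page."""
    if if_modified_since == LAST_MODIFIED:
        return 304, {}, ""  # the date matches; the user is never checked
    return 200, {"last-modified": LAST_MODIFIED}, render_for(user)


class SharedProxy:
    """Caches one copy per URL, shared by every client."""

    def __init__(self):
        self.cache = {}

    def get(self, url, user):
        cached = self.cache.get(url)
        ims = cached[0]["last-modified"] if cached else None
        status, headers, body = origin(user, ims)
        if status == 304:
            return cached[1]  # serve whatever copy we already hold
        self.cache[url] = (headers, body)
        return body


proxy = SharedProxy()
print(proxy.get("/my", "alice"))  # Welcome back, alice!
print(proxy.get("/my", "bob"))    # Welcome back, alice!  (wrong user)
```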

One solution is to use the Cache-Control HTTP header with a value of "private", which tells shared caches such as proxies not to store the page. However, this header is new in HTTP/1.1, so a proxy that doesn't support it may still cache the page and serve it to the wrong user.
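Roughly, an HTTP/1.1 shared cache checks a response's Cache-Control directives before storing it, along the lines of this simplified sketch. `shared_cache_may_store` is a hypothetical helper that looks only at `private` and `no-store` and ignores every other caching rule; an HTTP/1.0 proxy that predates Cache-Control skips this check entirely, which is the gap described above.

```python
def shared_cache_may_store(cache_control):
    """Decide whether a shared (proxy) cache may keep a response.

    Only the "private" and "no-store" directives are considered; real caches
    apply many more rules (freshness, status codes, Vary, and so on).
    """
    if not cache_control:
        return True
    # Directive names are case-insensitive and may carry "=value" arguments.
    names = {part.split("=", 1)[0].strip().lower()
             for part in cache_control.split(",")}
    # Conservative: any form of "private" (including private="field-list")
    # is treated as "do not store".
    return not ({"private", "no-store"} & names)


# A browser's own cache is not bound by "private"; only shared caches are.
print(shared_cache_may_store("private"))             # False
print(shared_cache_may_store("public, max-age=60"))  # True
```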

The solution, then, is to include the user's session id as part of the validation info for the content. The server can then check both that the content hasn't changed for a specific user and that the validation is being done on behalf of the user for whom the page was originally generated. We will do this using the ETag header, since it is designed to contain opaque data. Because this is an ETag, we could use any data about the page content to detect changes, but for simplicity's sake this example also uses the modification time:

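from twisted.web.http import NOT_MODIFIED  # 304

lastModified = self.getLastModified()
# Entity tags are quoted strings; including the session id ties the
# tag to the user the page was rendered for.
etag = '"%s,%s"' % (lastModified, request.getSessionId())
request.setHeader("cache-control", "private")
request.setHeader("etag", etag)
# Clients revalidate a cached ETag with If-None-Match (If-Match is a
# precondition for updates, not a cache validator).
if request.getHeader("if-none-match") == etag:
    request.setResponseCode(NOT_MODIFIED)
    request.finish()
else:
    pass  # render the page as normal...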

If the user's session id differs from the one encoded in the ETag, the page is re-rendered; otherwise the browser gets a 304 Not Modified response and loads the page from its cache. I have verified with Mozilla 1.3.1 on Debian GNU/Linux that this technique does indeed work, and I'm pretty certain that it will work with any HTTP caching proxy, even those that don't support the Cache-Control header.
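The validation step can be checked in isolation with a toy origin. `origin`, `make_etag` and the fixed `LAST_MODIFIED` value are hypothetical; the sketch only assumes the scheme described above: an ETag built from the modification time plus the session id, quoted as HTTP requires for entity tags, and compared against If-None-Match.

```python
LAST_MODIFIED = 1000000000  # arbitrary fixed modification time (seconds)


def make_etag(last_modified, session_id):
    # Entity tags are quoted strings; the session id binds the tag to one user.
    return '"%s,%s"' % (last_modified, session_id)


def origin(session_id, if_none_match=None):
    """Return (status, headers, body) for a personalized page."""
    etag = make_etag(LAST_MODIFIED, session_id)
    headers = {"etag": etag, "cache-control": "private"}
    if if_none_match == etag:
        return 304, headers, ""  # same page, same session: use the cache
    return 200, headers, "Page for session %s" % session_id


# Alice's browser fetches the page and remembers its ETag.
status, headers, body = origin("alice")
alice_tag = headers["etag"]
print(origin("alice", alice_tag)[0])  # 304: Alice revalidates her own copy
print(origin("bob", alice_tag)[0])    # 200: a proxy replaying Alice's tag for Bob
```

Because Bob's session id produces a different tag, a proxy that revalidates Alice's cached copy on Bob's behalf gets a fresh 200 response instead of a 304.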


Very limited caching of personalized content, posted 17 Jun 2003 at 01:41 UTC by Mysidia (Journeyer)

Unfortunately, ETag and whole-document caching are likely of very limited use for dynamic content. Documents personalized for a user generally cannot be cached usefully and safely this easily, because content modification times are often harder to identify than the article suggests. Changes in personalized content more often than not depend on external factors, such as changes to the contents of an outside database.

The ETag header is nice, but I tend to think it is useful mostly for documents built from static components, not for more complex pages, unless "last modified" times are carefully recorded in places where they wouldn't ordinarily be available. In a way this approach also seems undesirable: the web server and the cache should be the ones worrying about caching, sparing each individual script from solving the same problem over and over.

Perhaps the best way to deal with personalized content is for web servers and clients to become a lot more like rsync.
