This semester at Berkeley, I'm taking CS286, the graduate-level data management course. In today's class, we discussed a paper that I thought might be of particular interest to Postgres hackers: "Serializable Isolation for Snapshot Databases", by Cahill, Rohm and Fekete from SIGMOD 2008.
The paper addresses a well-known problem with snapshot isolation (SI), which is the isolation level that Postgres actually provides when you ask for "SERIALIZABLE" isolation. SI basically means that a transaction sees a version of database state that corresponds to the effects of all the transactions that were committed before it began; it also sees the effects of its own updates. This is not equivalent to true serializability, however: that is, the database system can provide snapshot isolation, and yet still allow a concurrent transaction schedule that is not equivalent to some serial (one-at-a-time) transaction schedule.
To see why this is true, consider two concurrent transactions that both examine the state of the database, and then perform a write operation that reflects the values that they just read. The paper provides a simple example: suppose we have a database that describes the doctors in a hospital. We have a program that wants to move doctors from "on-call" to "off-duty", as long as there is at least one other doctor that is on-call. It's easy to see that if there are two doctors and we run two instances of the program concurrently, under SI rules we could end up with zero doctors on duty. This violates serializability: there's no serial schedule of these two transactions that could yield this erroneous database state.
The paper proposes a relatively simple modification to snapshot isolation that avoids this situations, by detecting a superset of the dangerous situations and aborting one of the transactions involved. I'll leave the details of their technique and the underlying theory to the paper, but it's very readable.
So, should we implement their technique in Postgres? It's an interesting idea, but the implementation cost would be very non-trivial. Despite the paper's claim that it imposes relatively little overhead on a traditional SI implementation, it would require basically tracking the set of rows each transaction has read, and keeping that information around for a bounded time period after the transaction has committed. I think that the performance costs of doing that naively would be too expensive for this to be feasible. Perhaps a cheaper implementation is possible (e.g. by tracking page-level reads rather than record-level reads)?
As an aside, it is somewhat bogus for PostgreSQL to provide snapshot isolation when the user asks for serializability; it is also a violation of the SQL standard, I believe. That said, Oracle does the same thing, so at least we're not alone, and it's hard to see a practical improvement. The relevant section of the docs could certainly make this point clearer, however.