We said that TxCache can provide whatever the underlying database's isolation level is -- serializability or snapshot isolation -- even though our current implementation builds on a snapshot database. That's true, but it depends somewhat on how concurrency control is done in the database. For one, TxCache requires being able to access recent snapshots in the database, which is not likely a reasonable requirement for a database that isn't using multiversion concurrency already. For another, certain implementations can place restrictions on what data can be cached. - Strict two-phase locking The classic way to implement serializability, but not one that's really useful to us. We need to be able to run queries on past snapshot (to deal with cache misses when running in the past), and by the time you've implemented support for that in the database, you might as well be using one of the concurrency control schemes below. More fundamentally, S2PL is trying to ensure serializability using only a single copy of data, so even a read-only transaction needs to prevent writers from modifying data it's read. In terms of our cache, that translates to requiring cache accesses to acquire read locks on the database, which is entirely infeasible. - S2PL plus r/o snapshot isolation Suppose we used two-phase locking for our read-write transactions, but ran the read-only transactions using snapshot isolation. This combination is possible in MySQL InnoDB and MS SQL Server (assuming they can be coerced into that mode). Note that this is serializable, even for the read-only transactions. The read-only transactions aren't going to cause new anomalies (because they don't write), and they aren't going to see any inconsistent states. S2PL has the property that any read-only transaction run on any snapshot of the database can be serialized. That *isn't* true of all the schemes described below, notably not SSI. - Optimistic concurrency A rather different way to implement serializability is to use OCC. Implementation-wise, this is a pretty small delta over snapshot isolation: we'd just have to have our read/write transactions keep track of their read set and check on commit whether the versions they read are still current. (SI already requires doing essentially this for write sets.) The pessimistic equivalent of this is promoting all read locks to write locks, which is a pretty common approach for people who want to avoid particular anomalies on their snapshot isolation DB. That gets you relatively close to the approach above. It continues to be fine to run read-only transactions, or use cached objects, without using read locks or validating their freshness. Again, we have the property that any read-only transaction run on any snapshot of the database can be serialized. I'd previously figured that this would be the way to support serializability for applications that care and use TxCache, largely because it would be the simplest to implement. But the other schemes described in this note are probably better (lower conflict rate) if they are available. - Serializable Snapshot Isolation (SSI) This is a relatively new idea (SIGMOD '08) about how to provide serializability on a snapshot isolation database. The basic idea is that we track the dependencies between transactions in the serialization graph, e.g. "X wrote an update but Y didn't see it, so X must be serialized before Y", and abort transactions to prevent cycles. (To be precise, we track part of the graph, and conservatively abort transactions that *could* form cycles.) This approach is of interest because it preserves a lot of the advantages of snapshot isolation (e.g. read-only transactions do not generally block writers), builds on the existing implementation, and can provide fewer conflicts than an OCC approach. It is also of interest because I just built an implementation that will appear in the next release of Postgres, with Kevin Grittner of the Wisconsin Court System. (My involvement was motivated in part by a general interest in serializability but mainly because the implementation has a lot of commonalities with some of the invalidation work I've been doing.) What is particularly interesting about this approach is that, unlike S2PL, the commit time order does not always match the serialization order. SSI can allow one transaction to commit after another even if it has to be serialized earlier. Of course, this can only happen if no other transaction observed the fact that it committed later; if one did, something has to be aborted. For this reason, it does not have the property that any read-only transaction run on any snapshot of the database can be serialized, which has some implications for TxCache. The database needs to keep track of which data is read by read-only transactions to prevent conflicts, which sounds pretty alarming for our cache (especially considering they can not only be forced to abort, but also abort other transactions). Fortunately, the reality is less grim. Without going too deep into the details, the problem is that certain reads by a read-only transaction may expose anomalies involving transactions that are currently running. It's always possible to avoid these anomalies simply by forbidding the read, and it's possible to ensure that a read-only transaction never has to be aborted after the fact. For TxCache, this means that certain r/o database queries might fail on some snapshots (which means we have to relax our requirement that r/o transactions never block or abort). But once data is read from the DB, it can be safely stored in the cache and used without any further checks. We have also devised a way to identify "safe" snapshots on which any read-only transaction can run without risk of aborting (this is a new contribution from the Postgres work). If one of these is available, we could modify our timestamp selection algorithm to prefer it.