We said that TxCache can provide whatever the underlying database's
isolation level is -- serializability or snapshot isolation -- even
though our current implementation builds on a snapshot
database.

That's true, but it depends somewhat on how concurrency control is
done in the database. For one, TxCache requires being able to access
recent snapshots in the database, which is not likely a reasonable
requirement for a database that isn't using multiversion concurrency
already. For another, certain implementations can place restrictions
on what data can be cached.

- Strict two-phase locking 

  The classic way to implement serializability, but not one that's
  really useful to us. We need to be able to run queries on past
  snapshot (to deal with cache misses when running in the past), and
  by the time you've implemented support for that in the database, you
  might as well be using one of the concurrency control schemes
  below.

  More fundamentally, S2PL is trying to ensure serializability using
  only a single copy of data, so even a read-only transaction needs to
  prevent writers from modifying data it's read. In terms of our
  cache, that translates to requiring cache accesses to acquire read
  locks on the database, which is entirely infeasible.

- S2PL plus r/o snapshot isolation

  Suppose we used two-phase locking for our read-write transactions,
  but ran the read-only transactions using snapshot isolation. This
  combination is possible in MySQL InnoDB and MS SQL Server (assuming
  they can be coerced into that mode).

  Note that this is serializable, even for the read-only
  transactions. The read-only transactions aren't going to cause new
  anomalies (because they don't write), and they aren't going to see
  any inconsistent states. S2PL has the property that any read-only
  transaction run on any snapshot of the database can be
  serialized. That *isn't* true of all the schemes described below,
  notably not SSI.

- Optimistic concurrency

  A rather different way to implement serializability is to use
  OCC. Implementation-wise, this is a pretty small delta over snapshot
  isolation: we'd just have to have our read/write transactions keep
  track of their read set and check on commit whether the versions
  they read are still current. (SI already requires doing essentially
  this for write sets.)

  The pessimistic equivalent of this is promoting all read locks to
  write locks, which is a pretty common approach for people who want
  to avoid particular anomalies on their snapshot isolation DB. That
  gets you relatively close to the approach above.

  It continues to be fine to run read-only transactions, or use cached
  objects, without using read locks or validating their
  freshness. Again, we have the property that any read-only
  transaction run on any snapshot of the database can be serialized.

  I'd previously figured that this would be the way to support
  serializability for applications that care and use TxCache, largely
  because it would be the simplest to implement. But the other schemes
  described in this note are probably better (lower conflict rate) if
  they are available.

- Serializable Snapshot Isolation (SSI)

  This is a relatively new idea (SIGMOD '08) about how to provide
  serializability on a snapshot isolation database. The basic idea is
  that we track the dependencies between transactions in the
  serialization graph, e.g. "X wrote an update but Y didn't see it, so
  X must be serialized before Y", and abort transactions to prevent
  cycles. (To be precise, we track part of the graph, and
  conservatively abort transactions that *could* form cycles.)

  This approach is of interest because it preserves a lot of the
  advantages of snapshot isolation (e.g. read-only transactions do not
  generally block writers), builds on the existing implementation, and
  can provide fewer conflicts than an OCC approach. It is also of
  interest because I just built an implementation that will appear in
  the next release of Postgres, with Kevin Grittner of the Wisconsin
  Court System. (My involvement was motivated in part by a general
  interest in serializability but mainly because the implementation
  has a lot of commonalities with some of the invalidation work I've
  been doing.)

  What is particularly interesting about this approach is that, unlike
  S2PL, the commit time order does not always match the serialization
  order. SSI can allow one transaction to commit after another even if
  it has to be serialized earlier. Of course, this can only happen if
  no other transaction observed the fact that it committed later; if
  one did, something has to be aborted. 

  For this reason, it does not have the property that any read-only
  transaction run on any snapshot of the database can be serialized,
  which has some implications for TxCache. The database needs to keep
  track of which data is read by read-only transactions to prevent
  conflicts, which sounds pretty alarming for our cache (especially
  considering they can not only be forced to abort, but also abort
  other transactions). Fortunately, the reality is less grim.

  Without going too deep into the details, the problem is that certain
  reads by a read-only transaction may expose anomalies involving
  transactions that are currently running. It's always possible to
  avoid these anomalies simply by forbidding the read, and it's
  possible to ensure that a read-only transaction never has to be
  aborted after the fact. For TxCache, this means that certain r/o
  database queries might fail on some snapshots (which means we have
  to relax our requirement that r/o transactions never block or
  abort). But once data is read from the DB, it can be safely stored
  in the cache and used without any further checks.

  We have also devised a way to identify "safe" snapshots on which any
  read-only transaction can run without risk of aborting (this is a
  new contribution from the Postgres work). If one of these is
  available, we could modify our timestamp selection algorithm to
  prefer it.