1. Why invalidations? Our system currently avoids the need for invalidations by making the assumption that all still-valid results become invalid immediately after they are received from the database. Such results are still useful, because running transactions in the past allows them to be used until they become too stale for the freshness limit. However, this limits the scalability of the system. Note the distinction between requiring the programmer to provide explicit invalidations -- which is what memcached does and we'd like to avoid for usability reasons -- and using invalidations at all. We'd like to have invalidations automatically generated by the database as things change. The need for invalidations can be seen from some of our benchmark results: - cache effectiveness is ultimately limited by the freshness requirement rather than the cache size (after about 64MB) - we see cache hit rates in the 25-75% range depending on freshness requirement (don't have the actual numbers handy). Users of memcached strive for hit rates well into the 90s; 99% is not uncommon. - we see little to no measurable performance benefit on workloads where the database does not fit in memory. The first observation reflects the fact that cached data becomes useless after the freshness requirement (e.g. 30 seconds) elapses. Thus, only so much useful data can be stored in the cache: the data generated by the application in the last 30 seconds is not that large. Thus, for read-only workloads with large data sets, it's not surprising that memcached does better. With invalidations, a cache entry is valid for however long it's up to date (plus the freshness requirement, in our system). This will often be much longer. Competition between our cache and the database's buffer cache explains why we don't see a benefit for disk-bound workloads. If we assume that the workload is I/O-bound, then only reducing the number of I/Os will improve performance. Our cache only stores data for ~30 seconds, so it's likely that a database with a reasonably-sized buffer cache will have the tuples in cache. We do save some computation costs -- especially for application-level objects derived from multiple queries -- but if database I/O is the bottleneck that doesn't help much. 2. Implementing invalidations: the cache server side Regardless of whether our implementations are happening explicitly or being generated automatically, the server side will look roughly the same. Recall our existing cache interface: - put(key, value, validity interval) stores a mapping in the cache with a given validity interval (derived from the database's results) - get(key, timestamp) -> value returns the value for the key in the cache, valid at the given timestamp The latter is actually get(key, interval) -> value, interval which returns any value valid during the given interval; lazy timestamp selection requires this, but it's OK to think about the simpler version above that just takes a timestamp. We'll extend this slightly. When inserting into the cache, the inserter needs to specify whether the item is still valid (as far as it knows), and a set of *invalidation tags*: put(key, value, validity interval, stillValid, invalTags) There's also a new invalidation operation: invalidate(timestamp, invalTags) This provides the latest timestamp and the set of invalidation tags such that any entry that is no longer valid has at least one of these tags. Invalidate operations need to arrive in timestamp-order. in practice this means that they need to come from/through the database and be multicast to all nodes; more on this later. For queries, the cache treats any item flagged as still-valid as having its upper validity bound equal to the timestamp of the last invalidation received. When it receives an invalidation request for a particular timestamp and set of invalidation tags, it sets the upper bound of any entry matching one of the invalidation tags, clearing its still-valid flag, and also extends the effective bound of any still-valid transaction to that timestamp. For a more visual explanation, see Figure 2 of our paper, but imagine that some of the intervals extend out to the dotted 'now' line. These represent items that are still valid. When an invalidation is received, the dotted line moves right, and the upper bounds of any items not invalidated move with it. (I realize that describing a figure through email does not exactly qualify as a "more visual explanation"!) Invalidation tags can be hierarchical in order to support multiple granularities. This probably won't make much sense until we discuss where these tags come from, and some examples of their use. But the rule is that invalidating a tag invalidates any entry tagged with that tag, any of its ancestors, or any of its descendants. 3. Explicit invalidations Suppose we want to implement explicit invalidations in the style of memcached. (This isn't great, in that our usability argument centers on no requiring them, but it's no worse than existing systems.) We have a bit of a naming challenge. With memcached, you'd say lookup(key), so it's natural to say invalidate(key) with the same key. But we are using a memoizing interface and having our library generate the keys itself, so what do we put in as the argument to invalidate(...)? A solution is to associate an invalidation tag with the value produced by a function call, say function:arguments, e.g. 'getItem:id1234'. Then, we can pass the corresponding argument to an invalidation call. Not too dissimilar from how you'd do it in memcached. We can take advantage of the fact that invalidation tags are hierarchical here. If we invalidate 'getItem:id1234', that'll invalidate one specific set of arguments, but we can also just invalidate 'getItem', which would invalidate the call with *any* arguments -- useful for global changes to the system state, or ones too complicated to figure out the dependencies for. It's also possible to tag a cached value with 'getItem', which means it'll be invalidated by any invalidation starting with 'getItem', though it's not especially clear to me how this would be useful in this context. How do we send the invalidations to the cache nodes? I think the only workable way is to send it through the database. This seems strange, because the database isn't involved in generating the invalidations, but it's important to get their ordering right. The invalidation needs to be delivered with the correct timestamp, which is determined by the database at commit time, and invalidations also need to be delivered in-order (or at least can't be processed out of order, so later ones would have to be delayed). Given that not every commit would trigger an invalidation, the only other way I've found to do things correctly is a sort of two-phase commit where database commits are blocked until the client confirms that it's sent the invalidations to the cache nodes. This is obviously a horrible idea. One final note is that, although this approach shares with memcached the need for the programmer to identify places to insert invalidations, we have an important benefit. We can always fall back on the current no-invalidation approach (assuming the data is invalid immediately) for certain cached values. This might be useful if it's too hard for the programmer to figure out all the places it needs to be invalidated, or if we know it changes so frequently that invalidating isn't going to give much benefit. 4. Semi-automatic invalidations I expect to have the cache server-side part of invalidations implemented shortly. I also have a plan for automatically generating fine-grained invalidations based on revocable read locks (I will write another note about this soon), but it requires significant work on the database side. Though we've made exciting progress on it, I'm skeptical about whether we'll have something usable by OSDI. So, is there a middle ground? Something that would preserve the performance benefits of invalidations while being easier for a programmer to use than explicit invalidations, and practical to implement in the next two weeks? One obvious possibility would be to do invalidations at a table level. This is pretty easily doable. We need to know what tables a query looks at, which is not hard to get (we could even get away with asking the programmer to specify this, I think, not that we couldn't figure out ourselves). And we need to know what tables an update transaction modifies, which is also pretty obvious. Table-level invalidations aren't sufficiently fine-grained to satisfy me as a real solution, but it's worth noting that we would get some benefit from it. The RUBiS benchmark has some tables that *never* change, some of them quite large, and these would effectively be cacheable forever. A refinement on this would be to use two levels of granularity: tables and primary-keys. This is a hackish solution based largely on the observation that a lot of applications, including our benchmark, spend a lot of time looking up objects by their key. If we can identify such queries (possibly by programmer annotation), we can tag them with a table:key invalidation tag. Then, when inserting or modifying a single row, we can simply generate an invalidation for the associated key. For more complex queries, or more complex modifications that touch many rows, we can fall back on table-level invalidations. I have some ideas about how to throw this together quickly, and I think it's a viable option for performance/usability. In my next note, I'll describe the more elegant scheme I came up with for automatically generating fine-grained invalidations. I'll also describe some related work I'm doing on making Postgres provide serializability, and we'll take a brief foray into the criminal justice system of Wisconsin (for reasons that will become clear later).