* Transactional Caching of Application Data using Recent Snapshots ** Caching of Application Data *** Basic motivation: make database-driven websites faster *** Solution: use a cache like memcached *** What is an application data cache? **** Caches application-level objects ***** Not: database cache, database replica, query cache *** Interface **** In memcached etc.'s world: provide a hash table for the app **** In our world: caches function calls (memoization) **** Some examples: queries, objects, generated pages, parts of pages *** Why cache application data? **** General theme: cache higher-level stuff closer to what the app really wants **** Reduce DB load ***** DB load is the hardest bottleneck to remove ***** One cache hit can eliminate >1 database queries **** Reduce application server load by caching computations ***** This matters too! [Rubis] ***** Easier to remove bottleneck but still expensive [Facebook] ** Transactional Caching *** Define **** Within a transaction block, accesses to both cache & DB see same view of DB **** Not possible in memcached (others?) *** Why transactions? **** Avoid exposing anomalies to the user (=> correctness) **** Avoid violating internal invariants (=> simpler code) ** Recent Snapshots *** Can't have all 3: freshness, transactional consistency, non-blocking **** Concurrent update between two reads in a xaction **** And a blocking cache is a non-starter, so choose freshness or consistency **** Most caches try for freshness **** But we want consistency *** Embracing staleness **** Transactional consistency requires us to have some staleness ***** e.g. snapshot isolation: snapshot current at transaction start **** We go a bit further: r/o transactions can run on *previous* snapshots ***** Improves cache hit rate ***** Avoids blocking ***** Avoids invalidations *** Is this safe? **** Stale data is already everywhere ***** No guarantee results are current w/ concurrent updates ***** Especially if there's caching ***** Just need to make sure things are causally consistent **** Avoiding anomalies ***** Ensure that users see their own data ***** More generally, can ensure causal consistency w/ Lamport clocks ***** Apps can specify fixed time bound on staleness ***** R/W transacitons bypass cache * Building a TxCache ** Architecture *** Basic picture [from SOSP] *** Validity intervals *** Data flow ** Cache *** it's a DHT **** but versioned **** and without complexity *** interface: **** put: key, value @ interval **** get: key, (ts or interval) -> value, interval, miss ** Database *** Need two main things: **** Computing validity intervals for query results **** Access to previous snapshots ***** Because of cache misses **** Reuse existing mechanisms: < 1 KLOC *** Background: concurrency control in Postgres **** Multiversion: snapshot isolation **** No read locks, write locks as usual **** Every tuple has birth, death timestamps **** Queries are assigned timestamps, see only "live" tuples **** Stale tuples are periodically "vacuumed" away once truly dead. *** Exposing multiversion concurrency **** Recall: need to run on old DB snapshot if cache miss **** Snapshot isolation supports this internally (but no interface) **** Save a transactions snapshot and restore it for a new transaction ***** PIN, UNPIN, BEGIN **** No performance impact: just keep old versions around slightly longer *** Tracking pins -- the pincushion **** Need to keep track of which snapshots are pinned **** Separate from database just for simplicity/overhead **** Interface ***** Insert new pin ***** Look up pins fresher than n seconds **** Automatic expiration *** Generating validity intervals **** Goal: SELECT... -> VALID FROM 5 TO 8 [example throughout] **** Tuple validity ***** Each tuple has birth/death timestamps -- looks like a validity interval ***** Intersect the returned tuples ****** Might need to combine when joining/aggregating/etc ***** Is this enough? **** Invalidity mask ***** Tracks tuples *not* returned: "phantoms" ***** Add lifetime of any tuple discarded due to snapshot invisibility ***** Conservative measure: might have found later they didn't match anyway ** TxCache library *** Overview **** Language-independent part ***** Accessing & updating cache; assigning timestamps; ensuring consistency **** Language-dependent bindings (PHP) ***** Wrapping cacheable functions, assigning names, marshalling values, etc **** All cache accesses made by lib **** All DB access go through lib (but just observing) *** Basic flow (example here?) **** beginRO(staleness) -> library assigns timestamp (magically for now) **** app calls cacheable function -> check cache ***** if hit: return directly to app ***** if miss: invoke app's function implementation ****** start database transaction on correct snapshot, or pin new one ****** run queries on DB as needed, gathering validity intervals ****** marshal result and insert into cache w/ intersection of intervals *** Lazy timestamp selection **** How can we choose a timestamp upfront? We don't know what's in the cache! ***** Choose lazily based on cache availability **** Two things to track: ***** Pin set: set of pinned snapshot timestamps at which transaction can be serialized ***** Validity interval: set of times at which cacheable function result is valid **** Algorithm: ***** At xact begin: ****** Pin set = all sufficiently fresh snapshots [from pincushion] + * ***** At cacheable function call: ****** Check cache for any results whose validity intersects pinset ******* If hit: intersect pin set with result validity ******* If miss: ******** Start DB transaction at some snapshot in pin set ********* If *, reify: pin new snapshot and replace * with its ts ******** Set validity = [-inf, inf] ******** On each query, intersect pin set & fn validity w/ query validity * Evaluation ** RUBiS *** Auction site [screenshot?] w/ client emulator *** Porting experience **** Straightforward (except rubis sucks) **** Cached nearly-complete HTML **** Cached common function calls (authenticate, getItem, etc) ** Configuration *** 1 DB, 5 Apaches, 2 caches, 4 emulators *** ~ 1 GB DB *** obligatory system stats ** Basic evaluation *** Classic congestion collapse (lock/scheduling overhead on DB) *** Caching (30 sec freshness) improves peak by reducing DB load ** Statistics *** Latencies: cache hit/miss, DB query, etc *** Hit rate *** Cache server utilization low ** Varying freshness *** Peak throughput & hit rate vs freshness req *** Determines amount of cacheable data ** Varying cache size *** Only need about 64 MB * Related work ** App-level caches *** memcached *** web caches (many) *** No transactional semantics ** DB replication *** Many examples w/ different consistency levels *** Most closely related: Ganymed/FAS/RC-serializability **** Same idea of using stale data **** But very different -- full database replica vs cache that can miss ***** can't use for app-level replication ** Relaxing freshness *** MVCC: snapshot isolation **** anomalies for r/w transactions *** Using older snapshots *** RC-serializability *** Causal ordering: Lamport clocks, ISIS, etc * Future work ** Invalidations [maybe?] * Conclusions ** Should have some of these