\svnInfo $Id$  

\txcache stores cached data in RAM on a number of cache servers. The
cache presents a hash table interface: it maps keys to associated
values. Applications do not interact with the cache directly; the
\txcache library translates the name and arguments of a function call
into a hash key, and checks and updates the cache itself.


Data is partitioned among cache nodes using a consistent hashing
approach~\cite{karger97:_consis_hashin_and_random_trees}, as in
peer-to-peer distributed hash
tables~\cite{stoica03:_chord,rowstron01:_pastr}. Unlike DHTs, we
assume that the system is small enough that every application node can
maintain a complete list of cache servers, allowing it to immediately
map a key to the responsible node. This list could be maintained by
hand in small systems, or using a group membership
service~\cite{cowling09:_census} in larger or more dynamic
environments.

% The interface and partitioning of our cache are very similar to
% peer-to-peer distributed hash tables, such as
% Chord~\cite{stoica03:_chord}, Pastry~\cite{rowstron01:_pastr}, and
% countless others. However, our system lacks many of their more
% demanding requirements, and so avoids most of their complexity. In
% particular, multi-hop routing is unnecessary at the scale of perhaps
% hundreds of nodes in a data center. Data replication is also
% unnecessary; if a cache node fails, attempts to access it can simply
% be treated as cache misses until the node is repaired or removed from
% the membership list.

\subsection{Versioning}
\label{sec:cache:versioning}

Unlike a simple hash table, our cache is \emph{versioned}. In addition
to its key, each entry in the cache is tagged with its \emph{validity
  interval}, as shown in Figure~\ref{fig:cache-intervals}. This
interval is the range of time at which the cached value was
current. Its lower bound is the commit time of the transaction that
caused it to become valid, and its upper bound is the commit time of
the first subsequent transaction to change the result, making the
cache entry invalid. Multiple cache entries with the same key can be
stored in the cache, as long as they have disjoint validity intervals.
Whenever the \txcache library puts the result of a cacheable function
call into the cache, it includes the validity interval of that
result (derived using information obtained from the database).

\begin{figure}[tp]
  \centering
  \includegraphics[scale=0.38]{cache-intervals.pdf}
  \caption{An example of versioned data in the cache at one point in
    time. Each rectangle is a version of a data item. For example, the
    data for key 1 became valid with commit 51 and invalid with commit
    53, and the data for key 2 became valid with commit 46 and is
    still valid.}
  \label{fig:cache-intervals}
\end{figure}

To look up a result in the cache, the \txcache library sends both the
key it is interested in and a timestamp or range of acceptable
timestamps. The cache server returns a value consistent with the
library's request, \ie one whose validity interval intersects the
given range of acceptable timestamps, if any exists. The server
also returns the value's associated validity interval. If multiple
such values exist, the cache server returns the most recent one.
% The reason the library might want to specify a range of
% values is that, rather than assign a transaction's timestamp
% immediately when it begins, it can choose the timestamp lazily based on
% what data is in the cache; Section~\ref{sec:library} explains this
% process in detail.

When a cache node runs out of memory, it evicts old cached values to
free up space for new ones. Cache entries are never pinned and can
always be discarded; if an evicted value is later needed, it can be
recomputed using the normal cache miss procedure. A cache eviction
policy can take into account both the time since an entry was last
looked up, and its staleness. Our cache server uses a
least-recently-used replacement policy, with preference for retaining
items that are still valid.

\subsection{Invalidation Tags and Streams}
\label{sec:cache:invalidations}

When an object is inserted into the cache, it can be flagged as
\emph{still-valid} if it reflects the latest state of the database,
like Key 2 in Figure~\ref{fig:cache-intervals}. For such objects, the
database provides \emph{invalidation} notifications when they change.

Every still-valid object has an
associated set of \emph{invalidation tags} that describe which parts
of the database it depends on. Each invalidation tag has two parts: a
table name and an optional index key description. The database
identifies the invalidation tags for a query based on the access
methods used to access the database. A query that uses an index
equality lookup receives a two-part tag, \eg{a search for users with
  name Alice would receive tag \invtag{users:name=alice}}. A query
that performs a sequential scan or index range scan has a wildcard for
the second part of the tag, \eg{\invtag{users:$\star$}}. Queries that
access multiple tables or multiple keys in a table receive multiple
tags. The object's final tag set will have one or more tags for each
query that the object depends on.

\drkp{Maybe mention that invalidation tags are powerful even w/o auto invals}

The database distributes invalidations to the cache as an
\emph{invalidation stream}. This is an ordered sequence of messages,
one for each transaction that modified the database, containing the
transaction's timestamp and all invalidation tags affected by it. Each
message is delivered to all cache nodes by a reliable
application-level multicast mechanism~\cite{cowling09:_census}, or by
link-level broadcast if possible. The cache servers process the
messages in order, truncating the validity interval for any affected
object at the transaction's timestamp.

For cache lookup purposes, items that are still valid are treated as
though they have an upper validity bound equal to the timestamp of the
last invalidation received. This ensures that there is no race
condition between an item being changed on the database and
invalidated in the cache, and that multiple items modified by the same
transaction are invalidated atomically.


%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 

% LocalWords:  cacheable multi versioned invalidations timestamp
