\svnInfo $Id$  

\txcache is designed for systems consisting of one or more application
servers that interact with a database server. These application
servers could be web servers running embedded scripts (\eg{with
  \texttt{mod\_php}}), or dedicated application servers, as with Sun's
Enterprise Java Beans. The database server is a standard relational
database; for simplicity, we assume the application uses a single
database to store all of its persistent state.

\begin{figure}[tp]
  \centering
  \includegraphics{arch3.pdf}
  \caption{Key components in a \txcache deployment. The system
    consists of a single database, a set of cache nodes, and a set of
    application servers. \txcache also introduces a application
    library, which handles all interactions with the cache server.}
  \label{fig:architecture}
\end{figure}

\txcache introduces two new components, as shown in
Figure~\ref{fig:architecture}: a cache and an application-side cache
library, as well as some modifications to the database server. The
cache is partitioned across a set of cache nodes, which may run on
dedicated hardware or share it with other servers. The application
never interacts with the cache servers; the \txcache library
transparently translates an application's cacheable functions into
cache accesses.

\subsection{Programming Model}
\label{sec:model:programming}

Our goal is to make it effortless to incorporate caching into a new or
existing application. Towards this end, \txcache provides an
application library with a simple programming model, shown in
Figure~\ref{fig:library-api}, based on \emph{cacheable
  functions}. Applications developers can cache computations simply by
designating functions to be cached.
% XXX Library also provides consistency.

\begin{figure}[t]
  \newcommand{\cmdarg}[1]{\textit{#1}}
  \addtolength{\leftmargini}{1.5em}
  \begin{itemize}
  \itemindent=-1.5em   
  \item \command{begin-ro}(\cmdarg{staleness}) : Begin a read-only
    transaction. The transaction sees a consistent snapshot from
    within the past \cmdarg{staleness} seconds.
  \item \command{begin-rw}() : Begin a read/write transaction.
  \item \command{commit}() / \command{abort}() : End a transaction
  \end{itemize}
  \vspace{0.2em}

  \begin{itemize}
  \itemindent=-1.5em   
  \item \command{wrap}(\cmdarg{fn}) $\to$ \cmdarg{cached-fn} :
    Makes a function cacheable. \cmdarg{cached-fn} a new function that first
    checks the cache for the result of another call with the same
    arguments. If not found, executes \cmdarg{fn} and stores
    its result in the cache.
  \end{itemize}
  \vspace{0.2em}
  
  \begin{itemize}
  \itemindent=-1.5em
  \item \command{query}(\cmdarg{SQL}) $\to$ \cmdarg{result} :
    Sends a SQL query to the database server.
  \end{itemize}

  \caption{TxCache library API}
  \label{fig:library-api}
\end{figure}

Programs group their operations into transactions. \txcache requires
applications to specify whether their transactions are read-only or
read/write by using either the \command{begin-ro} or
\command{begin-rw} function. Transactions are ended by calling
\command{commit} or \command{abort}. Within a transaction block,
\txcache ensures that, regardless of whether the application gets it's
data from the database or the cache, it sees a view consistent with
the state of the database at a single point in time.

% through the
% \txcache library's \command{begin} and \command{commit}
% functions. When starting a transaction, the program declares whether
% it will be a read-only or read/write transaction. If read-only, the
% application can also specify any requirements for how fresh the data
% must be; Section~\ref{sec:stale:anomalies} discusses how applications
% can use these freshness requirements.  We focus on optimizing
% read-only transactions, as these are most common in most
% workloads.\edatnote{DRKP}{Citation for this?}  Read/write transactions
% do not take advantage of caching; \txcache's library forwards them
% directly to the database, so they execute exactly as they would on an
% unmodified system.

Within a transaction, operations can be grouped into \emph{cacheable
  functions}. These are actual functions in the program's code,
annotated to indicate that their results can be be cached.  A
cacheable function can consist of database queries and computation,
and can also make calls to other cacheable functions. To
be suitable for caching, these functions must be deterministic, not
have side effects, and depend only on their arguments and the database
state. For example, it would not make sense to cache a function that
returns the current time. % We believe it is reasonable
%for programmers to identify such cacheable functions.

Cacheable functions are essentially memoized. \txcache's library
provides a \command{wrap} function that takes an implementation of a
cacheable function and returns a wrapper function. When called, this
wrapper function first checks if the cache contains the result of a
previous call to the function with the same arguments that is
consistent with the current transaction's snapshot. If so, it returns
it. Otherwise, it invokes the implementation function and stores the
returned value in the cache.

A cacheable function interface is easier to use than the
\command{get}/\command{put} interface provided by existing caches like
\memcached. It does not require programmers to manually assign keys to
cached values and keep them up to date. We found that several
\memcached-related MediaWiki bugs stemmed from choosing insufficiently
descriptive keys, causing two different objects to overwrite each
other. This problem is not possible with \txcache.


\txcache's programming model has another significant benefit: it does
not require applications to explicitly invalidate cached results when
modifying the database, unlike other application data caches such as
\memcached. Adding explicit invalidations requires global reasoning
about the entire application, hindering modularity: adding caching for
an object requires knowing every place it could possibly change. For
example, consider placing a new bid on an item in an eBay-like auction
site. Clearly, any cached copies of the item's page must be
invalidated, because the price has changed. Some other objects that
must be invalidated are less obvious: the item's price also appears on
various search result pages, and the home pages of users who bid on
it. Finding all of these cached objects is not straightforward,
especially in applications so complex that no single developer is
aware of the whole of the application.

% \txcache instead tracks each application data object's dependencies on
% particular database state, and automatically generates invalidations
% when it is updated. To achieve this, the \txcache library interposes
% on an application's database queries, as noted in
% Figure~\ref{fig:library-api}. However, it does so only to receive
% metadata from the database server, as noted in
% Section~\ref{sec:library}. It does not attempt to parse or rewrite the
% SQL queries or results themselves.

\subsection{Consistency Model}
\label{sec:model:consistency}

\txcache provides \emph{transactional consistency}: all requests
within a transaction see a consistent view of the system as of a
specific timestamp. That is, requests see only the effects of other
transactions that committed prior to that timestamp. It is not
practical to simultaneously guarantee both transactional consistency
and \emph{freshness}, where the cache reflects the latest state of the
system. To do so would require a concurrency control scheme like
two-phase locking~\cite{gray77:_granul_of_locks_and_degrees} or
optimistic concurrency
control~\cite{kung81:_optim_method_for_concur_contr} Then, read-only
transactions, even ones that access only the cache, could be blocked
or force other transactions to block or retry, negating the benefits
of the cache.

In order to provide consistency without requiring transactions to
block, \txcache relaxes its freshness guarantee for read-only
transactions. It achieves this using a multiversion concurrency
control approach similar to snapshot isolation
databases~\cite{berenson95:_critiq_of_ansi_sql_isolat_level}, where
read-only transactions are assigned a timestamp, and then are only
allowed to read values that are current as of that timestamp.

However, \txcache goes further by allowing applications to specify an
explicit \emph{staleness limit} to \command{begin-ro}, indicating that
they are willing to accept data up to a certain age. This is motivated
by the observation that many applications can tolerate a certain
amount of staleness, and using slightly stale cached data can improve
the cache's hit rate. \drkp{citations.}  Applications can specify
their staleness limit on a per-transaction basis, allowing them, for
example, to avoid anomalies by ensuring that a user always sees his
latest update. Regardless of the age of the snapshot, each
transaction always sees a consistent view.

Read/write transactions bypass the cache entirely, so \txcache does
not introduce new anomalies. The application can expect the same
guarantees (and anomalies) of the underlying database. For example, if
the underlying database uses snapshot isolation, the system will still
have the same anomalies as snapshot isolation, but \txcache will never
introduce snapshot isolation anomalies into a system that does not use
snapshot isolation.


%%% Local Variables: 
%%% mode: latex
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 

% LocalWords:  versioned versioning php timestamp Cacheable cacheable memoized
% LocalWords:  invalidations