\svnInfo $Id$  

\txcache is designed for systems consisting of one or more application
servers that interact with a database server. These application
servers could be web servers running embedded scripts (\eg{with
  \texttt{mod\_php}}), or dedicated application servers, as with Sun's
Enterprise Java Beans. The database server is a standard relational
database; for simplicity, we assume the application uses a single
database to store all of its persistent state.

\begin{figure}[tp]
  \centering
  \includegraphics{arch3.pdf}
  \caption{Key components in a \txcache deployment. The system
    consists of a single database, a set of cache nodes, and a set of
    application servers. \txcache also introduces an application
    library, which handles all interactions with the cache server.}
  \label{fig:architecture}
\end{figure}

\txcache introduces two new components, as shown in
Figure~\ref{fig:architecture}: a cache and an application-side cache
library, as well as some minor modifications to the database
server. The cache is partitioned across a set of cache nodes, which
may run on dedicated hardware or share it with other servers. The
application never interacts with the cache servers; the \txcache
library transparently translates an application's cacheable functions
into cache accesses.

\subsection{Programming Model}
\label{sec:model:programming}

Our goal is to make it easy to incorporate caching into a new or
existing application. Towards this end, \txcache provides an
application library with a simple programming model, shown in
Figure~\ref{fig:library-api}, based on \emph{cacheable
  functions}. Applications developers can cache computations simply by
designating functions to be cached.
% XXX Library also provides consistency.

\begin{figure}[t]
  \newcommand{\cmdarg}[1]{\textit{#1}} \addtolength{\leftmargini}{1.5em} 
\begin{itemize} 
\itemindent=-1.5em 
\item \command{begin-ro}(\cmdarg{staleness}) :
  Begin a read-only transaction. The
  transaction sees a consistent snapshot from within the
  past \cmdarg{staleness} seconds.  
\item \command{begin-rw}() : Begin
  a read/write transaction.  
\item \command{commit}() $\to$ \cmdarg{timestamp} : Commit a transaction and
  return the timestamp at which it ran
\item \command{abort}() : Abort a transaction 
\end{itemize} \vspace{0.2em}

  \begin{itemize}
  \itemindent=-1.5em   
  \item \command{make-cacheable}(\cmdarg{fn}) $\to$ \cmdarg{cached-fn} :
    Makes a function cacheable. \cmdarg{cached-fn} is a new function
    that first
    checks the cache for the result of another call with the same
    arguments. If not found, it executes \cmdarg{fn} and stores
    its result in the cache.
  \end{itemize}
  \vspace{0.2em}
  
\if 0
  \begin{itemize}
  \itemindent=-1.5em
  \item \command{query}(\cmdarg{SQL}) $\to$ \cmdarg{result} :
    Sends a SQL query to the database server.
  \end{itemize}
\fi

  \caption{TxCache library API}
  \label{fig:library-api}
\end{figure}

Programs group their operations into transactions. \txcache requires
applications to specify whether their transactions are read-only or
read/write by using either the \command{begin-ro} or
\command{begin-rw} function. Transactions are ended by calling
\command{commit} or \command{abort}. Within a transaction block,
\txcache ensures that, regardless of whether the application gets its
data from the database or the cache, it sees a view consistent with
the state of the database at a single point in time.

% through the
% \txcache library's \command{begin} and \command{commit}
% functions. When starting a transaction, the program declares whether
% it will be a read-only or read/write transaction. If read-only, the
% application can also specify any requirements for how fresh the data
% must be; Section~\ref{sec:stale:anomalies} discusses how applications
% can use these freshness requirements.  We focus on optimizing
% read-only transactions, as these are most common in most
% workloads.\edatnote{DRKP}{Citation for this?}  Read/write transactions
% do not take advantage of caching; \txcache's library forwards them
% directly to the database, so they execute exactly as they would on an
% unmodified system.

Within a transaction, operations can be grouped into \emph{cacheable
  functions}. These are actual functions in the program's code,
annotated to indicate that their results can be cached.  A cacheable
function can consist of database queries and computation, and can also
make calls to other cacheable functions. To be suitable for caching,
functions must be pure, \ie{} they must be deterministic, not have
side effects, and depend only on their arguments and the database
state. For example, it would not make sense to cache a function that
returns the current time. \txcache currently relies upon programmers
to ensure that they only cache suitable functions, but this
requirement could also be enforced using static or dynamic
analysis~\cite{guo10:_towar_pract_increm_recom_scien, salcianu05:_purit_side_effec_analy_java_progr}.
%We believe it is reasonable
%for programmers to identify such cacheable functions.

Cacheable functions are essentially memoized. \txcache's library
provides a \command{make-cacheable} function that takes an
implementation of a cacheable function and returns a wrapper function
that can be called to take advantage of the cache.  When called, the
wrapper function checks if the cache contains the result of a previous
call to the function with the same arguments that is consistent with
the current transaction's snapshot. If so, it returns it. Otherwise,
it invokes the implementation function and stores the returned value
in the cache. With proper linguistic support (\eg{Python decorators}),
marking a function cacheable can be as simple as adding a tag to its
existing definition.

Our cacheable function interface is easier to use than the
\command{get}/\command{put} interface provided by existing caches like
\memcached. It does not require programmers to manually assign keys to
cached values and keep them up to date. Although seemingly
straightforward, this is nevertheless a source of errors because
selecting keys requires reasoning about the entire application and how
the application might evolve. Examining MediaWiki bug reports, we
found that several \memcached-related MediaWiki bugs stemmed from
choosing insufficiently descriptive keys, causing two different
objects to overwrite each other~\cite{mediawiki-keys}. In one case, a
user's watchlist page was always cached under the same key, causing
the same results to be returned even if the user requested to display
a different number of days worth of changes.


\txcache's programming model has another crucial benefit: it does not
require applications to explicitly update or invalidate cached results
when modifying the database. Adding explicit invalidations requires
global reasoning about the application, hindering modularity: adding
caching for an object requires knowing every place it could possibly
change.  This, too, has been a source of bugs in
MediaWiki~\cite{mediawiki-invalidations}. For example, editing a wiki
page clearly requires invalidating any cached copies of that page. But
other, less obvious objects must be invalidated too. Once MediaWiki
began storing each user's edit count in their cached \textsc{User}
object, it became necessary to invalidate this object after an
edit. This was initially forgotten, indicating that identifying all
cached objects needing invalidation is not straightforward, especially
in applications so complex that no single developer is aware of the
whole of the application.

% \txcache instead tracks each application data object's dependencies on
% particular database state, and automatically generates invalidations
% when it is updated. To achieve this, the \txcache library interposes
% on an application's database queries, as noted in
% Figure~\ref{fig:library-api}. However, it does so only to receive
% metadata from the database server, as noted in
% Section~\ref{sec:library}. It does not attempt to parse or rewrite the
% SQL queries or results themselves.

\subsection{Consistency Model}
\label{sec:model:consistency}

\txcache provides \emph{transactional consistency}: all requests
within a transaction see a consistent view of the system as of a
specific timestamp. That is, requests see only the effects of other
transactions that committed prior to that timestamp. For read/write
transactions, \txcache supports this guarantee by running them
directly on the database, bypassing the cache entirely.  Read-only
transactions use objects in the cache, and \txcache ensures that
nevertheless they view a consistent state.

\if 0
 It is not
practical to simultaneously guarantee both transactional consistency
and \emph{freshness}, where the cache reflects the latest state of the
system. To do so would require a concurrency control scheme like
two-phase locking~\cite{gray77:_granul_of_locks_and_degrees} or
optimistic concurrency
control~\cite{kung81:_optim_method_for_concur_contr}. Then, read-only
transactions, even ones that access only the cache, could be blocked
or force other transactions to block or retry, negating the benefits
of the cache.


In order to provide consistency without requiring transactions to
block, \txcache relaxes its freshness guarantee for read-only
transactions. It achieves this using a multiversion concurrency
control approach similar to snapshot isolation
databases~\cite{berenson95:_critiq_of_ansi_sql_isolat_level}, where
read-only transactions are assigned a timestamp, and then are only
allowed to read values that are current as of that timestamp.
\fi

Most caches return slightly stale data simply because modified data
does not reach the cache immediately. \txcache goes further by
allowing applications to specify an explicit \emph{staleness limit} to
\command{begin-ro}, indicating that that the transaction can see a
view of data from that time or later. However, regardless of the age
of the snapshot, each transaction always sees a consistent view.  This
feature is motivated by the observation that many applications can
tolerate a certain amount of staleness~\cite{keeton09:_lazyb}, and
using stale cached data can improve the cache's hit
rate~\cite{liskov04:_trans_file_system_can_be_fast}.

Applications can specify their staleness limit on a per-transaction
basis. Additionally, when a transaction commits, \txcache provides the
user with the timestamp at which it ran. Together, these can be used
to avoid anomalies. For example, an application can store the
timestamp of a user's last transaction in its session state, and use
that as a staleness bound so that the user never observes time moving
backwards.  More generally, these timestamps can be used to ensure a
causal ordering between related
transactions~\cite{lamport78:_time_clock_and_order_of}.

We chose to have read/write transactions bypass the cache entirely so
that \txcache does not introduce new anomalies. The application can
expect the same guarantees (and anomalies) of the underlying
database. For example, if the underlying database uses snapshot
isolation, the system will still have the same anomalies as snapshot
isolation, but \txcache will never introduce snapshot isolation
anomalies into the read/write transactions of a system that does not
use snapshot isolation. Our model could be extended to allow read/write
transactions to read information from the cache, if applications are
willing to accept the risk of anomalies. One particular challenge is
that read/write transactions typically expect to see the effects of
their own updates, while these cannot be made visible to other
transactions until the commit point.


%%% Local Variables: 
%%% mode: latex
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 

% LocalWords:  versioned versioning php timestamp Cacheable cacheable memoized
% LocalWords:  invalidations