\svnInfo $Id$  

\txcache is designed for systems consisting of a number of application
servers that interact with a database server. These application
servers might be web servers running embedded scripts (\eg{with
  \texttt{mod\_php} or the like}), or they might be dedicated
application servers, as with Sun's Enterprise JavaBeans. The database
server is a standard relational database; we assume that there is a
single database and the applications use it to store all of their
persistent state.

\begin{figure}[tp]
  \centering
  \includegraphics{arch3.pdf}
  \caption{Key components in a \txcache deployment. The system
    consists of a single database, a set of cache nodes, and a set of
    application servers. \txcache also introduces an application
    library, which handles all interactions with the cache server.}
  \label{fig:architecture}
\end{figure}

\txcache introduces two new components, as shown in
Figure~\ref{fig:architecture}: a cache and an application-side cache
library for interfacing with it. In addition, \txcache requires some
modifications to the database server, introducing new features to
support transactional caching. The cache is partitioned across a set
of cache nodes, which may run on dedicated hardware or share it with
other servers. These nodes store cached data as key-value mappings,
keeping the data entirely in memory. The \txcache library
transparently translates cacheable function calls in the application
to cache accesses: it is responsible for inserting data into the
cache and retrieving data from it, including assigning keys.
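
As a rough illustration of key assignment and partitioning, the following Python sketch derives a key from a function name and its arguments and maps that key to a cache node. The names \texttt{cache\_key} and \texttt{node\_for\_key}, the SHA-1 hash over a JSON encoding, and the modulo placement are all assumptions made for illustration; the description above does not specify how the library assigns keys or how the cache is partitioned.

```python
import hashlib
import json


def cache_key(func_name, args):
    """Derive a deterministic key from a function name and its arguments.

    Assumption: arguments are JSON-serializable (numbers, strings, lists).
    """
    payload = json.dumps([func_name, list(args)], sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def node_for_key(key, nodes):
    """Pick the node responsible for a key using simple modulo placement.

    Assumption: a real deployment might prefer consistent hashing so that
    adding or removing a node relocates fewer keys.
    """
    return nodes[int(key, 16) % len(nodes)]


# Hypothetical usage: the same call always maps to the same key and node.
nodes = ["cache0", "cache1", "cache2"]
key = cache_key("item_page", (7,))
node = node_for_key(key, nodes)
```

Because the key depends only on the function name and arguments, every application server computes the same key for the same call and therefore consults the same node.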

A key characteristic of the design is that the cache does not lie
directly between the application and the database. Unlike query caches
or other middle-tier database caches~\cite{dbcache,csql,timesten},
\txcache does not cache database results directly. Instead, it caches
the result of application computations, which may be derived from
database queries. Applications typically perform some processing on
data they obtain from the database, perhaps converting it into an
internal object representation or generating an HTML page. \txcache
can cache the results of these computations, reducing the load on the
application server as well as the database. This property is
important, as the application server load is significant in many web
applications and can become a
bottleneck~\cite{amza02:_bottl_charac_of_dynam_web_site_bench}.

As Sections~\ref{sec:cache}--\ref{sec:library} describe in detail,
\txcache ensures consistency because the cache is versioned. It can
store multiple versions of the same cached object, tagged with the
range of time over which the cached value accurately reflects the
state of the database. This \emph{validity interval} is computed
automatically by the database, and attached to the cache entry by the
\txcache library. \txcache uses these intervals to provide consistency
by ensuring that, within a read-only transaction, the \txcache library
only retrieves values from the cache and database that were valid at
the same time. Thus, logically, each transaction only reads cached
values from a snapshot of the database taken at a particular time.
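
The role of validity intervals can be sketched with a toy in-memory store in Python. This is a minimal sketch, not the system's implementation: the class name \texttt{VersionedCache}, the integer timestamps, and the half-open interval convention are assumptions, and the intervals are supplied by hand here, whereas in \txcache they are computed by the database.

```python
class VersionedCache:
    """Toy cache that keeps several versions of each key.

    Each version carries a validity interval [valid_from, valid_until)
    (half-open by assumption) during which it matches the database state.
    """

    def __init__(self):
        # key -> list of (valid_from, valid_until, value)
        self._versions = {}

    def put(self, key, value, valid_from, valid_until):
        self._versions.setdefault(key, []).append((valid_from, valid_until, value))

    def get(self, key, snapshot):
        """Return (hit, value) for the version valid at time `snapshot`."""
        for valid_from, valid_until, value in self._versions.get(key, ()):
            if valid_from <= snapshot < valid_until:
                return True, value
        return False, None


# Two versions of the same object with adjacent validity intervals.
cache = VersionedCache()
cache.put("item:7:price", 10, valid_from=100, valid_until=200)
cache.put("item:7:price", 12, valid_from=200, valid_until=300)

# A transaction reading at snapshot 150 only sees values valid at time 150.
hit, price = cache.get("item:7:price", snapshot=150)
```

If every lookup within a transaction uses the same snapshot time, all values returned were valid simultaneously, which is the consistency property described above.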


\subsection{Programming Model}
\label{sec:architecture:model}

In addition to providing consistency guarantees, one of our main goals
was to make it effortless to incorporate caching into a new or
existing application. \txcache's library makes it possible to cache
computations simply by designating functions that should be
cached. In this section, we describe the interface it presents to the
programmer and the requirements for using it.

Programs group their operations into transactions, through the
\txcache library's \command{begin} and \command{commit}
functions. When starting a transaction, the program declares whether
it will be a read-only or read/write transaction. If read-only, the
application can also specify any requirements for how fresh the data
must be; Section~\ref{sec:stale:anomalies} discusses how applications
can use these freshness requirements.  We focus on optimizing
read-only transactions, as they are the common case in most
workloads.\edatnote{DRKP}{Citation for this?}  Read/write transactions
do not take advantage of caching; \txcache's library forwards them
directly to the database, so they execute exactly as they would on an
unmodified system.
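
The transaction interface might look roughly like the following Python sketch. Only the operation names \command{begin} and \command{commit} come from the description above; the \texttt{Transaction} record, the \texttt{max\_staleness} parameter, and the representation of freshness as an oldest acceptable timestamp are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    read_only: bool
    use_cache: bool           # only read-only transactions consult the cache
    oldest_acceptable: float  # earliest snapshot time this transaction may read
    active: bool = True


def begin(read_only, now, max_staleness=0.0):
    """Start a transaction at time `now` (hypothetical signature)."""
    if not read_only:
        # Read/write transactions bypass the cache and run on the database.
        return Transaction(read_only=False, use_cache=False, oldest_acceptable=now)
    # Read-only transactions may accept data up to `max_staleness` old.
    return Transaction(read_only=True, use_cache=True,
                       oldest_acceptable=now - max_staleness)


def commit(txn):
    """End the transaction; this toy version only marks it finished."""
    txn.active = False


# A read-only transaction that tolerates data up to 30 time units stale.
ro = begin(read_only=True, now=100.0, max_staleness=30.0)
commit(ro)

# A read/write transaction, which does not use the cache.
rw = begin(read_only=False, now=100.0)
commit(rw)
```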

Within a transaction, operations can be grouped into \emph{cacheable
  functions}. These are actual functions in the program's code,
annotated to indicate that their results can be cached.
A cacheable function can consist of database queries and
computation. Caching imposes some fundamental restrictions on these
functions: they must not have side effects, and they must not depend
on any inputs other than their arguments and the state of the
database. For example, it would not make sense to cache a function
that returns the current time. We believe it is reasonable for
programmers to identify such cacheable functions.

Cacheable functions are essentially memoized: \txcache's library
replaces them with a wrapper function that, when called, checks whether
the result of an earlier call to the same function with the same
arguments is in the cache at an acceptable timestamp. If so, it returns the cached
value.  Otherwise, the function's actual implementation is executed
and the returned value placed in the cache.
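
The memoizing wrapper can be sketched as a Python decorator. This is a simplified illustration under stated assumptions: the \texttt{Memoizer} class, its \texttt{snapshot} field (the timestamp of the running transaction), and its \texttt{last\_interval} field (a hand-set stand-in for the validity interval that the database would report) are all hypothetical, and the store is a local dictionary rather than a set of cache nodes.

```python
import functools


class Memoizer:
    def __init__(self):
        self.entries = {}    # (function name, args) -> [(lo, hi, value), ...]
        self.snapshot = 0    # timestamp of the running transaction
        self.last_interval = (0, float("inf"))  # stand-in for the database's answer

    def cacheable(self, func):
        """Wrap `func` so results are reused when valid at the snapshot."""
        @functools.wraps(func)
        def wrapper(*args):  # assumption: arguments are hashable
            key = (func.__qualname__, args)
            for lo, hi, value in self.entries.get(key, []):
                if lo <= self.snapshot < hi:
                    return value               # hit: reuse the cached result
            value = func(*args)                # miss: run the implementation
            lo, hi = self.last_interval
            self.entries.setdefault(key, []).append((lo, hi, value))
            return value
        return wrapper


memo = Memoizer()
calls = []  # records each time the real implementation runs


@memo.cacheable
def item_page(item_id):
    calls.append(item_id)
    return "<html>item %d</html>" % item_id


memo.snapshot, memo.last_interval = 150, (100, 200)
first = item_page(7)    # miss: computed and stored with interval [100, 200)
second = item_page(7)   # hit: served from the cache
memo.snapshot, memo.last_interval = 250, (200, 300)
third = item_page(7)    # snapshot outside [100, 200): recomputed
```

The caller's code is unchanged apart from the annotation, which is the sense in which caching is added simply by designating functions.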


The application must perform all of its database accesses through the
\txcache library interface. However, the library interposes on
database queries only to monitor them for dependency-tracking
purposes, as described in Section~\ref{sec:library}. It does not
attempt to parse or rewrite the SQL queries themselves.

Notably, \txcache does not require applications to explicitly
invalidate cached results when they modify the database, in contrast
to other application data caches such as \memcached. This was an
important design goal, because adding explicit invalidations requires
global reasoning about the entire application, hindering modularity:
adding caching for an object requires knowing every place it could
possibly change.  For example, consider placing a new bid on an item
in our example auction site. Clearly, any cached copies of the item's
page must be invalidated, because the price has changed. Some other
objects that must be invalidated are less obvious: the item's price
also appears on various search result pages, and on the home pages of
all users who bid on it. Finding all of these cached objects is not
straightforward, especially in applications so complex that no single
developer is aware of all of them.


%%% Local Variables: 
%%% mode: latex
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 

% LocalWords:  versioned versioning php timestamp Cacheable cacheable memoized
% LocalWords:  invalidations
