\svnInfo $Id$  

%Problem: database become bottleneck in web app because they do not
%scale well
Today's web applications are used by millions of users and demand
implementations that scale accordingly. In a typical implementation,
client machines contact web servers that implement the application
logic, which in turn rely on an underlying database for storing and
managing persistent state. Either the database or application server
layers can become a
bottleneck~\cite{amza02:_bottl_charac_of_dynam_web_site_bench}.
Both types of bottlenecks can be addressed separately --- by
partitioning or replicating the database, or by adding more web
servers --- but these approaches can be costly.

\emph{Application-level data caches}, such as
\emph{\memcached}~\cite{memcached}, are an alternate and popular
option for improving performance.  These caches store arbitrary
application-level objects, ranging from database query results to full
or partial generated web pages, in a lightweight, distributed in-memory cache.
Thus, application-level caches can address both potential bottlenecks:
they reduce both query load on the database server and computation
load on the application servers. As a result, they are currently in
widespread use on well-known web applications such as LiveJournal and
Facebook, which find then to be the most scalable and cost-effective way
to improve application performance.

However, existing application-level caches have a critical
drawback. Unlike database-level approaches, they do not provide
transactional consistency.  That is, there is no way to ensure that
two accesses to the cache (or one access to the cache and one to the
database) return values that reflect a view of the database at a
single point in time. While the backing database goes to great length
to ensure that all queries performed in a transaction reflect a
consistent view of the database, \ie{it can ensure serializable
  isolation}, it is nearly impossible to maintain these consistency
guarantees while using a cache that operates on application objects
and has no notion of transactions.

Accordingly, application-level caches have seen use primarily for
applications for which consistency is not necessary
(\eg{social networking websites}). In applications with stricter
consistency requirements (\eg{online banking or e-commerce}), exposing
inconsistent values to the user can have undesirable consequences. For
example, in an eBay-like auction site, the item's current price and
highest bidder might be cached separately. An inconsistency between
the two could lead a user to observe his own top bid attributed to
someone else, causing that user to then attempt to outbid himself.
Furthermore, even in applications where correctness is not critical,
the potential for inconsistencies leads to more complex application
logic, because the application must be able to cope with violated
invariants.

Developers are thus forced to choose abandoning transactional
consistency or sacrificing the benefits of application-level
caching. In this paper, we resolve this tension by extending
transactional semantics to an application-level cache. We introduce a
transactional cache, \emph{\txcache{}}, which guarantees that all data
seen by the application reflects a consistent snapshot of the
database, whether it comes from application-level objects in the cache
or from database queries.

\txcache provides a simple programming model. Applications designate
functions as \emph{cacheable} and the \txcache library automatically
handles inserting the result of the function into the cache,
retrieving the value next time the function is called with the same
arguments, and ensuring transactional consistency of results. In
particular, and unlike the commonplace application-level cache
\memcached, \txcache does not require
applications to explicitly \emph{invalidate} cached data when the
database changes. Correctly identifying the values to invalidate is
difficult, requiring global reasoning about the application to track
down every cached application computation whose value may change as a
result of changed values in the database.

To ensure consistency, \txcache maintains multiple versions in the
cache and, using minor modifications to the database, automatically
tracks the range of times at which each cached value is valid based on
the range of times at which each database query result is valid. It
assigns a timestamp to each transaction and retrieves only cached
values valid at the chosen time.

Because \txcache uses a versioned cache, it allows read-only
transactions to run on snapshots slightly in the past, using slightly
stale data to improve performance by increasing cache
utilization. Furthermore, instead of using invalidations, \txcache can
make the most conservative assumption that query results become
invalid immediately after they are returned, and yet still cache and
reuse them. Eliminating the need for invalidations, in turn, simplifies
programming. We argue in Section~\ref{sec:stale} that running
transactions on slightly stale data fits well with the model
of web applications.

The use of past snapshots has been proven valuable by previous
work on replicated databases that execute queries on stale replicas,
such as FAS~\cite{rhm_fas:freshness-sensitive_2002} and
Ganymed~\cite{plattner04:_ganym}. \txcache uses similar
techniques. However, as an application-level cache, it faces
additional challenges because it cannot maintain a complete snapshot
of every object the application might request, let alone one
consistent at a single point in time. Instead, it must cache objects
that might be valid at different, possibly overlapping intervals. To
do so, \txcache introduces several new mechanisms including a
multi-version cache, database modifications for tracking result
validity, and an algorithm for lazily assigning a transaction's
timestamp.  Furthermore, \txcache is far less resource-intensive than
replicated database scheme because \txcache servers require very
little computation or I/O.

We tested \txcache using the \rubis auction website benchmark~\cite{amza02:_specif_and_implem_of_dynam}. We measured
throughput for differing numbers of client sessions and found that
\txcache increases peak throughput by 77\% while providing only
30-second stale data.
%Our experimental evaluation shows\HERE

The next section gives an overview of the system architecture,
including a more detailed explanation of the programming
model. Section~\ref{sec:stale} discusses the consistency properties of
our system and the implications of running on recent snapshots. The next
three sections discuss the implementation of our system, with
Section~\ref{sec:cache} covering the architecture of the cache,
Section~\ref{sec:database} covering the database modifications and
Section~\ref{sec:library} covering the application
library. Section~\ref{sec:evaluation} presents our experimental
results. Section~\ref{sec:related-work} reviews the related work and
Section~\ref{sec:conclusion} concludes.
% Adding cache support to an application can also be challenging,
% because most application-level caches, including \memcached, require
% the application to manage the cache. In particular, applications must
% invalidate the cache whenever 

% LocalWords:  memcached DRKP lookups versioned timestamp online invalidations
% LocalWords:  serializable Facebook

%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 
