\svnInfo $Id$  

%Problem: database become bottleneck in web app because they do not
%scale well
Today's web applications are used by millions of users and demand
implementations that scale accordingly. A typical implementation
follows the three-tier model, in which client machines contact web
servers that implement the application logic, and the web servers rely
on an underlying database for storing and managing persistent
state. Frequently, the database layer becomes the bottleneck. Whereas
web server capacity can often be increased simply by adding new nodes,
increasing database capacity is typically a difficult and costly
proposition, requiring careful partitioning or complex distributed
database systems.
 
% how people solve it: use application caching
Application data caches are an effective way to reduce the load on the
database. These caches have the added benefit that they reduce
computational load on the application servers as well as query load on
the database server. A promising new approach for building such a
cache has recently emerged: a distributed in-memory shared cache.
This approach, exemplified by \emph{\memcached}~\cite{memcached}, has
seen widespread use on well-known web applications such as LiveJournal
and Facebook, which find it to be the most scalable and cost-effective
way to improve application performance.

% explain how it works
\memcached provides a straightforward interface: it appears to the
application as a hash table that can store application-level data
(typically derived from DBMS queries).  Applications can use this hash
table in many different ways, from simply caching the results of
database queries to caching application objects or entire generated
web pages. \memcached achieves excellent performance by storing these
mappings in RAM, partitioned across a cluster of machines. If the data
requested is no longer in the cache, \memcached simply notifies the
application of a cache miss, forcing the application to recompute the
value, \eg{by performing a query on the database}.

%what's wrong with it
Although fast and flexible, \memcached has some drawbacks. Most
notably, it does not provide transactional consistency: there is no
way to ensure that two accesses to the cache (or one access to the
cache and one to the database) return values that reflect a view of
the database at one point in time. While the backing database can
ensure that all queries performed in a transaction reflect a
consistent view of the database, \ie{it can ensure serializable
  isolation}, it is nearly impossible to maintain these consistency
guarantees while using a cache like \memcached that has no notion of
transactions.

Accordingly, \memcached has seen use primarily for applications for
which consistency is not particularly critical (\eg{social networking
  websites}). In applications with stricter consistency requirements
(\eg{online banking or e-commerce}), exposing inconsistent values to
the user can have undesirable consequences. For example, in an
eBay-like auction site, the item's current price and highest bidder
might be cached separately. An inconsistency between the two could
lead a user to observe his own top bid attributed to someone else,
causing that user to then attempt to outbid himself.  Furthermore,
even in applications where correctness is not critical, the potential
for inconsistencies leads to more complex application logic, because
the application must be able to cope with violated invariants.

%how we're going to fix it
%more here
In this paper, we argue that providing transactional semantics is not
incompatible with cache performance and scalability. We introduce a
transactional cache (\emph{\txcache{}}), which guarantees that all
values retrieved from the cache or database during a transaction
reflect a consistent snapshot of the database. \txcache maintains
versions in the cache and, using minor modifications to the database,
automatically tracks the range of times at which each cached value is
valid. Then, by retrieving only cached values valid at a particular
time, \txcache ensures transactional consistency.

In addition to providing transactional consistency, \txcache also
strives to reduce the complexity of the application by helping manage
the cache. With \memcached, applications are responsible for inserting
new values, checking for cached values and updating cached data. In
particular, when the database changes, applications must explicitly
\emph{invalidate} cached data that might no longer be valid. Correctly
identifying the values to invalidate is difficult because it requires
global reasoning about the application.

\txcache provides a simpler programming model. Applications designate
functions as cacheable and the \txcache library handles inserting the
result of the function into the cache, retrieving the value next time
the function is called with the same arguments, and most importantly,
invalidating the value when the result has changed.

\txcache uses a novel technique to improve performance and avoid
explicit invalidations: it allows read-only transactions to run on
snapshots slightly in the past. Permitting the use of slightly stale
data improves performance by increasing cache
utilization. Furthermore, instead of using invalidations, \txcache can
then make the most conservative assumption that query results become
invalid immediately after they are returned, and yet still cache and
reuse them. Eliminating the need for invalidations then simplifies
programming. We argue in Section~\ref{sec:stale} that running
transactions on slightly stale data actually fits well with the model
of web applications.

We tested \txcache using the \rubis auction website benchmark~\cite{amza02:_specif_and_implem_of_dynam}. We measured
throughput for differing numbers of client sessions and found that
\txcache increases peak throughput by 2.5x.
%Our experimental evaluation shows\HERE

The next section gives an overview of the system architecture,
including a more detailed explanation of the programming
model. Section~\ref{sec:stale} discusses the consistency properties of
our system and the implications of running on recent snapshots. The next
three sections discuss the implementation of our system, with
Section~\ref{sec:cache} covering the architecture of the cache,
Section~\ref{sec:database} covering the database modifications and
Section~\ref{sec:library} covering the application
library. Section~\ref{sec:evaluation} presents our experimental
results. Section~\ref{sec:related-work} reviews the related work and
Section~\ref{sec:conclusion} concludes.
% Adding cache support to an application can also be challenging,
% because most application-level caches, including \memcached, require
% the application to manage the cache. In particular, applications must
% invalidate the cache whenever 

% LocalWords:  memcached DRKP lookups versioned timestamp online invalidations

%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 
% LocalWords:  serializable