\svnInfo $Id$  


High performance web applications use many different techniques to
improve their throughput. These range from lightweight
application-level caches which typically do not provide transactional
consistency, to database replication systems that improve database
performance while providing the same consistency guarantees, but do not address application server load.

\subsection{Application-Level Caching}
\label{sec:relwork:applevel}

Applying caching at the application layer is an appealing option
because it can improve performance of both the application servers and
the database. Dynamic web caches operate at the highest layer, storing
entire web pages produced by the application, requiring them to be
regenerated in their entirety when any content changes. These caches
need to invalidate pages when the underlying data changes, typically
by requiring the application to explicitly invalidate
pages~\cite{yu99:_scalab_web_cache_consis_archit} or specify data
dependencies~\cite{challenger99:_scalab_system_for_consis_cachin,
  zhu01:_class_based_cache_manag_for}. \txcache obviates this need by
integrating with the database to automatically identify dependencies.

However, full-page caching is becoming less appealing to application
developers as more of the web becomes personalized and
dynamic. Instead, web developers are increasingly turning to
application-level data
caches~\cite{
  bakalova04:_websp_dynam_cache,
  jboss, memcached,
  oracl_web_cache,
  sampathkumar09:_introd_cachin_window_server_appfab_beta}
for their flexibility. These caches allow the application to choose
what to store, including query results, arbitrary application data
(such as Java or .NET objects), and fragments of or whole web pages.

These caches present to applications a
\command{get}/\command{put}/\command{delete} hash table interface, so
the application developer must choose keys and correctly invalidate
objects. As we argued in Section~\ref{sec:model:programming}, this can
be a source of unnecessary complexity and software bugs. Most
application object caches have no notion of transactions, so they
cannot ensure even that two accesses to the cache return consistent
values. Some support transactions \emph{within} the cache, allowing
applications to atomically update objects in the
cache~\cite{sampathkumar09:_introd_cachin_window_server_appfab_beta,jboss},
but none maintain transactional consistency with the database.


% Application-level data caches give the application a minimal
% \command{put}/\command{get}/\command{invalidate} interface, so the
% application developer must choose keys and correctly invalidate cached
% objects. This requirement frequently leads to software bugs, as both
% require the programmer to have global knowledge of the
% system. Wikipedia, which also uses \memcached, has at least two bugs
% in their Bugzilla bug tracker that are caused by incorrect key
% selections and they often have issues with stale data in the cache
% obscuring updates because of incorrect invalidations.  With \txcache's
% programming model, these problems would be avoided because the
% \txcache library selects keys and automatically handles invalidations
% for the application.  Facebook also avoids these problems by using a
% custom library that handles some cache management, but it is probably
% not general-purpose enough for other applications to use and does not
% provide consistency.

% Most application object caches have no notion of consistency, so there
% is no way to ensure even that two accesses to the cache return
% consistent values. Some have a notion of transactions within the
% cache, so the application can run transactions on a view of the cache
% at a single point in
% time~\cite{sampathkumar09:_introd_cachin_window_server_appfab_beta,jboss},
% but none maintain transactional consistency with the database. In
% fact, most application-level caches are completely unaware of the
% underlying storage system. In contrast, \txcache offers consistency
% across the entire system, so the application can expect a consistent
% view regardless of whether the data is from the cache or the database.


%database stuff on relaxed consistency, i.e. snapshot isolation
%relaxed consistency distributed systems i.e. dynamo sinfonia
%relaxed consistency replication, database, or otherwise
%replicating the database, with consistency, generally doesn't work
%very well

%dynamic page caching, most have no consistency semantics

%mid tier caches
%application caches

\subsection{Database Replication}
\label{sec:relwork:replication}

Another popular alternative is to deploy a caching or replication
system within the database layer. These systems replicate the data
tuples that comprise the database, and allow replicas to perform
queries on them. Accordingly, they can relieve load on the database,
but offer no benefit for application server load.

Some replication systems guarantee transactional consistency by using
group communication to execute
queries~\cite{elnikety05:_datab_replic_using_gener_snaps_isolat,
  kemme00:_new_approac_to_devel_and}, which can be difficult to scale
to large numbers of
replicas~\cite{gray96:_danger_of_replic_and_solut}. Others offer
weaker guarantees (eventual consistency)~\cite{downing90:_oscar,
  petersen97:_flexib_updat_propag_for_weakl_consis_replic}, which can
be difficult to reason about and use correctly. Still others require
the developer to know the access pattern
beforehand~\cite{amza03:_distr} or statically partition the
data~\cite{cecchet04:_c_jdbc}.


Most replication schemes used in practice take a primary copy
approach, where all modifications are processed at a master and
shipped to slave replicas, usually asynchronously for performance
reasons. Each replica then maintains a complete, if slightly stale,
copy of the database. Several systems defer update processing to
improve performance for applications that can tolerate limited amounts
of staleness~\cite{bernstein06:_relax, plattner04:_ganym,
  rhm_fas:freshness-sensitive_2002}. These protocols assume that each
replica is a single, complete snapshot of the database, making them
infeasible for use in an application object cache setting where it is
not possible to maintain a copy of every object that could be
computed. In contrast, \txcache's protocol allows it to ensure
consistency even though its cache contains cached objects that were
generated at different times.

Materialized views are a form of in-database caching that creates a
view table containing the result of a query over one or more base
tables, and updating it as the base tables change. Most
work on materialized views seeks to incrementally update the view
rather than recomputing it in its
entirety~\cite{gupta93:_maint_views_increm}. This requires placing
restrictions on view definitions, \eg{requiring them to be expressed
  in the select-project-join algebra}. \txcache's application-level
functions, in addition to being computed outside the database, can
include arbitrary computation, making incremental updates
infeasible. Instead, it uses invalidations, which are easier for
the database to
compute~\cite{candan02:_view_inval_dynam_conten_cachin_multit_archit}.

% \subsection{Relaxing Freshness}
% \label{sec:relwork:concurrency}

% \txcache builds on a long history of previous work on multiversion
% concurrency control. Many different approaches to using multiple
% versions to ensure isolation
% exist~\cite{reed78:_namin_and_synch_in_decen_comput_system,
%   adya95:_effic_optim_concur_contr_using,
%   berenson95:_critiq_of_ansi_sql_isolat_level,
%   bernstein81:_concur_contr_in_distr_datab_system}, the most prevalent
% such approach in production systems being snapshot
% isolation~\cite{berenson95:_critiq_of_ansi_sql_isolat_level}.  In
% snapshot isolation, all data read by a transaction comes from a
% snapshot of the database taken at the time the transaction started.
% Snapshot isolation permits some anomalies that make it an isolation
% level less than true serializability. However, these anomalies only
% occur with read-write transactions; \txcache avoids them because it
% does not attempt to optimize read-write transactions at all.

% Unlike snapshot isolation, where transactions use a snapshot current
% as of the time they started, \txcache can assign transactions a
% snapshot slightly earlier than their actual start time. Similar
% approaches have also been proposed in a distributed environment, for
% both transactional file
% systems~\cite{liskov04:_trans_file_system_can_be_fast} and distributed
% databases~\cite{elnikety05:_datab_replic_using_gener_snaps_isolat}. These
% proposals improve cache performance in an environment where each
% client has its own local cache; \txcache has a different model
% consisting of a single shared cache.

% Bernstein~et~al.\ defined a notion of relaxed-currency
% serializability~\cite{bernstein06:_relax} that encompasses \txcache's
% use of stale data, and rely on a similar notion of validity
% intervals. They assume a model similar to that of the replicated
% database systems discussed previously, in which validity intervals are
% more readily available. Our contributions include a technique for
% easily generating validity intervals using existing database
% concurrency control mechanisms, and using them to generate validity
% information for application-level objects.

% As we argued in Section~\ref{sec:stale:anomalies}, assigning
% transactions an earlier timestamp is safe as long as it does not
% permit any causality anomalies. Other systems also allow operations to
% be reordered when there are no causal dependencies. Lamport introduced
% the happens-before relation, wherein transactions are considered
% current if they do not depend on each
% other~\cite{lamport78:_time_clock_and_order_of}. This idea was notably
% used in the ISIS
% system~\cite{birman87:_exploit_virtual_synch_in_distr_system}, whose
% virtual synchrony communication model enforced ordering constraints
% only on causally-dependent messages.

%%% Local Variables: 
%%% mode: latex
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 

% LocalWords:  multiversion serializability timestamp Lamport