\documentclass{article}
\usepackage{url}

\begin{document}

A common challenge in storage systems such as ours is providing
consistency. Transactional consistency is desirable, as it simplifies
the design of applications that use the storage system, and can
prevent potentially dangerous anomalies. However, providing
transactional consistency is typically considered expensive in
distributed storage systems, so many users opt instead to sacrifice
consistency for higher performance.

We are developing techniques to allow a storage system to maintain
transactional consistency while still providing high performance. Our
approach, based on an earlier
proposal~\cite{liskov04:_trans_file_system_can_be_fast}, improves
performance by permitting read-only transactions to see slightly stale
system state, but ensures that they see a consistent view.

One application we are developing for this technique is a
transactional application data cache. Application data caches have
emerged as a popular and effective way to improve performance in
distributed systems such as web applications. These caches store
high-level application data objects derived from accesses to an
underlying storage system such as a database.  As a result, they can
both reduce load on the storage service and reduce computational load
on the application servers.  Such caches, exemplified by
\emph{memcached}~\cite{memcached}, have been widely adopted for web
applications because of their scalability and cost-effectiveness.

However, existing application-level caches do not provide
transactional consistency: there is no way to ensure that two accesses
to the cache (or one access to the cache and one to the database)
return values that reflect a view of the database at one point in
time. Thus, even if the underlying storage layer can ensure
transactional consistency (\emph{i.e.}, serializable isolation), these
guarantees are violated by introducing the caching layer.

We are developing an alternative in the form of a transactional cache,
TxCache, which guarantees that all values accessed by the application
during a transaction (whether retrieved from the cache or database)
reflect a consistent snapshot of the database. TxCache maintains
versions in the cache and, using minor modifications to the database,
automatically tracks the range of times at which each cached value is
valid. Then, TxCache can ensure transactional consistency by
retrieving only cached values valid at a particular time.
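
As a rough formalization of this idea (the notation is introduced here
purely for illustration and is not taken from the cited work), suppose
each cached or database value $v$ carries a validity interval
$[\ell_v, u_v]$ of timestamps, assumed closed and totally ordered. A
transaction that has read a finite, nonempty set of values $R$ sees a
consistent snapshot when a single timestamp $t$ lies in every interval:
\[
  \exists\, t :\ \forall v \in R,\ \ell_v \le t \le u_v
  \quad\Longleftrightarrow\quad
  \max_{v \in R} \ell_v \;\le\; \min_{v \in R} u_v .
\]
For example, values valid over $[10, 20]$ and $[15, 30]$ can be read
together, since any $t \in [15, 20]$ works, whereas values valid over
$[10, 20]$ and $[25, 30]$ cannot, since $25 > 20$.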

TxCache allows read-only transactions to run on snapshots slightly in
the past. Permitting the use of slightly stale data improves
performance by increasing cache utilization. Furthermore, it will
allow the system to avoid cache invalidations, which require either
significant effort on the part of the application developer to
implement manually, or significant complexity in the storage system to
generate automatically. Instead of using invalidations, TxCache can
make the most conservative assumption that query results become
invalid immediately after they are returned, and yet still cache and
reuse them.
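
A small worked example may help show why such conservatively bounded
results remain reusable. The symbols below, including the staleness
bound $\Delta$, are illustrative assumptions rather than notation from
the cited work. Suppose a query runs at time $t_q$ and its result has
been valid since time $\ell \le t_q$; the conservative assumption gives
the result the validity interval $[\ell, t_q]$. If a read-only
transaction starting at time $t_{\mathrm{now}} \ge t_q$ may run on any
snapshot at most $\Delta \ge 0$ old, it can reuse the result exactly
when the two intervals overlap:
\[
  [\ell, t_q] \cap [t_{\mathrm{now}} - \Delta,\, t_{\mathrm{now}}] \neq \emptyset
  \quad\Longleftrightarrow\quad
  t_{\mathrm{now}} - \Delta \le t_q .
\]
With $\Delta = 30$ seconds, for instance, a result computed at
$t_q = 100$ could still serve transactions starting as late as
$t_{\mathrm{now}} = 130$, with no invalidation message required.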

Some of the key components we are developing include:
\begin{itemize}
\item a lightweight multiversion cache server for in-memory storage of
  application data objects
\item techniques for modifying an existing database management system
  to run queries on past snapshots and to efficiently compute validity
  information for queries
\item a client library that provides a simple API for adding caching
  to existing applications by designating functions as cacheable
\item a protocol for dynamically choosing the timestamp assigned to a
  transaction to maximize the availability of cached data
\item storage server techniques for generating and distributing
  invalidations, notifying clients when their cached data becomes
  stale
\end{itemize}

A paper describing this work has been submitted for
publication~\cite{ports09:_trans_cachin_of_applic_data}. Preliminary
results are promising: our cache prototype can improve the performance
of a standard web application benchmark by a factor of about 2.5.

\bibliographystyle{plain}
\bibliography{pr}

\end{document}
