\chapter{Caching for Read/Write~Transactions}
\label{cha:rw}
\label{sec:extensions:rw}

The system we have presented so far does not use the cache at all for
read/write transactions. Instead, they access data only through the
database, which allows the database's concurrency control mechanism to
ensure their serializability. This is reasonable for application
workloads that have a low fraction of read/write transactions.
However, it would be desirable to allow read/write transactions to
access cached data as well.

\section{Challenges}

\txcache's consistency protocol for read-only transactions ensures
that they see a consistent view of data that reflects the database's
state at some time within the application's staleness limit. This
guarantee is not sufficient to ensure the serializability of read/write
transactions -- even with a staleness limit of zero. The problem is
that \emph{write skew} anomalies can occur: two concurrent
transactions that update different values might each read the old
version of the object that the other is updating.
%\drkp{example?}
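
As a hypothetical illustration (the objects and values here are chosen
purely for illustration), suppose the application maintains the
invariant $x + y \geq 0$, with $x = y = 50$ initially. Transactions
$T_1$ and $T_2$ each check the invariant and then withdraw 100 from a
different object:

\begin{center}
\begin{tabular}{lll}
Step & $T_1$ & $T_2$ \\
\hline
1 & read $x = 50$, $y = 50$ & \\
2 & & read $x = 50$, $y = 50$ \\
3 & write $x := -50$ & \\
4 & & write $y := -50$ \\
5 & commit & commit \\
\end{tabular}
\end{center}

Each transaction sees $x + y = 100$ and concludes that its withdrawal
is safe, yet the final state has $x + y = -100$. In either serial
order, the second transaction would have observed $x + y = 0$ and
refused the withdrawal, so this execution is not serializable. The same
outcome could arise if $T_1$ obtained $y$ and $T_2$ obtained $x$ from
the cache: both cached values were current when they were read, so a
staleness limit of zero would not prevent the anomaly.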

Furthermore, read/write transactions must be able to see the effects
of their own uncommitted modifications, even in cached objects. For
example, the read/write transaction that places a new bid might update
the auction's price, then read the auction's data in order to display
the result to the user. In doing so, it should not access a cached
version that does not reflect the user's new bid. At the same time,
the changes made by a read/write transaction should not be visible to
\emph{other} transactions until it commits.

\section{Approach}

Here, we describe an approach that allows read/write transactions to
use cached data. This approach differs from the one for read-only
transactions in three ways:

\begin{itemize}
\item A read/write transaction can only see cached data that is still
  current, avoiding the aforementioned write-skew anomalies.
\item A read/write transaction cannot use cached objects that are
  impacted by the transaction's own modifications, ensuring that the
  transaction always sees the effects of its own changes.
\item A read/write transaction cannot add new data to the cache,
  preventing other transactions from seeing objects that contain
  uncommitted data.
\end{itemize}

We require read/write transactions to see data that is still valid in
order to prevent anomalies that could occur if they saw stale
data. The first step towards achieving this is to have the \txcache
library request from the cache server only objects that have unbounded
validity intervals. However, this only ensures that the objects were
valid \emph{at the time of the cache access}; they might subsequently
be invalidated while the read/write transaction is executing.

To ensure that the cached data seen by a read/write transaction
remains current, we take an optimistic approach. We modify the
database to perform a validation phase before committing the
transaction. In this validation phase, the database aborts the
transaction if any of the data it accessed has been modified. To do
so, the database must know what data the application accessed through
the cache. Invalidation tags, which track the data dependencies of
cached objects, make this possible. The \txcache library keeps track
of the invalidation tags of all the cached objects it has accessed
during a read/write transaction, and provides these to the database
when attempting to commit the transaction. The database keeps a history
of recent invalidations. If any of the tags indicated in the commit
request have been invalidated by a concurrent transaction, it aborts
the read/write transaction.
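
The commit-time check can be stated more precisely; the notation below
is introduced here only for illustration. Let $\mathit{Read}(T)$ denote
the set of invalidation tags of the cached objects that read/write
transaction $T$ accessed, and let $\mathit{Inv}(T)$ denote the set of
tags that the database's history records as invalidated by transactions
concurrent with $T$. Under these assumptions, validation succeeds
exactly when
\[
  \mathit{Read}(T) \cap \mathit{Inv}(T) = \emptyset ,
\]
and $T$ is aborted otherwise. For instance, if
$\mathit{Read}(T) = \{a, b\}$ and a concurrent transaction invalidated
tag $b$, the intersection is $\{b\}$ and $T$ must abort; if only an
unrelated tag $c$ was invalidated, this check does not abort $T$. This
sketch assumes that the history of invalidations reaches back far
enough to cover $T$'s lifetime.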

We must also ensure that read/write transactions see the effects of
their own changes. Again, invalidation tags make this possible, as
they allow the \txcache library to determine whether a cached object
was affected by a read/write transaction's database
modifications. During a read/write transaction, the \txcache library
tracks the transaction's writeset: the set of
invalidation tags that will be invalidated by the transaction's
changes. (Note that this requires the database to notify the
read/write transaction of which tags it is invalidating, whereas
previously we only needed to send this list of tags to the caches via
the invalidation stream.) When the library retrieves an object from
the cache, it also obtains the object's basis -- the set of
invalidation tags reflecting its data dependencies. If the object's
basis contains one of the invalidation tags in the transaction's
writeset, then the application must not use this object. In this case,
the \txcache library rejects the cached object, treating the access as
a cache miss and recomputing the object.
However, unlike a cache miss in a read-only transaction, the \txcache
library does not insert the results of the computation into the cache,
as these results reflect uncommitted data and should not yet be made
visible to other transactions.
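
The rule for cache hits can be summarized as follows, using notation
introduced here only for illustration. Let $W_T$ be the writeset of
read/write transaction $T$ at the time of the cache access, and let
$\mathit{basis}(o)$ be the basis of a cached object $o$. Then $T$ may
use $o$ only if
\[
  \mathit{basis}(o) \cap W_T = \emptyset .
\]
For example, with hypothetical tags, suppose the transaction that
places a bid has $W_T = \{\texttt{auction:42}\}$. A cached object with
basis $\{\texttt{auction:42}, \texttt{user:7}\}$ is rejected and
recomputed without being inserted into the cache, while one with basis
$\{\texttt{user:7}\}$ may be used, provided that it also has an
unbounded validity interval.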

% \section{Replicated Storage}
% \label{sec:extensions:replicated}

%\section{Distributed Databases}
%\section{Avoiding Thundering Herds}

%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "main.tex"
%%% End: 

%  LocalWords:  serializability invalidations
