\svnInfo $Id$  

\txcache provides a different consistency guarantee than most
caches. Typical caches strive to guarantee \emph{freshness}: the cache
always reflects the latest state of what it is caching. \txcache
relaxes its freshness guarantee slightly in order to provide
\emph{transactional consistency}: within a transaction block, the
application sees a view consistent with the state of the system (as
captured by the database) at a particular timestamp.  That is, it only
sees the effects of other transactions that had committed prior to
that timestamp.
% In other
%words, the transaction is serializable: it can be assigned a position
%in a global serial ordering of transactions.

For example, revisiting the auction site example from the
introduction, if one user places a bid, other users may not see the
bid immediately, but this is commonly expected behavior on the
internet. However, with \txcache, a user will never outbid himself
because his bid was attributed to someone else. \txcache also ensures
that the user sees his latest bid; this property is explained in
Section~\ref{sec:stale:anomalies}.

Ensuring both freshness and transactional consistency requires using a
concurrency control scheme like two-phase
locking~\cite{gray77:_granul_of_locks_and_degrees} or optimistic
concurrency control
(OCC)~\cite{kung81:_optim_method_for_concur_contr}.
In such schemes, read-only transactions can be forced to abort, block
on other transactions, or cause other transactions to block.
% An alternative cache design might
% discard any data item from the cache when it is updated by a
% transaction, but this also limits performance by placing additional
% load on the database server even in cases where applications are
% tolerant of non-fresh data.
For example, if a read-only transaction reads a value, and then a
read/write transaction attempts to modify it, either one transaction
must abort, or the read/write transaction must wait for the read-only
transaction to finish. If the read/write transaction were allowed to
make its modification, and the read-only transaction reads the value
again, either the freshness requirement or the consistency requirement
will be violated: the old value is no longer fresh, but the new value
is not consistent with its previous read.  Using locking or OCC
resolves these problems, but is not a practical approach for a cache,
as updates done by read/write transactions would prevent read-only
transactions from being able to proceed or force them to abort,
negating much of the scalability advantage of the cache.  A key
performance requirement for \txcache is that read-only transactions
should not be forced to block on another transaction or abort, even if
the other transaction updates data the read-only transaction accesses.

Instead of locks, previous versions of data can be used to ensure
transactional consistency with an application controlled freshness
tolerance.  \txcache uses this approach, as do multiversion
concurrency control algorithms for databases, such as snapshot
isolation~\cite{berenson95:_critiq_of_ansi_sql_isolat_level} (used in
PostgreSQL, Oracle, and others). In this scheme, read-only
transactions are assigned a timestamp, and the transaction only reads
values that are current as of that timestamp. Accordingly, the
transaction only sees data committed prior to that timestamp.
Timestamps thus establish the serialization order of
transactions. This multi-version approach allows \txcache to achieve
its requirement that read-only transactions not be forced to block or
abort.

\txcache does not interfere with read/write transactions, so \txcache
does not introduce any anomalies when used with a database. In
general, the application can expect the same guarantees (and
anomalies) of the underlying database. For example, if the underlying
database uses snapshot isolation, the system will still have the same
anomalies as snapshot isolation, but \txcache will never introduce
snapshot isolation anomalies into a system that does not use snapshot
isolation.

% \drkp{Explain that we don't interfere with r/w transactions at all, so
%   no danger of snapshot isolation anomalies.
%   They get the DB's isolation properties, which might be snapshot
%   isolation or serializability depending on how it does concurrency
%   control.}

\subsection{Running Transactions in the Past}
\label{sec:stale:past}

Traditionally (\eg{in multiversion concurrency control}), a
transaction is assigned the most recent timestamp at the time it
begins, causing it to see a ``snapshot'' of the database as of that
instant. \txcache uses a different approach whereby a read-only
transaction can specify that it is tolerant of some degree of
staleness in the data, causing it to be assigned an \emph{earlier}
timestamp and to be run slightly in the past. This approach provides
two benefits: first, it improves the cache hit rate by taking
advantage of slightly stale cached data.  Second, it makes it possible
to avoid explicit invalidations by tracking the ranges of times for
which a cached result is valid, based on the timestamp of transaction
that generated the result.

By running transactions in the past, \txcache can use cached data even
when it is not the most current version available. In particular,
given an application's staleness tolerance $t$, \txcache assigns a
transaction a timestamp within the past $t$ seconds, chosen so as to
maximize the probability that the data it accesses will be available
in the cache; Section~\ref{sec:library:lazy} describes our algorithm
for making this choice. The benefit here is that if multiple
transactions access the same data within a short interval, they can
use the same cached value, even if that value was updated in the
meantime.

Running transactions in the past is also the mechanism that makes it
possible for \txcache to operate without invalidations. This is an
important benefit, because, as we have argued, requiring explicit
invalidations places a high burden on the programmer. Without
explicit invalidations, all values in the cache are only known to be valid at
times prior to their insertion into the cache.  This is because at any time after a result was cached, 
a subsequent update might overwrite a tuple in the database that was used to
generate the cached result.  If transactions were
limited to running in the present, they would be unable to take
advantage of any cached data: a transaction that runs in the present
will be assigned a timestamp that must be greater than the past
timestamps for which cached values are valid. Because \txcache allows
transactions to run slightly in the past, it can assign the
transaction a timestamp for which cached values are available.


% The approach we
% took in building \txcache is to modify the database server to report
% the interval during which a result is valid
% (Section~\ref{sec:database:validity} describes the necessary
% modifications). \txcache tracks these intervals, and associates them
% with each value stored in the cache.  Then, when a transaction invokes
% a cacheable function in a transaction with a given timestamp, \txcache
% can determine whether any cached values of that function can be reused
% by comparing the transaction's timestamp to the validity intervals in
% the cache.  \edatnote{DRKP}{This is not a great explanation --- maybe
%  when I'm less sleepy...}

\subsection{Avoiding Anomalies}
\label{sec:stale:anomalies}

Because \txcache allows read-only transactions to run in the past,
where they may see stale data, one potential concern is that using
stale data might introduce new anomalies, or be incompatible with some
applications.  We argue here that these consistency semantics do in
fact align well with application requirements and user expectations,
as long as some restrictions are imposed on how far in the past
transactions can run.  \txcache makes it possible for applications to
express these restrictions as freshness requirements.

The main requirement is \emph{causality}: after a user has seen some
data with timestamp $t$, later actions on behalf of that user should
only see data that is valid after timestamp $t$. This requirement can
easily be enforced: the application keeps track of the latest
timestamp the user has seen, and specifies that \txcache should never
allow a read-only transaction to run before that timestamp. The
timestamp can be stored in the user's per-session state, \eg{by
  storing it in a HTTP cookie}.  More generally, in distributed
applications where multiple application servers interact with each
other, causal interactions can be tracked using Lamport
clocks~\cite{lamport78:_time_clock_and_order_of}, as in systems like
Thor~\cite{liskov99:_provid_persis_objec_in_distr_system} and
ISIS~\cite{birman87:_exploit_virtual_synch_in_distr_system}.

The reason that it is safe to run transactions in the past is that,
within a short time period, applications must already be tolerant of
transactions being reordered. If, for example, a read-write
transaction $W$ and a read-only transaction $R$ are sent at nearly the
same time, the database might happen to process them in either order,
and so $W$'s change might or might not be visible to $R$. Either
result must be acceptable to the client executing $R$, unless it knows
that $W$ took place from some external source. In other words, unless
a causal dependency between the two clients exists, their transactions
can be considered concurrent (in the sense of Lamport's
\emph{happens-before}
relation~\cite{lamport78:_time_clock_and_order_of}) and hence either
ordering is acceptable.

This type of consistency also fits well with the model of a web
service. Users expect that the results they see reflect a recent,
consistent state of the system, but there is no guarantee that it is
the \emph{latest} state. By the time the result is returned to the
user, a concurrent update might have made it already stale. Indeed, if
executed on a snapshot isolation database, there is no guarantee that
the results are even current at the time the database returns an
answer; new committed results may have arrived during query
execution. If a transparent web cache is present, these effects are
exacerbated. Even without a cache, to ensure that a result remains
correct until the user acts on it (e.g., a ticket reservation is held
until paid for), some kind of lease or long-duration lock is
necessary; \txcache does not interfere with these techniques, because
it does not affect read-write transactions.

% Each server keeps
% track of the latest timestamp from which it has seen data, and
% includes it in the messages. Whenever a server receives a message
% containing a newer timestamp, it updates its timestamp. When starting
% a read-only transaction, each server specifies its timestamp, and
% \txcache ensures that the new transaction is ordered after it.

Of course, causal interactions external to the system cannot be
tracked. For example, nothing can be done about a user who sees a
result on one computer, then moves to another computer and loads the
same page. To limit the impact of this problem, applications can
request a minimum freshness, \eg{they might never be willing to accept
  data more than 30 seconds old}. The value of this minimum freshness
depends on the application requirements; some applications may be
willing to tolerate significantly stale data.

%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 

% LocalWords:  timestamp serializable transactionally multiversion
% LocalWords:  serializability invalidations