\chapter{Automatic Invalidations}
\label{cha:invalidations}

Until now, we have assumed that we know the precise upper bound of all
objects in the cache. But this assumption is clearly not realistic:
many cached objects are still valid, so the upper bound of their
validity interval lies in the future and we do not know what it
is. Indeed, these cached objects are often the most useful, as they
may remain valid well into the future. To handle these objects
correctly, we need \emph{invalidations}: notifications from the
database to the cache that the data from which a cached object was
computed has changed. The cache uses these notifications to keep the
validity intervals of cached objects up to date as items change.

Invalidations are challenging. Cached objects can have complex,
non-obvious dependencies on multiple data objects from the storage
layer. Existing caches like \memcached leave the question of which
objects to invalidate and when entirely to the application
developer. As we argued in \autoref{sec:programming-model:discussion},
this analysis is challenging for application developers and a frequent
source of bugs. We avoid this and support a programming model where
application developers simply indicate the functions they would like
to cache and the \txcache library handles the rest. Accordingly, our
system is designed to provide \emph{automatic invalidations}: it
tracks data dependencies and automatically updates the cache when the
database is modified.

This chapter describes how the system tracks objects that are
currently valid. It introduces three components. First,
\emph{invalidation streams} (\autoref{sec:inval:inval-stream}) give us
a way to distribute change notifications to the cache and process them
in a way that keeps validity intervals up to date. Second,
\emph{invalidation tags} (\autoref{sec:inval:inval-tags}) allow us to
represent the data dependencies of cached objects. Finally,
\emph{invalidation locks} (\autoref{sec:inval:inval-locks}) provide a
technique for modifying the storage layer to detect data dependencies
and automatically generate invalidations without the application being
involved.

\section{Unbounded Intervals and Invalidation Streams}
\label{sec:inval:inval-stream}

How should the cache represent objects that are currently valid? Other
objects have \emph{bounded} validity intervals: the upper bound
represents a timestamp after which the object may no longer be
valid. Currently-valid objects, however, have \emph{unbounded}
validity intervals: the precise upper bound is unknown. For these
objects, invalidations from the storage layer will allow us to know
when they are no longer valid.

When the database executes a query and reports that its validity
interval is unbounded, \ie the query result is still valid, it so
indicates by setting a still-valid flag on the output. It also
provides a \emph{concrete upper bound}: a timestamp up to which the
result is already known to be valid. The timestamp of the latest
transaction committed on the database can serve as this bound (more
precisely, the latest transaction committed before query execution
\emph{started}, as there could be concurrent changes). We write an
unbounded validity interval as, for example, $[12, 42+)$. Here, the
concrete upper bound of $42$ indicates that the object is valid until
at least timestamp $42$, and the $+$ indicates that it may also be valid
beyond that time.

When the application performs multiple database queries that are
combined to produce a single cacheable object, the \txcache library
combines their validity intervals by taking their
intersection. Computing intersections involving unbounded intervals
follows the obvious rules: the upper bound of the intersection is the
smaller of the concrete upper bounds of the two intervals, and the
still-valid flag is cleared unless it was set on both input
intervals. Thus, for example,
\[[12, 42+) \,\cap\, [15, 37+) = [15, 37+) \qquad \text{and}
\qquad [12, 42+) \,\cap\, [15, 48) = [15, 42)\]

\subsection{Invalidation Streams}

When the storage layer indicates that a query result has an unbounded
validity interval, it assumes responsibility for providing an
invalidation when the result may have changed. It does so by
distributing an \emph{invalidation stream} that identifies which
cached objects may no longer be valid.

The invalidation stream is an ordered sequence of messages generated
by the storage layer. Each message contains a timestamp indicating the
most recently committed read/write transaction at the time that
message was sent, and a list of
identifiers indicating which cached objects are no longer
valid. (\Cref{sec:inval:inval-tags,sec:inval:inval-locks} explain
precisely \emph{how} invalidations identify the affected objects.) If
a long time passes without any read/write transactions committing, the
database also sends an empty invalidation message simply to indicate
that the latest timestamp is still current.  This stream is
distributed to all cache nodes using a reliable multicast mechanism.

When a cache node receives an invalidation message, it truncates the
validity intervals of any affected cached objects, setting their upper
bound to the timestamp included in the invalidation. This reflects the
fact that the corresponding change in the storage layer may have
invalidated the affected object. The cache treats all objects that are
still valid as though the upper bound of their validity interval is
the timestamp of the latest invalidation received. Thus, receiving an
invalidation message effectively acts to extend the validity intervals
of objects that are still valid, while truncating those that are no
longer valid. This is demonstrated in \autoref{fig:invalidation-processing}.

\begin{figure}[tbp]
  \centering
  \includegraphics[width=0.9\textwidth]{figures/invalidation-demo-1}
  \vspace{1em}

  \raisebox{-0.27\height}{\scalebox{4}{$\downarrow$}} \hspace{1em} \large \command{invalidate}($t=54$, $\text{\it tags}=[\text{foo}]$)
  \vspace{1em}
  
  \includegraphics[width=0.9\textwidth]{figures/invalidation-demo-2}
  \caption[Example of cache invalidation processing]{Example of cache invalidation processing. Initially, the
    latest invalidation received by the cache server has timestamp
    $53$, so the two objects in the cache with unbounded intervals are
    known to be valid at least through time 53. After the cache
    receives an invalidation for timestamp $54$, the validity interval
    of the object affected by the invalidation has its upper bound set
    to that timestamp, while the validity interval of the other object
    is extended as it is now known to remain valid at least through
    time 54.}
  \label{fig:invalidation-processing}
\end{figure}

Including a timestamp invalidation stream message ensures that
invalidations are processed by cache nodes in the correct order. The
concrete upper bounds used when adding new objects to the cache
further ensure that \command{insert} operations are ordered with
respect to invalidations. This ordering resolves a potential race
condition: a cached object with unbounded interval might arrive at the
cache node \emph{after} its invalidation. Note that the window for
this race condition is wider than one might expect. A cacheable object
might depend on significant application-level computation or multiple
queries, and the object is not added to the cache until this is
complete, whereas the invalidation might happen any time after the
first query.

When an invalidation message arrives before the object it is meant to
invalidate, however, the concrete upper bound on the inserted object
will be earlier than the timestamp on the latest invalidation
message. To handle this case, cache servers keep a small history of
recently-received invalidation messages so they can check whether a
newly-arrived object has already been invalidated. It isn't a serious
problem if this history is insufficient -- as might happen if an
object took an unusually long time to compute -- because every cached
object has a guaranteed upper bound: the concrete upper bound. The
cache server simply clears the still-valid flag, truncating its
validity interval at this bound.
  
\subsection{Discussion: Explicit Invalidations}

We have not yet addressed the issue of how to detect and track data
dependencies so the system can automatically generate
invalidations. We do, however, have enough of a framework in place
that we could easily implement explicit invalidations: the application
can indicate to the database during each read/write transaction which
cached objects are affected, and the database can place that
information into the invalidation stream. (It isn't clear how, in our
cacheable-function programming model, the application would identify
cached objects; this approach would be better suited for a system
where applications manage the cache themselves.)

Although this approach doesn't address the major usability issue we
identified -- the need for explicit invalidations -- the ordered
invalidation stream does address an important concern for real
systems. Specifically, it solves the race condition issue mentioned
above, where the invalidation arrives before one of the objects it is
supposed to invalidated.  This race condition was a concern for
MediaWiki developers, who opted not to use \memcached to cache
negative lookup results -- the fact that an article \emph{doesn't}
exist. If a user concurrently created an article, and one of these
negative results was added to the cache just after the invalidation,
the negative result might never be invalidated, causing the article
text to effectively be lost. This situation can't arise in our system:
the timestamps on the new cached object and the invalidation messages
reveal the ordering.

A key design decision here is that in our system, all invalidations
come from the database server, which generates the invalidation
stream. It can therefore guarantee that the invalidation stream is
ordered according to the transaction commit order. In other systems,
applications typically send updates directly to the cache servers;
this avoids the need for the database to be involved in cache
invalidations, but makes it difficult or impossible to achieve
reliable ordering of updates and invalidations.


\section{Invalidation Tags}
\label{sec:inval:inval-tags}

% Invalidation streams alone are enough to allow us to implement
% explicit invalidations, but this solution isn't sufficient. It is
% undesirable to require application developers to determine which
% cached objects are affected by a database modification: this is a
% complex and error-prone analysis because it requires global reasoning
% about the application. We want to free application developers from the
% need to do this analysis, by instead tracking data dependencies on
% cached objects.

Invalidation streams alone are enough to allow us to implement
explicit invalidations, but this solution isn't sufficient. We want to
generate invalidations automatically, which means that we need to be
able to represent the data dependencies of a cached object.  We track
data dependencies using \emph{invalidation tags} which can be attached
to objects in the cache. These tags are identifiers representing a
logical set of data items. As a simple example, one might use an
invalidation tag for each relation in the database.  Each
currently-valid cached object has a \emph{basis}: a set of
invalidation tags representing the data used to compute that object,
\eg the relations accessed when computing the cacheable function.

Having this information for each cached object allows us to send
invalidations in terms of invalidation tags, \ie in terms of which
data objects were modified, rather than which cached objects were
affected. This is a fundamentally important difference. Identifying
the data objects modified by a transaction can be done using
\emph{local} analysis. Taking, for example, a system that uses
database relation IDs as invalidation tags, it is easy to identify the
relations affected by an update transaction: the database can track
this information dynamically, or it could be determined using a
relatively simple static analysis of queries. Without invalidation
tags, determining which cached objects might be affected by an update
requires a \emph{global} analysis of the program to determine what
data it might be caching.

\begin{figure}
  % XXX TABLE
  \begin{itemize} 
 \addtolength{\leftmargini}{1.5em}
 \itemindent=-1.5em 
  \item \command{store}(\cmdarg{key, value, validity-interval, \textbf{basis}}) :
    Add a new entry to the cache, indexed by the specified key and
    validity interval. If the validity interval is unbounded, the
    basis is a set of invalidation tags reflecting the object's data
    dependencies. It is not used for objects with bounded validity intervals.
  \item \command{invalidate}(\cmdarg{timestamp, tags}) : Update the
    validity interval for any items with the specified invalidation
    tags in their basis, truncating it at the given
    \cmdarg{timestamp}.
  \end{itemize}

  \caption[Cache interface with invalidation support]{Cache interface with invalidation support. This is an
    extension to the basic interface in
    \autoref{fig:cache-interface}.}
  \label{fig:cache-interface-with-invalidations}
\end{figure}


Every object stored in the cache that has an unbounded validity
interval must also have a corresponding basis so that the cache knows
when to invalidate it.\footnote{The one exception is a cached object
  that depends on no database state, \ie a constant computation. Such
  an object effectively has an infinite validity interval. This rare
  corner case is not very interesting.} The \txcache library includes
this information as an extra argument in a \command{store} request, as
indicated in
\autoref{fig:cache-interface-with-invalidations}. Invalidation tags
are not meaningful for objects that already have bounded validity
intervals, \ie those that are not currently valid. The \txcache cache
server maintains a tree mapping invalidation tags to the objects whose
basis contains them.  When it receives an invalidation message from
the invalidation stream -- consisting of a timestamp and a list of the
invalidation tags affected -- the cache server looks up the affected
objects, and updates their validity interval. It also clears their
basis, as this information is only needed for objects that are still
valid.

\subsection{Tag Hierarchy}

At what granularity should we track data dependencies? Above, we
considered using invalidation tags to track the database
\emph{relations} that a cached object may have been derived
from. Although it is certainly possible to capture data dependencies
at this level, it is clearly overly conservative. In our auction site
example, a query that looks up the current price for a particular
auction will find itself with a data dependency on the entire
\invtag{auctions} table. As a result, a modification to \emph{any
  other} item in the same table will cause the cached object to be
invalidated, which would have a serious detrimental effect on the
cache hit rate.\footnote{The perils of overly course granularity in
  invalidations are well known in other systems. For example, the
  MySQL database includes a simple query cache. It uses relation-level
  invalidations, which are sufficiently restrictive that the
  conventional wisdom is to turn the cache off entirely for tables
  that see even a moderate number of
  updates~\cite{schwartz12:_high_perfor_mysql}.}

We might instead choose to represent data dependencies at a fine
granularity. Taking this approach to its extreme, we could use
individual database \emph{tuples} as the invalidation tags. In
comparison to relation-level tags, this approach has different
problems. It prevents the problem described above where an update to a
single row unnecessarily invalidates a cached result that depends on a
different row. However, some queries may access many tuples. A query
that accesses every row in a large table (\eg to compute aggregate
statistics) could easily depend on many thousands of rows; the
resulting basis would be too costly to store or transmit. Similarly, a
modification that updates every row in a large table (\eg to add a new
column as part of a schema change) could generate invalidations for
thousands of rows that would have to be processed separately. A second
problem with tuple-granularity invalidation tags is that they fail
entirely in the presence of phantoms, which we discuss in
\autoref{sec:inval:inval-locks}.

The tension between overly-coarse relation-granularity tags and
overly-fine tuple-granularity tags indicates that a single granularity
is unlikely to fit all use cases. We address this by supporting
invalidation tags with \emph{multiple granularities}. This provides
the benefits of fine-grained dependency tracking for objects with a
small set of dependencies, but makes it possible to fall back on
coarse-grained (\eg relation-level) dependency tracking when the costs
of doing otherwise would be prohibitive.

To support multiple granularities, we organize invalidation tags into
a hierarchy. A coarse-granularity tag is considered a \emph{supertag}
of all the finer-granularity \emph{subtags} it subsumes. For example,
in a system with a two-level hierarchy consisting of relation and
tuple tags, the tag identifying tuple 127 of the \invtag{users} table
would be a subtag of the tag identifying the entire \invtag{users}
table. To make the relationship clear, we would write this tag as
\invtag{users:127}, with the colon denoting the subtag relationship. A
more complex system might use  more levels in the hierarchy; we
discuss one with four levels in \autoref{sec:inval:inval-locks}.  It is
always safe (conservative) to \emph{promote} a tag in a cached
object's basis, replacing it with its supertag. This can be used to
reduce the size of the object's basis by coalescing multiple
fine-granularity tags into a coarser-granularity tag, at the cost of a
potential loss in precision.

Invalidations can also be issued at multiple granularities. For
example, a single update transaction might modify a small set of
tuples, or modify an entire relation. Representing the latter as a
single invalidation tag reduces the cost both of distributing the
invalidation message and of identifying matching tuples.

In this hierarchical invalidation tag scheme, cache servers need to
process invalidations according to the following rule:
\begin{itemize}
\item an invalidation message containing invalidation tag $x$ affects
  any cached object whose basis contains $x$, a supertag of $x$,
  \emph{or} a subtag of $x$
\end{itemize}
This rule appears somewhat surprising at first glance, but an example
makes it clear. An invalidation message for tuple 127 of the
\invtag{users} table clearly affects any object whose basis contains
the exact tag \invtag{users:127}. It also affects any object that
depends on the entire \invtag{users} table, \ie any supertag of the
specified tag. Similarly, if an invalidation message for the
relation-level tag \invtag{users} arrives, it must also invalidate any
object depending on a subtag like \invtag{users:127}, as invalidating
the entire relation necessarily affects any objects depending on part of
the table.


\section{Invalidation Locks}
\label{sec:inval:inval-locks}

The final piece of the invalidation puzzle is how to automatically
determine the invalidation tags to assign to a cached object, and
which invalidation tags to invalidate as part of a read/write
transaction. The use of invalidation tags makes this a question of
what data is accessed or modified by the transaction -- a question
that the database is in a prime position to answer.

A related question is exactly what semantics we ought to ascribe to
particular invalidation tags. The hierarchical tag structure described
above explains how to process tags with multiple granularities, but it
does not specify a particular meaning for each level of the
hierarchy. This question is more subtle than it might appear. Above,
we considered relation- and tuple-granularity tags. But
tuple-granularity tags are, in fact, problematic for a reason more
fundamental than their fine granularity:
they do not accurately capture read dependencies for relational
databases. In the relational model, queries access subsets of the data
identified by predicates, \eg the set of users whose name is
Alice. Expressing this subset in terms of a set of tuple-level
invalidation tags fails to capture conflicts with newly inserted
tuples: creating a new user named Alice ought to cause the previous
result to be invalidated, but would not. This is another instance of
the ``phantom'' problem in database concurrency
control~\cite{eswaran76:_notion_consis_predic_locks_datab_system}.

Accurately representing the read and write dependencies of operations
in a relational database presents quite a challenge. Besides the
phantom issue discussed above, an effective solution should be able to
handle complex queries implemented using a variety of indexes and
query operators: what invalidation tags would we assign, for example,
to a cached object containing the list of top ten employees by salary?

Our insight is that tracking these read and write dependencies is
precisely the problem solved by the database's concurrency control
mechanism. If the database supports serializable isolation of
concurrent transactions, it must have a mechanism for tracking which
objects are read or modified by a transaction. This is necessary
regardless of how the database implements serializability, as
determining which transactions conflict with each other is a
fundamental
requirement~\cite{eswaran76:_notion_consis_predic_locks_datab_system}.
Moreover, the database's mechanism must be capable of dealing with
phantoms, and must be accurate even for complex queries, as these are
important requirements for the correctness of the database system.

This section presents a general method for adapting existing
concurrency control mechanisms in a database (or other storage
system) to determine which invalidation tags to use or invalidate.

\paragraph{Invalidation Locks.} When a read-only transaction accesses
the database, it acquires a new type of lock, an \emph{invalidation
  lock}. It acquires these locks on the same data items the system
would normally acquire read locks\footnote{For clarity, we describe
  this technique in terms of a database that uses locking for
  concurrency control, but this isn't a requirement.} on for
concurrency control. The presence of one of these locks indicates that
some cached object may depend on the locked data, and we may later
need to send an invalidation if it changes.  Invalidation locks differ
from normal read locks in two important respects. First, they are not
released when a transaction commits. Second, and most importantly,
invalidation locks do not block conflicting write operations; rather,
a conflict simply indicates the need to send an invalidation. This
makes ``lock'' somewhat of a misnomer; it might be more accurate to
refer to them as flags, but we use the term ``lock'' to make the
connection to the database's lock manager clear.\footnote{This abuse
  of nomenclature is not unique to us. Serializable Snapshot
  Isolation, for example, uses non-blocking ``SIREAD locks'' to track
  read dependencies~\cite{cahill08:_serial_isolat_snaps_datab}.}

The set of invalidation locks acquired during execution of a query
determines the set of invalidation tags that must be applied to any
cached objects that depend on it. If the database already has a
hierarchical mechanism it uses internally for tracking
multi-granularity read dependencies (as most common databases do), we
use the same hierarchy for our invalidation locks. The canonical
example of such a hierarchy would be the lock hierarchy in a system
using multi-granularity
locking~\cite{gray77:_granul_of_locks_and_degrees}. However, it is not
specific to two-phase locking; other concurrency control approaches
such as optimistic concurrency
control~\cite{kung81:_optim_method_for_concur_contr,
  adya95:_effic_optim_concur_contr_using} or serialization graph
testing~\cite{casanova81:_concur_contr_probl_datab_system,
  cahill08:_serial_isolat_snaps_datab} frequently also use a
multi-granularity mechanism for tracking read dependencies.

For concreteness, consider the tag structure that is used in
\postgres, the database on which we base our implementation. (Note, by
the way, that \postgres is an example of a database that does not use
locking; it uses Serializable Snapshot
Isolation~\cite{cahill08:_serial_isolat_snaps_datab,
  ports12:_serial_snaps_isolat_postg}, a form of serialization graph
testing.)  It tracks read dependencies using a four-level
hierarchy. The coarsest granularity is an entire database; the next
coarsest is entire relations within that database. Within a relation,
finer-grained dependencies are expressed in terms of \emph{index
  entries}: the next granularity is pages of a particular index,
containing subtags for individual entries on that
page. Using index entries makes it possible to avoid problems with
phantoms: whereas a tuple-granularity tag could not capture the
predicate ``users whose name is Alice'', a tag representing the
\invtag{name=alice} entry in an appropriate index does. This is a
standard technique known as next-key
locking~\cite{mohan90:_aries_kvl}. \autoref{fig:postgres-tag-hierarchy}
shows an example of a four-level tag.

\begin{figure}[tbp]
  \centering
  \Large
  \[ \underbrace{\invtag{rubis}\strut}_{\text{database}} :
  \underbrace{\invtag{users}\strut}_{\text{relation}} :
  \underbrace{\invtag{firstname}/\text{page-36}\strut}_{\text{index page}} :
  \underbrace{\invtag{alice}\strut}_{\text{index entry}} \]
  
  \caption{A four-level invalidation tag in \postgres's tag hierarchy}
  \label{fig:postgres-tag-hierarchy}
\end{figure}

As read/write transactions acquire locks on data they modify, they may
encounter conflicts with invalidation locks. Unlike normal read locks,
the presence of an invalidation lock does not block the
writer. Rather, the lock conflict ``breaks'' the invalidation lock and
issues an invalidation: the invalidation lock is removed, and the
corresponding invalidation tag is added to the invalidation
stream. More precisely, the invalidation lock is not removed and the
invalidation is not issued until the read/write transaction commits;
this accounts for the possibility that the transaction might abort.

The invalidation lock approach offers several benefits:
\begin{itemize}
\item it doesn't require careful thought about exactly how to
  represent data dependencies; it reuses the mechanisms from
  database concurrency control
\item it can handle all types of queries. In particular, using index
  page locking, it can represent a range query such as finding all
  users with ID between 19 and 37. In an earlier version of this
  work~\cite{ports10:_trans_consis_autom_manag_applic_data_cache}, we
  used a simpler scheme that could not handle range queries without
  the use of coarse relation-granularity locks.
\item it only sends invalidations for data that have invalidation
  locks set. Therefore, it can avoid sending invalidations for data
  that is not in the cache, or that has already been invalidated.
\end{itemize}

\subsection{Storing Invalidation Locks}

The challenge with invalidation locks is how to maintain the set of
invalidation locks that have been acquired. Because there will be a
lock on every piece of data used by any cached object, the set of
invalidation locks is likely to be large. It isn't unrealistic to
imagine that there might be one lock for every row in the database (or
potentially even more, as there can be locks for multiple index
entries). This means that the usual technique of maintaining an
in-memory lock table is unlikely to scale.

Fortunately, invalidation locks differ from regular locks in some ways
that can simplify the task of representing them. First, unlike normal
locks, it does not matter which transaction placed an invalidation
lock on a particular object; all that matters is whether any such lock
exists. It is also never necessary to have multiple invalidation locks
on the same object at the same time; it is only necessary to send one
invalidation for a particular tag no matter how many cached objects
might depend on it.

These simplifying assumptions can be used to reduce the footprint
required to store invalidation locks. All that is needed is a single
bit for each lockable object representing whether that object
currently has an invalidation lock set, \ie a single bit per relation,
index page, or index entry. This bit could be represented simply by
setting a flag in the row/page header. For a database where the entire
data set fits in RAM, there is very little cost to maintaining this
information as it does not have to be written to disk. This use case
is actually a common one; in-memory databases are increasingly common
today~\cite{stonebraker07:_archit_era, ousterhout09:_case_ramcl} and
are especially appealing for the sorts of high-performance web
applications that our system targets.

For databases that don't fit entirely in memory, we could still use
the same approach of storing a flag in the row header to indicate
whether an invalidation lock is set. However, this would mean that
performing a read-only query might require writing to the disk to set
this flag, which would be undesirable. Instead, we use another
simplifying assumption about invalidation locks: it is always safe to
assume that there is an invalidation lock set on an object, as this
would at worst cause a spurious but harmless invalidation to be
sent. Accordingly, we track invalidation locks for any objects (pages
or index entries) that are currently available in the buffer cache,
but do not make any effort to persist the invalidation locks to disk
if the pages need to be swapped out to disk; instead, we simply make
the conservative assumption when loading a page from disk that all of
the entries on it have invalidation locks set. This does mean some
unnecessary invalidations will be sent, but the number of these should
be low. Most deployments of our system will strive to have as much
data stored in the cache as possible. The benefit of suppressing
invalidations for data that is not cached comes primarily
from objects that are updated multiple times in quick succession, and
these will likely be still resident in the buffer cache.


%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "main.tex"
%%% End: 

%  LocalWords:  invalidations timestamp granularities cacheable MySQL
%  LocalWords:  multicast tuple tuples Serializable