\chapter{Evaluation}
\label{cha:evaluation}

This chapter analyzes the effectiveness of \txcache using the \rubis
web application benchmark. The evaluation seeks to answer the
following questions:

\begin{itemize}
\item What changes are required to modify an existing web application to
  use \txcache? (\autoref{sec:eval:rubis}).
\item How much does caching improve the application's performance, and
  how does the cache size affect the benefit? (\autoref{sec:eval:size})
\item Does allowing transactions to see slightly stale data improve
  performance? (\autoref{sec:eval:staleness})
\item How much of a performance penalty does \txcache's
  guarantee of transactional consistency impose?
  (\autoref{sec:eval:consistency})
\item How much overhead is caused by \txcache's modifications to the
  database? (\autoref{sec:eval:db})
\end{itemize}

\section{Implementation Notes}
\label{sec:eval:impl}

We implemented all the components of \txcache, including the cache
server, database modifications to PostgreSQL to support validity
tracking and invalidations, and the cache library with PHP language
bindings.

\paragraph{Cache Server.}
The cache server implements the versioned cache described in
\autoref{sec:consistency-protocol:cache}. It provides a versioned hash
table interface, and maintains an index from invalidation tags to
cached objects necessary to process invalidation messages. The server
accepts requests from the \txcache library over a simple TCP-based RPC
protocol. It is single-threaded, allowing it to avoid locking. This
contributes to the simplicity of the server: our implementation
only requires about 2500 lines of C code.

Our cache server is not optimized for maximum throughput, as cache
throughput was not a bottleneck in any of our experiments. Several
performance optimizations have been developed for \memcached that
could be applied to the \txcache cache server. For example, a custom
slab allocator can be used to reduce the cost of allocating cached
objects (our implementation uses the standard system
\texttt{malloc}). Using a UDP-based RPC protocol can also improve
performance by reducing memory overhead and avoiding contended locks
in the kernel~\cite{saab08:_scalin_memcac_at_faceb}. 


\paragraph{Database Modifications.}
We modified \postgres 8.2.11 to provide the necessary support for use
with \txcache: pinned snapshots, validity interval computation, and
invalidations. This version of \postgres provides snapshot isolation
as its highest isolation level rather than serializability, so we use
this configuration in our experiments.

Our implementation generates invalidations using a simplified version
of the protocol in \autoref{cha:invalidations}. It does not attempt to
store invalidation locks; it simply generates all appropriate
invalidations for every update.  Our implementation tracks
invalidation tags using two granularities: relation-level tags and
index-entry tags. Because this version of \postgres does not
incorporate index-range locking, fine-granularity invalidation tags
cannot be used for range queries; this limitation was not an issue for
the queries used in our benchmark application. Subsequently, we
incorporated index-range locking into \postgres as part of a new
serializable isolation
level~\cite{ports12:_serial_snaps_isolat_postg}.

The modifications required to add \txcache support to \postgres were
not extensive; we added or modified about a thousand lines of code to
implement pinned snapshots and validity interval tracking, and another
thousand lines to implement invalidations.

The modified database interacts with the pincushion, which creates new
pinned snapshots and notifies application servers about them. We also
use a separate daemon to receive invalidation notices from the
database and distribute them to cache nodes. It currently sends
unicast messages to each cache server rather than using a more
sophisticated application-level multicast mechanism.

\paragraph{Client Library.}
The \txcache library is divided into two parts: a language-independent
component that accesses the cache and ensures consistency, and a set
of language-specific bindings. The language-independent part uses the
lazy timestamp selection protocol of
\autoref{sec:consistency-protocol:lazy} to ensure consistency between
objects read from the cache and those obtained from the database. It
also mediates interactions between the application and the database:
applications must use the \txcache library's interface to send queries
to the database. This interface always passes queries directly to the
database (it does not try to parse or otherwise interpret them), but
the library may also insert commands to start a transaction at an
appropriate snapshot, and monitors the validity intervals and
invalidation tags associated with each query result.

\paragraph{PHP Bindings.}
The \rubis benchmark is implemented in PHP, so we provided a set of PHP
language bindings for \txcache. These bindings provide the support for
applications to designate cacheable functions, and marshal and
unmarshal objects when they are stored in or retrieved from the cache.

We previously presented the \txcache interface in terms of a
\command{make-cacheable} higher-order procedure. PHP lacks support for
higher-order procedures, so the syntax is a bit more awkward. Instead,
the PHP library provides a \texttt{txcache\_call} function that takes
as arguments the name of a function to call and the arguments to pass
to it. This function checks for an appropriate result in the cache,
and, if none is found, invokes the function. \autoref{fig:php-example}
demonstrates a typical use: users invoke a 
\texttt{getItem} function which is actually
a wrapper function that calls \texttt{txcache\_call}.

\begin{figure}[tbp]
  \centering
  \ttfamily\small
  \begin{lstlisting}[escapechar={@},language={php},columns=fullflexible]
function getItemImpl($itemId)
{
  $res = sql_query("SELECT * FROM users ...");
  @\llangle\textit{some computation on query result} \rrangle@
  $user = @\llangle\textit{result} \rrangle@
  return $user;
}
function getItem($itemId)
{
  global $TX;  // this contains the cache configuration
  return txcache_call($TX, 'getItemImpl', $itemId);
}    
  \end{lstlisting}
  \caption{Example of a cacheable function in PHP}
  \label{fig:php-example}
\end{figure}

\section{Porting the \rubis Benchmark}
\label{sec:eval:rubis}

\rubis~\cite{amza02:_specif_and_implem_of_dynam} is a benchmark that
implements an auction website modeled after eBay where users can
register items for sale, browse listings, and place bids on items. We
ported its PHP implementation to use \txcache.  Like many small PHP
applications, the PHP implementation of \rubis consists of 26 separate
PHP scripts, written in an unstructured way, which mainly make
database queries and format their output. Besides changing code that
begins and ends transactions to use \txcache's interfaces, porting
\rubis to \txcache involved identifying and designating
cacheable functions. The existing implementation had few functions, so
we had to begin by dividing it into functions; this was not difficult
and would be unnecessary in a more modular implementation.

We cached objects at two granularities. First, we cached large
portions of the generated HTML output (except some headers and
footers) for each page. This meant that if two clients viewed the same
page with the same arguments, the previous result could be
reused. Second, we cached common functions such as authenticating a
user's login, or looking up information about a user or item by
ID. Even these fine-grained functions were often more complicated than
an individual query; for example, looking up an item requires
examining both the active items table and the old items table.  These
fine-grained cached values can be shared between different pages; for
example, if two search results contain the same item, the description
and price of that item can be reused.

\paragraph{Determinism.}
Cacheable functions must be deterministic and depend only on database
state. Generally, identifying such functions is easy; indeed, nearly
every read-only function in the system has this property. However,
while designating functions as cacheable, we discovered that one of
\rubis's actions inadvertently made a nondeterministic SQL
query. Namely, one of the search results pages divides results into
pages using a ``\command{select} \ldots\@ \command{limit} 20
\command{offset} $n$'' statement, \ie returning only the $n$ through
$n+20$th results. However, it does not enforce an ordering on the
items it returned (\ie it lacks an \command{order by} clause) so the
database is free to return \emph{any} 20 results. This caused search
results to be divided into pages in an unpredictable and inconsistent
way. This turned out to be a known bug in the PHP implementation,
which was not written by the original authors of \rubis.

\txcache generally leaves determining which functions are cacheable up
to the application developer. However, it incorporates a simple sanity
check that was sufficient to detect this error. The cache issued a
warning when it detected an attempt to insert two versions of the same
object that had overlapping validity intervals, but different
values. This warning indicated a possible non-determinism.

\paragraph{Optimizations.}
We made a few modifications to \rubis that were not strictly necessary
but improved its performance. To take better advantage of the cache,
we modified the code that performs a search and displays lists of
matching items. Originally, this function performed a join in the
database, returning the list of matching items and their
description. We modified it to instead perform a database query that
obtains the list of matching items, then obtain details about each
item by calling our \texttt{getItem} cacheable function. This allowed
the individual item descriptions to be cached separately.

Several other optimizations were not related to caching, but simply
eliminated other bottlenecks with the benchmark workload. For example,
we observed that one interaction, finding all the items for sale in a
particular region and category, required performing a sequential scan
over all active auctions, and joining it against the users table. This
severely impacted the performance of the benchmark with or without
caching, to the point where the system spent the majority of this time
processing this query. We addressed this by adding a new table and
index containing each item's category and region IDs. We also removed
a few queries that were simply redundant.

\section{Experimental Setup}
\label{sec:eval:setup}

We used \rubis as a benchmark to explore the performance benefits of
caching. In addition to the PHP auction site implementation described
above, \rubis provides a client emulator that simulates many
concurrent user sessions: there are 26 possible user interactions
(\eg{browsing items by category, viewing an item, or placing a bid}),
each of which corresponds to a transaction.  The emulator repeatedly
selects the next interaction for each client by following a transition
matrix and each client's interactions are separated by a
randomly-chosen ``think time''.  We used the standard \rubis
``bidding'' workload, a mix of 85\% read-only interactions (browsing)
and 15\% read/write interactions (placing bids). The think time
between interactions was chosen from a negative exponential
distribution with 7-second mean, the standard for this benchmark (as
well as the TPC-W benchmark~\cite{tpc-w}).

We ran our experiments on a cluster of 10 servers, each a Dell
PowerEdge {\liningnumerals SC1420} with two 3.20 GHz Intel Xeon CPUs,
2 GB RAM, and a Seagate {\liningnumerals ST31500341AS 7200 RPM} hard
drive. The servers were connected via a gigabit Ethernet switch, with
0.1 ms round-trip latency. One server was dedicated to the database;
it ran PostgreSQL 8.2.11 with our modifications. The others acted as
front-end web servers running Apache 2.2.12 with PHP 5.2.10 (using
\texttt{mod\_fcgi} to invoke the PHP interpreter), or as
cache nodes. Four other machines, connected via the same switch,
served as client emulators. Except as otherwise noted, database server
load was the bottleneck.

We used two different database configurations. One configuration was
chosen so that the dataset would fit easily in the server's buffer
cache, representative of applications that strive to fit their working
set into the buffer cache for performance. This configuration had
about 35,000 active auctions, 50,000 completed auctions, and 160,000
registered users, for a total database size about 850 MB. The larger
configuration was disk-bound; it had 225,000 active auctions, 1
million completed auctions, and 1.35 million users, for a total
database size of 6 GB.

For repeatability, each test ran on an identical copy of the
database. We ensured the cache was warm by restoring its contents from
a snapshot taken after one hour of continuous processing for the
in-memory configuration and one day for the disk-bound configuration.

For the in-memory configuration, we used seven hosts as web servers,
and two as dedicated cache nodes. For the larger configuration, eight
hosts ran both a web server and a cache server, in order to make a
larger cache available. 


\section{Cache Sizes and Performance}
\label{sec:eval:size}

\begin{figure}[tp]
  \centering
  \subfigure[In-memory database]{
    \includegraphics[keepaspectratio,width=0.8\textwidth]{data/size.pdf}
    \label{fig:eval:size:small}
  }
  
  \subfigure[Disk-bound database]{
    \includegraphics[keepaspectratio,width=0.8\textwidth]{data/size-big.pdf}
    \label{fig:eval:size:big}
  }
  \caption{Effect of cache size on peak throughput (30 second
    staleness limit)}
  \label{fig:eval:size}
\end{figure}

We evaluated \rubis's performance in terms of the peak throughput
achieved (requests handled per second) as we varied the number of
emulated clients. Our baseline measurement evaluates \rubis
running directly on an unmodified \postgres database, without \txcache. This achieved a peak throughput of 928 req/s with the
in-memory configuration and 136 req/s with the disk-bound configuration.

We then ran the same experiment with \txcache enabled, using a 30
second staleness limit and various cache sizes. The resulting peak
throughput levels are shown in Figure~\ref{fig:eval:size}. Depending
on the cache size, the speedup achieved ranged from $2.2\times$ to
$5.2\times$ for the in-memory configuration and from $1.8\times$ to
$3.2\times$ for the disk-bound configuration. This throughput
improvement comes primarily from reduced load on the database server.

The \rubis PHP benchmark does not perform significant
application-level computation. Even so, we see a 15\% reduction in
total web server CPU usage. We could expect a greater reduction in
application server load on a more realistic workload than this simple
benchmark, as it would likely perform more application-level
computation, \eg to do more sophisticated formatting of
results. Furthermore, our measurement of web server CPU usage takes
into account not just time spent on legitimate application processing
but also the time the PHP interpreter spends parsing source code on
each request.

Cache server load is low. Even on the in-memory workload, where the
cache load is spread across only two servers, each cache server's load
remained below 60\% utilization of a single CPU core.  Most of this
CPU overhead in kernel time, suggesting inefficiencies in the kernel's
TCP stack as the cause; switching to a UDP protocol might alleviate
some of this overhead~\cite{saab08:_scalin_memcac_at_faceb}. The
average cache lookup latency is 0.7~ms, significantly lower than even
the in-memory database's average query latency of 5.5~ms (and even
before the database reaches its capacity). There are an average of 2.4
invalidation tags per cached object, and the tags themselves average
16 bytes.

\subsection{Cache Contents and Hit Rate}

We analyzed the contents of the cache (after the warmup period) for
both the in-memory and disk-bound configurations. As shown in
\autoref{fig:cache-contents}, the cache contains many items -- over
500,000 for the in-memory configuration and over 4.5 million for the
disk-bound configuration. Nearly all of the cache server's memory
(over 95\%) goes to storing the cached values themselves; another 2\%
goes to storing the names (keys) for these values. Validity intervals
are small (9 bytes per object), so storing them does not require much
space. Invalidation tags are slightly larger (16 bytes, on average)
and more numerous (an average of 2.4 tags per cache entry).  However,
the cost of storing invalidation tags is still small: about 2\% of
memory.

\autoref{fig:cache-size-histo} shows the distribution of the sizes of
cached objects. Half of the cached
objects are smaller than 400~bytes; these objects typically represent
the results of a single database query, such as looking up the name of
a user. Most of the rest of the objects are under 8~KB in size; these
objects are typically generated HTML fragments or the results of more
complex queries. There are a handful of objects as large as 100~KB,
\eg the detailed bid history of especially popular items,
but there are sufficiently few of them that they do not make up a
significant fraction of the cache workload.

\begin{table}[tbp]
  \centering
  \begin{tabular}{lrrrr}
    \toprule
    & \multicolumn{2}{c}{\bf In-memory workload} & \multicolumn{2}{c}{\bf Disk-bound workload}\\
    \midrule
    \textbf{Cache entries} & $560{,}756$ && $4{,}506{,}747$ \\
    \midrule
    \textbf{Total cache size} & $817.9$ MB && $7.820$ GB \\
    \quad\textrm{Cached values} & $778.5$ MB & ($95.2\%$) & $7.477$ GB & ($95.6\%$)  \\
    \quad\textrm{Cache keys} & $18.1$ MB & ($2.2\%$) & $145$ MB & ($1.8\%$) \\
    \quad\textrm{Validity intervals} & $4.9$ MB & ($0.6\%$) & $40$ MB & ($0.5\%$)\\
    \quad\textrm{Invalidation tags} & $16.4$ MB & ($2.0\%$) & $166$ MB & ($2.1\%$)\\
    \bottomrule
  \end{tabular}
  \caption{Cache contents breakdown}
  \label{fig:cache-contents}
\end{table}

\begin{figure}[tbp]
  \centering
  \subfigure[In-memory database]{
    \includegraphics[width=0.9\textwidth]{data/object-size-histo}
  }
  \subfigure[Disk-bound database]{
    \includegraphics[width=0.9\textwidth]{data/object-size-histo-big}
  }
  \caption{Distribution of cache object sizes. Note the logarithmic x-axis.}
  \label{fig:cache-size-histo}
\end{figure}

Figure~\ref{fig:eval:hitrate:small} shows that for the in-memory
configuration, the cache hit rate ranged from $27\%$ to $90\%$, increasing
linearly until the working set size is reached, and then growing
slowly. On the in-memory workload, the cache hit rate directly
translates into an overall
performance improvement because each cache hit represents load removed
from the database -- and a single cache hit might eliminate the need
to perform several queries. 

\begin{figure}[tp]
  \centering
  \subfigure[In-memory database]{
    \includegraphics[keepaspectratio,width=0.8\textwidth]{data/hitrate.pdf}
    \label{fig:eval:hitrate:small}
  }
  \subfigure[Disk-bound database]{
    \includegraphics[keepaspectratio,width=0.8\textwidth]{data/hitrate-big.pdf}
    \label{fig:eval:hitrate:big}
  }
  \caption{Effect of cache size on cache hit rate (30 second staleness
    limit)}
  \label{fig:eval:hitrate}
\end{figure}

Interestingly, we always see a high hit rate on the disk-bound
database (Figure~\ref{fig:eval:hitrate:big}), but this hit rate does
not always translate into a large performance improvement. This
illustrates an important point about application-level caching: the
value of each cached object is not equal, so hit rates alone rarely
tell the whole story. This workload exhibits some very frequent
queries, such as looking up a user's nickname by ID. These can be
stored in even a small cache, and contribute heavily to the cache hit
rate. However, they are also likely to remain in the database's buffer
cache, limiting the benefit of caching them. This workload also has a
large number of data items that are individually accessed rarely (\eg
the full bid history for auctions that have already ended). The latter
category of objects collectively makes up the bottleneck in this
workload, as the database must perform random I/O to load the
necessary information. The speedup of the cache is primarily
determined by how much of this data it can hold.


\section{The Benefit of Staleness}
\label{sec:eval:staleness}

The staleness limit is an important parameter. By raising this value,
applications may be exposed to increasingly stale data, but are able
to take advantage of more cached data. An invalidated cache entry
remains useful for the duration of the staleness limit, which is
valuable for values that change (and are invalidated) frequently.

\begin{figure}[t]
  \centering
  \includegraphics[keepaspectratio,width=0.8\textwidth]{data/freshness.pdf}
  \caption{Impact of staleness limit on peak throughput}
  \label{fig:eval:freshness}
\end{figure}

Figure~\ref{fig:eval:freshness} compares the peak throughput obtained
by running transactions with staleness limits from 1 to 120 seconds.
Even a small staleness limit of 5-10 seconds provides a significant
benefit.  The \rubis workload includes some objects that are expensive
to compute and have many data dependencies: for example, indexes of
all items in a particular category along with their current prices.
These objects are invalidated frequently, but the staleness limit
permits them to be used.  The benefit begins to diminish at around a
staleness limit of 30 seconds. This reflects the fact that the bulk of
the data either changes infrequently (such as information about
inactive users or auctions), or is modified frequently but also
accessed multiple times every 30 seconds (such as the aforementioned
index pages).



\section{The Cost of Consistency}
\label{sec:eval:consistency}

A natural question is how \txcache's guarantee of transactional
consistency affects its performance. We explore this question in two
ways: we examine cache statistics to analyze the causes of cache
misses, and we compare \txcache's performance against a
non-transactional cache.

\subsection{Cache Miss Analysis}

We classified cache misses into four types, inspired by the common
classification for CPU cache
misses~\cite{hill89:_evaluat_assoc_cpu_caches}: 
\begin{itemize}
\item \emph{compulsory miss}: the object was never in the cache, \ie
  the cacheable function was never previously called with the same
  arguments
\item \emph{staleness miss}: the object has been invalidated, and is
  sufficiently old that it exceeds the application's staleness limit
\item \emph{capacity miss}: the object was previously evicted from the
  cache
\item \emph{consistency miss}: some sufficiently fresh version of
  the object was available, but it was inconsistent with previous data
  read by the transaction
\end{itemize}
The last category, consistency misses, are precisely the category of
interest to us, as these are the cache misses that would be caused by
\txcache but would not occur in a non-transactional cache.

The \txcache cache server records statistics about the cache hit rate
and cache misses of various
types. Figure~\ref{fig:eval:cache-miss-types} shows the breakdown of
misses by type for four different configurations of the \rubis
workload. Our cache server cannot distinguish staleness
and capacity misses, because it discards the associated validity
information when removing an item from the cache either due to
staleness or cache eviction.

We see that consistency misses are the least common by a large
margin. Consistency misses are rare, as items in the cache are likely
to have validity intervals that overlap with the transaction's
request, either because the items in the cache change
rarely or the cache contains multiple versions. Workloads with higher
staleness limits experience a higher proportion of consistency misses
(but fewer overall misses) because they have more stale data that must
be matched to other items valid at the same time. The 64~MB-sized
cache's workload is dominated by capacity misses, because the cache is
smaller than the working set. The disk-bound experiment sees more
compulsory misses because it has a larger dataset with limited
locality, and few consistency misses because the update rate is
slower.

\begin{table}[tb]
  \centering{
  \begin{tabular}{rcccc}
    \toprule
    & \multicolumn{3}{c}{\bf\liningnumerals in-memory DB} & \bf\liningnumerals disk-bound\\
    & \bf\liningnumerals 512 MB & \bf\liningnumerals 512 MB  & \bf\liningnumerals 64 MB & \bf\liningnumerals 9 GB \\
    & \bf\liningnumerals 30 s stale & \bf\liningnumerals 15 s stale & \bf\liningnumerals 30 s stale & \bf\liningnumerals 30 s stale\\
    \midrule
    \input{data/missstats.tex}\\
    \bottomrule
  \end{tabular}}
\caption[Breakdown of cache misses by type]{Breakdown of cache misses
  by type. Figures are percentage of total misses.}
  \label{fig:eval:cache-miss-types}
\end{table}


\subsection{Experimental Comparison}

The low fraction of consistency misses suggests that providing
consistency has little performance cost. We verified this
experimentally by comparing against a non-transactional version of our
cache. That is, we modified our cache to continue to use our
invalidation mechanism, but configured the \txcache library to
blithely ignoring consistency: it
accepted
any data that was valid within the last 30 seconds, rather than the
usual protocol that requires the validity intervals to intersect.

\begin{figure}[tp]
  \centering
    \includegraphics[keepaspectratio,width=0.8\textwidth]{data/size-memcache.pdf}
  \caption{Comparison of \txcache and a non-transactional cache for
    varying cache sizes; in-memory workload}
  \label{fig:eval:memcached}
\end{figure}

The results of this comparison are shown in
\autoref{fig:eval:memcached}, which reproduces the experiment in
Figure~\ref{fig:eval:size:small}: it measures \rubis system throughput
for varying cache sizes on the in-memory configuration. Now, however,
the non-transactional version of our cache is shown as the ``No
consistency'' line.  As predicted, the benefit that this configuration
provides over consistency is small. On the disk-bound configuration,
the results could not be distinguished within experimental error.

\section{Database Overhead}
\label{sec:eval:db}

The \txcache system requires some modifications to the database, to
compute validity intervals, provide access to recent snapshots, and
generate invalidations.  The experiments described earlier compare the
performance of \rubis running with \txcache to a baseline
configuration running directly on a unmodified \postgres database. Not
surprisingly, the benefit of caching exceeds the additional overhead
incurred by the database modifications. Nevertheless, we would like to
know how much overhead these modifications cause.


To answer this question, we first performed the baseline \rubis
performance experiment (without \txcache) with both a stock copy of
\postgres and our modified version. We found no observable difference
between the two cases: our modifications had negligible performance
impact. Because the system
already maintains multiple versions to implement snapshot isolation,
keeping a few more versions around adds little cost, and tracking
validity intervals and invalidation tags simply adds an additional
bookkeeping step during query execution.

Because our end-to-end web application benchmark saw negligible
overhead, we performed a database-level benchmark to more precisely
measure the cost of these modifications. We used the \texttt{pgbench}
benchmark~\cite{pgbench}, a simple transaction-processing benchmark
loosely based on TPC-B~\cite{tpc-b}. This benchmark runs transactions
directly on the database server; there is no cache or application
server involved. The modified database, however, still computes
validity intervals and generates invalidation messages (which are sent
to a ``multicast'' daemon that simply discards them, since there are
no cache servers). A simulated pincushion contacted the database every
second to obtain a new pinned snapshot, then released after 1--180
seconds.

We ran these benchmarks on a different system configuration than the
\rubis benchmark. We used a 2.83 GHz Core 2 Quad Q9550 system with 8
GB RAM running Ubuntu 11.10. The most significant difference is that
the database was not stored on disk at all; rather, it was stored on
an in-memory file system (\texttt{tmpfs}). As a result, the workload
is purely CPU-bound -- even more so than the in-memory \rubis
configurations, where modifications and write-ahead log updates still
had to be written to disk. This is a ``worst-case'' configuration
intended to expose the overhead of our modifications. In particular,
it stresses the invalidation mechanism, because this in-memory
configuration can support an unusually high update rate because
updates do not need to be synchronously written to disk.

Even on this worst-case, in-memory configuration, the overhead of
\txcache's database modifications is low. \autoref{fig:pgbench}
compares the performance of our modified database to stock \postgres
8.2.11. The modifications impose an overhead of less than 7\% for
computing validity intervals and generating invalidations. The amount
of overhead also depends on the age of the pinned snapshots that are
being held, as the system must retain all tuple versions that are
newer than the earliest pinned snapshot. This makes query processing
slightly more expensive, because the system must examine these
versions and decide which ones should be visible to a running
transaction. However, the performance impact is not large: the cost of
retaining an extra 180 seconds of state is under 3\%, even with the
unusually high update rate of this benchmark.

\begin{figure}[tbp]
  \centering
  \includegraphics[width=0.9\textwidth]{data/pgbench}
  \caption{\texttt{pgbench} transactions per second for modified and unmodified \postgres}
  \label{fig:pgbench}
\end{figure}

%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "main.tex"
%%% End: 

%  LocalWords:  versioned allocator optimizations invalidations TCP
%  LocalWords:  granularities serializability serializable RPC PHP
%  LocalWords:  unicast multicast timestamp cacheable unmarshal SQL
%  LocalWords:  transactional UDP
