\svnInfo $Id$  

To test the performance of \txcache, we used the \rubis benchmark from
Rice University~\cite{amza02:_specif_and_implem_of_dynam}.  \rubis
implements an auction website modeled after eBay where users can
register items for sale, browse listings, and place bids on items. A
client emulator simulates many concurrent user sessions: there are 26
possible user interactions (\eg{browsing items by category, viewing an
  item, or placing a bid}), each of which corresponds to a
transaction. The
emulator repeatedly selects the next interaction for each client by
following a transition matrix and each client's interactions are
separated by a randomly-chosen ``think time''. We used the standard
\rubis ``bidding'' workload, which consisted of a mix of 85\%
read-only interactions (browsing) and 15\% read/write interactions
(placing
bids)~\cite{amza02:_bottl_charac_of_dynam_web_site_bench}. \rubis used
a think time chosen using a negative exponential distribution with a
7-second mean, as also used in the well-known TPC-W benchmark.

We ran our experiments on a cluster of five physical machines, each a
Dell PowerEdge SC1420 with two 3.20 GHz Intel Xeon CPUs and 2 GB of
RAM. The servers were connected via a gigabit Ethernet switch, with a
round-trip latency of about 0.1 ms between them. One server was
dedicated to the database; it ran PostgreSQL 8.2.11 with our
modifications. Two servers acted as front-end web servers; they ran
Apache 2.2.9 with \texttt{mod\_php}. In addition, these two servers
also acted as cache nodes, running our software and each dedicating
512 MB of RAM to the cache. Using the same physical machines as web
servers and cache nodes is a common practice with caches like
\memcached. The final two machines acted as client emulators. Except
when otherwise noted, database server performance was always the
bottleneck.

We initialized the database with about 3,300 active auctions, 50,000
completed auctions, and 100,000 registered users. The total size of
the database is about 1.1 GB, meaning that the working set easily fits
in the database server's buffer cache. This is representative because
all web applications today must be able to fit their working set into
the buffer cache to achieve reasonable performance.

\begin{figure*}[tbh]
\begin{center}
\subfigure[Throughput] {
\includegraphics[width=0.9\columnwidth,keepaspectratio]{data/tput.pdf}
\label{fig:eval:main:throughput}
}
\subfigure[Latency] {
\includegraphics[width=0.9\columnwidth,keepaspectratio]{data/latency.pdf}
\label{fig:eval:main:latency}
}
\caption{Effects on peak throughput/latency of adding \txcache to
\label{fig:eval:main}
  \rubis workload.}
\end{center}
\end{figure*}

\subsection{Porting \rubis}
\label{sec:exp:porting-rubis}

One of \txcache's goals is to make it easier to add caching to a new
or existing application. The \txcache library makes it straightforward
to designate a function as cacheable. However, ensuring that the
program has functions suitable for caching still requires some
effort. In this section, we describe our experiences modifying the
\rubis benchmark to take advantage of caching.

The first modification was straightforward: code that makes database
queries and begins and ends transactions needed to be modified to do
so using \txcache's library calls instead of the PHP SQL frontend
library directly. A minor complication was that the \rubis
implementation originally used a MySQL database, so we also needed to
port it to use Postgres instead; this was simply a matter of fixing
a few non-standard SQL constructs.

Next, we had to identify cacheable functions. Because the PHP
implementation of \rubis was largely written in a sequential style
with few functions, we had to begin by dividing it into functions; this
was not difficult and would be unnecessary in a more modular
implementation. We chose to designate functions cacheable with two
main granularities. First, we cached large portions of the generated
HTML output
(except some headers and footers) for each page. This meant that if
two clients viewed the same page with the same arguments, the previous
result could essentially be reused. Second, we cached common functions
such as looking up information about a user or item by ID. Even these
fine-grained functions were often more complicated than an individual
query; for example, looking up an item requires examining both the
active items table and the old items table.
These fine-grained cached values can be shared between
different pages; for example, if two search results contain the same
item, then the database only needs to be queried once for the
description and price of that item.

Cacheable functions must be deterministic and depend only on database
state. Generally, identifying such functions is easy; indeed, most of
the read-only functions in the system have this property. However,
while designating functions as cacheable, we discovered that one of
\rubis's actions made a nondeterministic SQL query. Namely, one of the
search results pages does not enforce an ordering on the items it
returns (it uses a \command{select} \ldots \command{limit} 20 query
without an \command{order by} clause).  This turned out to be a known
bug in the PHP implementation, which was not written by the original
authors of \rubis. The bug, which means that the application does not
properly divide search results into pages, became clear when we
observed the validity intervals being inserted into the cache.




\subsection{Basic Performance Evaluation}
\label{sec:eval:main}

We evaluated the performance of \rubis by varying the number of client
sessions being emulated, and measuring both the aggregate throughput
in requests handled per second and the average latency per request.
Our baseline measurement evaluates \rubis running directly on the
Postgres database, with \txcache disabled. This is shown as the ``No
caching''
line in Figure~\ref{fig:eval:main}. Latency remains low, and
throughput scales linearly as the offered load increases, until a peak
of 4,000 clients (643 requests handled per second). Thereafter, latency
increases and throughput drops, in a congestion collapse.

To evaluate the performance impact of the database modifications
described in Section~\ref{sec:database}, we compared three
configurations of the database. We tested both an unmodified version of
Postgres and the version with our modifications. With the modified
database, we also tested both with and without a client concurrently
pinning and unpinning snapshots. We do not show the results, as there
was no observable difference between the three cases. Our
modifications have negligible performance impact, because the system
already maintains multiple versions to implement snapshot isolation,
and keeping a few more versions around adds little cost.

We then ran the same experiment with \txcache enabled, using a 30
second freshness requirement. The results can be seen in
Figure~\ref{fig:eval:main}. The graphs show the same pattern as
without the cache: throughput increases in proportion to the client
count until it reaches the peak. Now, however, the peak is
substantially higher, at 11,000 clients (1,644 requests handled per
second). This 2.5x improvement in peak throughput indicates that the
cache has removed load from the database.

As before, performance suffers beyond the peak because the database is
CPU-bound. Interestingly, when we experimented with a single web
server instead of two, performance with the cache peaked earlier
because the web server reached capacity. Without the cache, the
database server becomes overloaded before even a single web server.

Examining statistics sheds some light on how \txcache improves system
scalability. Over these experiments, the cache hit rate was about
45\%. This sounds low, but it represents a significant performance
improvement because the costs of a cache hit are so much lower than a
miss. When the system is under relatively low load (\ie{in the linear
  region of the graph}), finding and retrieving a value from the cache
takes about 1.5 ms, while a cache miss takes an average of 5
ms. Performing a single database query takes longer than accessing the
cache, and cache misses may involve more than one query plus some
computation. At peak load, when the database is CPU-bound, this effect
is even more dramatic: on average, a cache hit takes 5.7 ms, and a
cache miss 20 ms.

\subsection{Varying Freshness Requirements}
\label{sec:eval:vary-freshn-requ}

The freshness requirement is one of the most important parameters in
our system. By specifying a larger value, applications may be exposed
to increasingly stale data, but they will be able to take advantage of
more cached data. Because our system does not use invalidations, the
freshness requirement limits the useful lifetime of a cached value:
the upper bound on the value's validity interval must be earlier than
the time at which it was inserted into the cache. Therefore,
transactions cannot take advantage of cached data that is older than
their freshness requirement.

\begin{figure}[tbh]
  \centering
  \includegraphics[width=0.9\columnwidth,keepaspectratio]{data/freshness.pdf}
  \caption{Impact of freshness requirement on peak throughput}
  \label{fig:eval:freshness}
\end{figure}

Figure~\ref{fig:eval:freshness} compares the peak throughput obtained
by running transactions with freshness requirements of 5, 15, 30, and
60 seconds. Even a relatively short freshness requirement of 5 seconds
still gives some substantial benefit; the peak throughput increases by
2.1x relative to the baseline. As the freshness requirement increases,
the speedup grows as data in the cache remains useful for longer
periods of time. In the final, 60 second freshness data point, the cache
relieves enough load on the database that the web/cache servers become
the bottleneck, even with two servers. We therefore expect that adding
a third web server would increase performance even further for this 60
second freshness requirement.

\drkp{Maybe look at hit rate statistics here too}

%%% Local Variables: 
%%% mode: latex
%%% TeX-command-default: "Make"
%%% TeX-PDF-mode: t
%%% TeX-master: "paper.tex"
%%% End: 

% LocalWords:  PowerEdge TPC gigabit PostgreSQL CPUs Xeon GHz php cacheable SQL
% LocalWords:  MySQL Postgres granularities nondeterministic invalidations
