\svnInfo $Id$  

We implemented all the components of \txcache, including the cache
server, database modifications to PostgreSQL to support validity
tracking and invalidations, and the cache library with PHP language
bindings.

One of \txcache's goals is to make it easier to add caching to a new
or existing application. The \txcache library makes it straightforward
to designate a function as cacheable. However, ensuring that the
program has functions suitable for caching still requires some
effort. Below, we describe our experiences adding support for caching
to the \rubis benchmark and to \mediawiki.

\subsection{Porting \rubis}
\label{sec:exp:porting-rubis}


% The first modification was straightforward: code that makes database
% queries and begins and ends transactions needed to be modified to do
% so using \txcache's library calls instead of the PHP SQL frontend
% library directly. A minor complication was that the \rubis
% implementation originally used a MySQL database, so we also needed to
% port it to use Postgres instead; this was simply a matter of fixing
% a few non-standard SQL constructs.

\rubis~\cite{amza02:_specif_and_implem_of_dynam} is a benchmark that
implements an auction website modeled after eBay where users can
register items for sale, browse listings, and place bids on items. We
ported its PHP implementation to use \txcache. Like many small PHP
applications, this implementation consists of 26 separate, loosely
structured PHP scripts that mainly issue database queries and
format their output. Besides changing code that
begins and ends transactions to use \txcache's interfaces, porting
\rubis to \txcache involved identifying and designating
cacheable functions. The existing implementation had few functions, so
we had to begin by dividing it into functions; this was not difficult
and would be unnecessary in a more modular implementation.

We cached objects at two granularities. First, we cached large
portions of the generated HTML output (except some headers and
footers) for each page. This meant that if two clients viewed the same
page with the same arguments, the previous result could be
reused. Second, we cached the results of common functions, such as
authenticating a user's login or looking up information about a user
or item by ID. Even these fine-grained functions were often more complicated than
an individual query; for example, looking up an item requires
examining both the active items table and the old items table.  These
fine-grained cached values can be shared between different pages; for
example, if two search results contain the same item, the description
and price of that item can be reused.
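
The two granularities can be illustrated with a minimal Python sketch.
It is not the \txcache PHP interface: the \command{cacheable}
decorator, the dictionaries standing in for the active and old items
tables, and all function names are assumptions made for illustration,
and the sketch ignores validity intervals and invalidation entirely.

```python
import functools

# Hypothetical stand-ins for the active and old items tables.
ACTIVE_ITEMS = {1: {"name": "lamp", "price": 12.0}}
OLD_ITEMS = {2: {"name": "desk", "price": 80.0}}

_cache = {}  # shared cache: (function name, args) -> result
calls = {"get_item": 0, "item_page": 0}  # executions that missed the cache


def cacheable(fn):
    """Memoize fn on its name and positional arguments (no invalidation)."""
    @functools.wraps(fn)
    def wrapper(*args):
        key = (fn.__name__, args)
        if key not in _cache:
            _cache[key] = fn(*args)
        return _cache[key]
    return wrapper


@cacheable
def get_item(item_id):
    """Fine-grained: look up an item in the active table, then the old one."""
    calls["get_item"] += 1
    item = ACTIVE_ITEMS.get(item_id)
    if item is None:
        item = OLD_ITEMS.get(item_id)
    return item


@cacheable
def item_page(item_id):
    """Coarse-grained: the generated HTML body for one item page."""
    calls["item_page"] += 1
    item = get_item(item_id)
    return "<h1>{}</h1><p>{:.2f}</p>".format(item["name"], item["price"])


@cacheable
def search_page(item_ids):
    """A different page that reuses the same fine-grained item lookups."""
    rows = ["<li>{}</li>".format(get_item(i)["name"]) for i in item_ids]
    return "<ul>" + "".join(rows) + "</ul>"
```

In this sketch, a repeated request for the same item page is served
from the coarse-grained entry, while a search page listing the same
item reuses only the fine-grained lookup. The search page takes a
tuple of IDs so that its arguments can serve as a cache key.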

% Cacheable functions must be deterministic and depend only on database
% state. Generally, identifying such functions is easy; indeed, nearly
% every read-only function in the system has this property. However,
% while designating functions as cacheable, we discovered that one of
% \rubis's actions made a nondeterministic SQL query. Namely, one of the
% search results pages does not enforce an ordering on the items it
% returns (it uses a \command{select} \ldots \command{limit} 20 query
% without an \command{order by} clause).  This turned out to be a known
% bug in the PHP implementation, which was not written by the original
% authors of \rubis. The bug, which means that the application does not
% properly divide search results into pages, became clear when we
% observed the validity intervals being inserted into the cache.

We made a few modifications to \rubis that were not strictly necessary
but improved its performance. To take better advantage of the cache,
we modified the code that displays lists of items to obtain details
about each item by calling our \command{get-item} cacheable function
rather than performing a join on the database. We also observed that
one interaction, finding all the items for sale in a particular region
and category, required a sequential scan over all active
auctions and a join against the users table. This severely
impacted the performance of the benchmark with or without caching. We
addressed this by adding a new table and index containing each item's
category and region IDs. Finally, we removed a few queries that were
simply redundant.
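
The first of these changes, replacing a join with per-item cacheable
lookups, can be sketched as follows. The sketch is in Python with an
in-memory SQLite database rather than PHP and PostgreSQL; the schema,
the function names, and the use of \command{lru\_cache} as a stand-in
for the \txcache library are all assumptions, and no invalidation is
modeled.

```python
import functools
import sqlite3

# Hypothetical schema; the real RUBiS tables have more columns.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, category INTEGER)"
)
db.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "lamp", 7), (2, "desk", 7), (3, "bike", 9)],
)

stats = {"item_queries": 0}  # per-item queries that reached the database


@functools.lru_cache(maxsize=None)
def get_item(item_id):
    """Cacheable per-item lookup; the result is reused across listings."""
    stats["item_queries"] += 1
    row = db.execute(
        "SELECT id, name FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    return row  # a tuple is immutable, so sharing the cached value is safe


def list_category(category):
    """Fetch only the matching IDs, then resolve each one through get_item."""
    cursor = db.execute(
        "SELECT id FROM items WHERE category = ? ORDER BY id", (category,)
    )
    ids = [row[0] for row in cursor]
    return [get_item(i) for i in ids]
```

With a cold cache this pattern issues more queries than a single join
would, so it pays off only when item details are shared across many
listings, as the text above suggests they are in \rubis.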

\subsection{Porting \mediawiki}
\label{sec:exp:porting-mediawiki}

We also ported \mediawiki to use \txcache, to better understand the
process of adding caching to a more complex, existing
system. \mediawiki, which faces significant scaling challenges in its
use for Wikipedia, already supports a variety of caches and
replication systems. Unlike \rubis, it has an object-oriented design,
making it easier to select cacheable functions.

\mediawiki supports master-slave replication for the database server.
Because the slaves cannot process update transactions and lag slightly
behind the master, \mediawiki already distinguishes the few
transactions that must see the latest state from the majority that can
accept the staleness caused by replication lag (typically 1--30
seconds). It also identifies read/write transactions, which must run
on the master. Although we used only one database server, we took
advantage of this classification of transactions to determine which
transactions can be cached and which must execute directly on the
database.
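
One plausible reading of this classification is sketched below in
Python. The three transaction kinds, their names, and the routing
function are assumptions made for illustration; they do not correspond
to identifiers in \mediawiki or in the \txcache library.

```python
from enum import Enum


class TxnKind(Enum):
    READ_WRITE = "read-write"        # must run on the master
    READ_LATEST = "read-latest"      # read-only, but must see current state
    READ_STALE_OK = "read-stale-ok"  # read-only, tolerates replication lag


def may_use_cache(kind):
    """Only read-only transactions that tolerate staleness may be cached."""
    return kind is TxnKind.READ_STALE_OK


def route(kind):
    """Return the hypothetical backend that a transaction of this kind uses."""
    return "cache" if may_use_cache(kind) else "database"
```

Here a result of \command{cache} means only that the transaction is
allowed to consult the cache; a miss would still fall through to the
database.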

Most \mediawiki functions are class member functions. Because only
pure functions can be cached safely, we had to be sure that a cached
function does not mutate its object. We cached only static functions that do not access or modify
global variables (\mediawiki rarely uses global variables). Of the
non-static functions, many can be made static by explicitly passing in
any member variables that are used, as long as they are only read.
For example, almost every function in the \command{Title} class, which
represents article titles, is cacheable because a \command{Title}
object is immutable.
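
The conversion from a member function to a cacheable static function
can be sketched as follows, in Python rather than PHP. The class is a
simplified, hypothetical stand-in for a title object, its method names
are invented for this example, and the standard-library memoization
decorator takes the place of the \txcache library.

```python
import functools


class Title:
    """Hypothetical stand-in for an immutable title object."""

    def __init__(self, namespace, text):
        self.namespace = namespace
        self.text = text

    def prefixed_text(self):
        # Instance method: delegates to a static function and passes the
        # member variables it reads explicitly, so they form the cache key.
        return Title.make_prefixed_text(self.namespace, self.text)

    @staticmethod
    @functools.lru_cache(maxsize=None)
    def make_prefixed_text(namespace, text):
        # Pure: depends only on its arguments, reads no globals,
        # and mutates nothing.
        return text if namespace == "" else namespace + ":" + text
```

Because the cache key consists only of the values passed in, two
distinct objects with the same namespace and text share one cached
result.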


Identifying functions that would be good candidates for caching was
more challenging, as \mediawiki is a complex application with myriad
features. Developers with previous experience with the \mediawiki
codebase would have more insight into which functions are frequently
used. We looked for functions that were involved in common requests
like rendering an article, and member functions of commonly used
classes. We focused on functions that constructed objects based on
data looked up in the database, such as fetching a page
revision. These were good candidates for caching because we can avoid
the cost of one or more database queries, as well as the cost of
post-processing the data from the database to fill the fields of the
object. We also adapted existing caches like the localization cache,
which stores translations of user interface messages.

% % We did not cache every function in \mediawiki that was cacheable. For
% % example, accessor functions are not worth caching. Since all queries
% % go through the Database class, it would have been easy to cache all
% % queries, but not all queries are good candidates for caching. Instead,
% % we looked at higher order functions that contain a query and some
% % processing or several queries and tried to cache only commonly used
% % functions. For example, we cache all of the queries and functions 

% The most difficult part of porting \mediawiki for txcache was finding
% functions that are good candidates for caching.  \mediawiki is a
% complex application with dozens of features, so it was hard to find
% frequently-used functions by just looking at the code.  Someone with
% experience with the \mediawiki codebase could much more easily
% identify the best candidates for caching. We took two approaches to
% picking functions to cache. First, we looked for all functions that
% were involved in common requests like rendering an article. Second, we
% looked at all of the functions of commonly used classes, like the
% Title class. We avoided functions that are obviously bad candidates
% for caching, like accessor functions or other simple computation-only
% functions.

% % \mediawiki has several built-in caches that are probably good
% % candidates for caching, although we did not fully port them to
% % \txcache. The localisation cache caches messages that appear on wiki
% % pages and error messages in the local language. \mediawiki saves the
% % localisation cache between requests in the database, so 
