\ifdraft
\svnInfo $Id$
\fi

The indexing system described in the previous chapter addresses one of
the two central challenges of a file sharing system: it provides the
means to perform searches for files. The other major problem remains
to be addressed: finding sources for a particular file and beginning
to download it.

% Searching can support *any* content distribution system
Our indexing system is independent of the file transfer
mechanism. Thus, it is possible to use an existing content
distribution network in conjunction with Arpeggio's indexing system.
For example, a simple implementation might simply store a HTTP URL for
the file as an entry in the metadata blocks. If using a
content-distribution network such as
Coral~\cite{freedman04:_democ_conten_public_with_coral} or
CoDeeN~\cite{wang:_reliab_and_secur_in_codeen} \marnote{Find other
  CDNs?}, the appropriate information can be stored in the metadata
block. This information is, of course, opaque to the indexing system.

\section{Direct vs. Indirect Storage}
\label{sec:content-distribution:direct}


% For example, can just store files in the DHT (as with CFS, etc)
% This doesn't work well because of churn, popular files, etc.
A DHT can be used for direct storage of file contents, as in
distributed storage systems like CFS, the Cooperative File
System~\cite{dabek01:_wide_area_cooper_storag_with_cfs}. In these
systems, file content is divided into chunks, which are stored using
the DHT storage layer. Because the storage layer ensures that the
chunks are stored in a consistent location and replicated, the system
achieves the desirable availability and durability properties of the
DHT. The cost of this approach is maintenance traffic: as nodes join
and leave the network, data must be transferred and new replicas
created to ensure that the data is properly stored and replicated.
For a file sharing network, direct storage is not a viable option
because the amount of churn and the size and number of files would
create such high maintenance costs.

Instead, Arpeggio uses content-distribution systems based on
\emph{indirect storage}: it allows the file content to remain on the
peers that already contain it, and uses the DHT to maintain pointers
to each peer that contains a certain file.  Using these pointers, a
peer can identify other peers that are sharing content it wishes to
obtain. Because these pointers are small, they can easily be
maintained by the network, even under high churn, while the large file
content remains on its originating nodes.  This indirection retains
the distributed lookup abilities of direct storage, while still
accommodating a highly dynamic network topology, but may sacrifice
content availability, since file availability is limited to those files
that are shared by peers currently existing in the network.

In this chapter, we discuss several possible designs of
content-distribution systems that can be used in conjunction with
Arpeggio's indexing functionality.

\section{A BitTorrent-based System}
\label{sec:content-distribution:bittorrent}

In earlier versions of our design~\cite{clements05:_arpeg}, we
proposed a comprehensive system including its own content distribution
mechanism, which we describe below
(Section~\ref{sec:content-distribution:novel}). However, after our
system was designed, BitTorrent increased in popularity and gained a
decentralized tracker method that made it more suitable for use in a
system like ours. As a result, we implemented the following system,
which combines Arpeggio's indexing mechanism with BitTorrent's
transfer capabilities.

BitTorrent~\cite{bittorrent} is an extremely popular protocol for
peer-to-peer file downloads, because it achieves high download speeds
by downloading from multiple sources. It uses a distributed
tit-for-tat protocol to ensure that each peer has incentives to upload
to others~\cite{cohen03:_incen_build_robus_in_bittor}. Besides this
use of incentives, BitTorrent differs from other peer-to-peer file
sharing systems in that it does not provide any built-in searching
capabilities, relying instead on index websites. The information
needed to download a set of files is stored in a \texttt{.torrent}
file, including the file names, sizes, and content hashes, as well as
the location of a tracker. The tracker maintains a list of peers
currently downloading or sharing (\emph{seeding}) the file, and which
pieces of the file they have available.

Since a centralized tracker has many disadvantages, implementers of
BitTorrent clients introduced decentralized mechanisms for finding
peers. In the Peer Exchange protocol~\cite{peer_exchan}, peers gossip
with each other to discover other peers. Though this idea is employed
by many of the most popular clients (albeit with mutually incompatible
implementations), it only functions in conjunction with a
tracker. Trackerless
systems~\cite{forum:_bittor_track_dht_protoc_specif_v1} have been
implemented that use a Kademlia DHT to maintain lists of peers for
each torrent. Both the official BitTorrent client~\cite{bittor_softw}
and the Azureus client~\cite{azureus} support this approach, though
again with mutually incompatible implementations.

With trackerless capabilities, BitTorrent serves as an effective
content distribution system for Arpeggio. We must simply add a
mechanism for mapping the files found by a metadata search to a
\texttt{torrent} file for download. It is not desirable to store the
contents of the \texttt{torrent} file in the file's metadata block,
because many copies of the metadata block are created for KSS
indexing. Instead, we employ a layer of indirection: the metadata
block contains an identifier which can be used to look up the
\texttt{torrent} file via a DHT lookup. Indeed, support for this level
of indirection is already built-in to some BitTorrent clients, such as
Azureus: these clients support
Magnet-URIs~\cite{magnet_uri_draft_specif}, which are a
content-hash-based name that can be resolved via a DHT lookup. For
clients that do not support Magnet-URI resolution, a similar step can
be implemented using Arpeggio's DHT. Note also that for torrents that
contain more than one file, a separate metadata block with the same
torrent identifier can be created for each file, making it possible to
search for any of the individual files.


\section{A Novel Content-Distribution System}
\label{sec:content-distribution:novel}

Though the BitTorrent-based system above is practical and uses
existing technologies in order to ease deployment, starting from
scratch allows us to develop a more comprehensive system tuned to some
of the unique requirements of a peer-to-peer file sharing
network. Below, we describe some salient features of a novel content
distribution system we have designed but have yet to implement; it
includes mechanisms for sharing content between similar files,
efficiently locating sources for a file, and for improving the
availability of popular but rare files.

\subsection{Segmentation}
\label{sec:content-distribution:segmentation}

% Deal with chunks rather than files

For purposes of content distribution, we segment all files into a
sequence of \emph{chunks}. Rather than tracking which peers are
sharing a certain file, our system tracks which chunks comprise
each file, and which peers are currently sharing each chunk. This is
implemented by storing in the DHT a \emph{file block} for each file,
which contains a list of \emph{chunk IDs}, which can be used to locate
the sources of that chunk, as in Table~\ref{tab:layers}.  File and
chunk IDs are derived from the hash of their contents to ensure that
file integrity can be verified.

%% \begin{figure}[tbp]
%%   \centering
%%   \includegraphics{figures/chunks}
%%   \caption{Division of files into chunks}
%%   \label{fig:file-chunk-diagram}
%% \end{figure}

\begin{table}[tbp]
  \centering
  \caption{Layers of lookup indirection}
  \label{tab:layers}
  \begin{tabular}{|c|c|}
    \hline
    \textbf{Translation} & \textbf{Method} \\
    \hline
    keywords $\to$ file IDs & keyword-set index search \\
    file ID $\to$ chunk IDs & standard DHT lookup \\
    chunk ID $\to$ sources & content-sharing subring \\
    \hline
  \end{tabular}
\end{table}

%  The confusing diagram
%  Can share chunks between similar files
%  Partially downloaded files
The rationale for this design is twofold. First, peers that do not
have an entire file are able to share the chunks they do have: a peer
that is downloading part of a file can at the same time upload other
parts to different peers. This makes efficient use of otherwise unused
upload bandwidth.  For example, Gnutella does not use chunking,
requiring peers to complete downloads before sharing them. Second,
multiple files may contain the same chunk.  A peer can obtain part of
a file from peers that do not have an exactly identical file, but
merely a \emph{similar} file.

Though it seems unlikely that multiple files would share the same
chunks, file sharing networks frequently contain multiple versions of
the same file with largely similar content. For example, multiple
versions of the same document may coexist on the network with most
content shared between them.  Similarly, users often have MP3 files
with the same audio content but different ID3 metadata tags.  Dividing
the file into chunks allows the bulk of the data to be downloaded from
any peer that shares it, rather than only the ones with the same
version.

%  The variable-length chunking algorithm (LBFS)
However, it is not sufficient to use a segmentation scheme that draws
the boundaries between chunks at regular intervals. In the case of MP3
files, since ID3 tags are stored in a variable-length region of the
file, a change in metadata may affect all of the chunks because the
remainder of the file will now be ``out of frame'' with the original.
Likewise, a more recent version of a document may contain insertions
or deletions, which would cause the remainder of the document to be
out of frame and negate some of the advantages of fixed-length
chunking.

To solve this problem, we choose variable length chunks based on
content, using a chunking algorithm derived from the LBFS file
system~\cite{muthitacharoen01:_low_bandw_networ_file_system}. Due to
the way chunk boundaries are chosen, even if content is added or
removed in the middle of the file, the remainder of the chunks will
not change. \marnote{mention approximate chunk size?}  While most
recent networks, such as FastTrack, BitTorrent, and
eDonkey,\marnote{Cite FastTrack and eDonkey?} divide files into
chunks, promoting the sharing of partial data between peers,
our segmentation algorithm additionally promotes sharing of
data \emph{between files}.

%  Are there any papers on other P2P networks that chunk we can cite?

% Algorithm
%% When using this content distribution system, the index entry blocks of
%% the search index (described in Subsection
%% \ref{sec:searching:index-structure-and-algorithms}) contain the block
%% ID of the file block for the file to which they refer. The file block
%% itself simply contains a list of the block IDs of all the associated
%% chunk blocks. File and chunk blocks are inserted into the DHT using
%% block IDs derived from the hash of the file or chunk contents. This
%% ensures that the integrity of downloaded content can be easily
%% verified by computing its hash value and comparing to the block ID.
%% A chunk block contains a list of IP addresses of peers from which the
%% file can be obtained. The entries in this list will expire after a
%% fixed time, as it is possible for nodes to fail or leave the network
%% unexpectedly. However, since the list is provided on a best-effort
%% basis, clients should be robust in the event of being unable to
%% contact one of these peers.


\subsection{Content-Sharing Subrings}
\label{sec:content-distribution:subrings}

To download a chunk, a peer must discover one or more sources for this
chunk. A simple solution for this problem is to maintain a list of
peers that have the chunk available, which can be stored in the DHT or
handled by a designated ``tracker'' node as in
BitTorrent~\cite{bittorrent}. However, the node responsible for
tracking the peers sharing a popular chunk represents a single point
of failure that may become overloaded.

We instead use \emph{subrings} to identify sources for each chunk,
distributing the query load throughout the network. The Diminished
Chord protocol~\cite{karger04:_dimin_chord} allows any subset of the
nodes to form a named ``subring'' and allows \proc{Lookup} operations
that find nodes in that subring in $\bigO{\log n}$ time, with constant
storage overhead per node in the subring. We create a subring for each
chunk, where the subring is identified by the chunk ID and consists of
the nodes that are sharing that chunk. To obtain a chunk, a node
performs a \proc{Lookup} for a random Chord ID in the subring to
discover the address of one of the sources.  It then contacts that
node and requests the chunk. If the contacted node is unavailable or
overloaded, the requesting node may perform another \proc{Lookup} to
find a different source. When a node has finished downloading a chunk,
it becomes a source and can join the subring.  Content-sharing
subrings offer a general mechanism for managing data that may be
prohibitive to manage with regular DHTs.

\subsection{Postfetching}
\label{sec:content-distribution:postfetching}

To increase the availability of files, our system caches file
chunks on nodes that would not otherwise be sharing the chunks. Cached
chunks are indexed the same way as regular chunks, so they do not
share the disadvantages of direct DHT storage with regards to having
to maintain the chunks despite topology changes.  Furthermore, this
insertion symmetry makes caching transparent to the search system.
Unlike in direct storage systems, caching is non-essential to the
functioning of the network, and therefore each peer can place a
reasonable upper bound on its cache storage size.

% Passive caching
% Active caching
%  [I don't really like these names. Can we come up with better ones?]

% I think it just dawned on me why, in practice, queue lengths don't
% balance in most networks: Most networks maintain one big queue for
% all files that a peer has, instead of a queue per file.  Bittorrent
% doesn't have this issue because it works on a per-file basis (and
% uses something much more dynamic than a queue).  Would this be
% possible to prove somehow?
%
% In fact, this now seems really silly.  The idea was to for the
% downloader of a block to simultaneously push it to some cache node,
% who would then begin sharing it, but why not just make that cache
% node someone who wants that block?

\emph{Postfetching} provides a mechanism by which caching can increase
the supply of rare files in response to demand.  \emph{Request blocks}
are introduced to the network to capture requests for unavailable
files.  Due to the long expiration time of metadata blocks, peers can
find files whose sources are temporarily unavailable.  The peer can
then insert a request block into the network for a particular
unavailable file.  When a source of that file rejoins the network it
will find the request block and actively increase the supply of the
requested file by sending the contents of the file chunks to the
caches of randomly-selected nodes with available cache space. These in
turn register as sources for those chunks, increasing their
availability. Thus, the future supply of rare files is actively
balanced out to meet their demand.


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "thesis"
%%% TeX-command-default: "rubber"
%%% End: 

% LocalWords:  metadata CFS BitTorrent BitTorrent's trackerless Kademlia DHT
% LocalWords:  Azureus lookup KSS