Recent years have seen a dramatic increase in the popularity of
peer-to-peer file sharing systems. These systems, which allow users to
locate and obtain files shared by other users, allow data to be
transferred more efficiently than the traditional client-server model
by utilizing peers' upload bandwidth and break the complexity barrier
of content publication. Moreover, they can be implemented without the
need for a central server. However, many current file sharing networks
trade-off scalability for correctness, resulting in either systems
that scale well but sacrifice completeness of search results or
vice-versa.

The unique abilities of distributed hash tables are ideal for
constructing peer-to-peer networks because they overcome the
difficulties of quickly and correctly locating peers.  However, the
lookup query primitives provided by DHTs are not immediately
sufficient to perform complex search queries of the data stored in the
network, nor it is clear how to construct interfaces for search
queries without sacrificing scalability or query completeness.

We will present the design for Arpeggio, a novel file-sharing system
that uses the indexing primitives of the Chord distributed hash table
to construct a system that facilitates the finding and sharing of bulk
data. This design retains many of the advantages of a central index
such as completeness and speed of queries, while providing the
scalability and other benefits of full decentralization.

By utilizing the properties of distributed hash tables, Arpeggio
provides guarantees on network scalability, the correctness of search
results, the speed of searches, and the availability of data. The
amount of network traffic scales approximately O(n log n) with the
number of files present in the network, which itself scales
approximately linearly with the number of users, thus achieving
scalability.  The system can consistently locate even rare files
scattered throughout the network, thus achieving perfect recall.
Searches generally require O(1) queries to be sent to the network, and
the response length is O(n) in the number of files that actually match
the query.  Multiple forms of caching are employed to improve file
availability by ensuring both that the supply of files in high-demand
is actively increased to meet demand and that rare files are made
available to those requesting them.