Why not use Bittorrent?

Our methodology draws far more from older P2P systems like Gnutella, <insert more>, etc. -- many of which have now been overtaken by Bittorrent. While Bittorrent has made it much easier to download large files, it has several disadvantages that make it unsuitable for our goal.

- Centralized Trackers. In its original form, Bittorrent requires centralized tracking servers to keep track of which clients possess specific data. Thus, any organization deploying a Bittorrent-based solution would need to provide supporting infrastructure. In addition, recent history has shown that legal organizations target such file trackers to stem copyright violations. <finish>
- High Latency. Bittorrent downloads typically have a setup period on the order of many seconds to a minute -- acceptable for large file downloads but ill-suited for large sequences of small caching transactions.
- Bandwidth. Bittorrent is especially suited for situations where many people might possess parts of a file, but connections to each have low throughput. Because our system is specifically designed for local area networks, it makes just as much sense to download the file from a single peer as to coordinate many downloads. <questionable point, but..>

We note that there have been recent systems and analyses covering the benefits of local peer selection for Bittorrent. These systems are described in more detail in previous work.

Previous Work and Alternatives:

Local area schemes for reducing intra-ISP traffic are hardly a new idea. Here are some alternative solutions that have been proposed or implemented in the past.

Centralized Web Cache: A simple alternative would be to maintain a centralized web cache on a proxy server; indeed, this is a solution already employed by many companies. However, our solution is far less expensive and does not require maintenance.

Distributed File System: Much work has been done in creating distributed file systems over local networks. A solution of this nature would ask every computer on the local area network to donate some amount of storage space (e.g. 100 MB each) to maintain copies of the most commonly requested data. Our solution has two advantages over these types of systems: (1) clients in our system only contain data that they're interested in, and (2) no implicit trust is assumed to exist between clients in our system.


Feasibility:

Clearly, the benefits of this system are limited by how much the data held by different clients overlaps...

We analyzed two sets of traces in order to answer this question. The first set was collected from student and faculty computers at UC Berkeley in November 1996 through the Home IP service. This dataset contained complete records of internet usage, with client and server IPs anonymized via MD5. Analyzing 95,768 requests from 916 unique clients over a four-hour period, we found that 17,394 of the requests (about 18.2%) were for data already stored on another node in the network. Note, however, that these traces were collected from client computers, and thus include requests served by the local browser cache. Eliminating those, we find that this figure is 24.3% of all data requests.
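As an illustration of this kind of trace analysis, the following is a minimal sketch, not the exact procedure used to obtain the figures above. It assumes each trace record reduces to a (client, url) pair in chronological order, that every client keeps everything it has fetched (no expiration or eviction), and that every response is cacheable. The function name peer_hit_stats and the record layout are hypothetical.

```python
from collections import defaultdict


def peer_hit_stats(requests):
    """Classify each request in a chronologically ordered trace.

    `requests` is an iterable of (client_id, url) pairs (hypothetical layout).
    Assumes unlimited per-client storage and that all responses are cacheable.
    """
    holders = defaultdict(set)  # url -> set of clients that already fetched it
    total = local_hits = peer_hits = 0

    for client, url in requests:
        total += 1
        seen_by = holders[url]
        if client in seen_by:
            # The requesting client already has the data (browser cache hit).
            local_hits += 1
        elif seen_by:
            # Some other node on the network already holds the data.
            peer_hits += 1
        seen_by.add(client)

    # Requests that actually leave the client once local hits are removed.
    outgoing = total - local_hits
    return {
        "total": total,
        "local_hits": local_hits,
        "peer_hits": peer_hits,
        "peer_hit_fraction": peer_hits / outgoing if outgoing else 0.0,
    }


if __name__ == "__main__":
    # Tiny made-up trace, for illustration only.
    trace = [("a", "x"), ("b", "x"), ("a", "x"), ("c", "y"), ("b", "y"), ("a", "z")]
    print(peer_hit_stats(trace))
```

In this sketch, requests a client could have served from its own cache are counted separately and removed from the denominator, which mirrors the distinction drawn above between raw peer hits and peer hits after local browser cache hits are eliminated.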

The second set of traces was requested from the IRCache project. While this represents a reasonable approximation <finish>

We've also constructed a statistical model for trace generation and analysis. Prior work has shown that a Zipf distribution accurately models user web behavior. We use a baseline of $N = 50000$ sites. Note that