% -*- TeX-master: "report.tex" -*-

% Problem Description

The main goal of InHome is to reduce external bandwidth usage by
serving web requests, where possible, from data already held within an
organization's internal network.

The system setting consists of many computers, or clients, inside an
organization, ranging from $1{,}000$ to $10{,}000$ or even
$100{,}000$. These clients are connected to each other by fast network
links, and to outside entities by slower or more expensive
links. Thus, fetching data from a client within the same organization
can be orders of magnitude faster or cheaper than fetching the same
data from an external, remote server. Each client issues web requests
for objects such as web pages, music, videos, or general files. Some
of these objects have already been requested by other clients inside
the network, and InHome aims to fetch them from those local clients
rather than from the external origin server.
 

An ideal solution should have the following properties:
\begin{enumerate}
\item Clients should not experience a significant increase in latency.
\item Clients should not have to store data they are not interested in.
\item The system should not require new hardware or additional maintenance.
\item The system should be adaptable to organizations of different sizes.
\end{enumerate}

The first property is the most important: if InHome significantly
increases the latency of fetching web content, clients may not be
willing to use the system. As we discuss in
Section~\ref{sec:Search}, clients can trade off latency for bandwidth
reduction. The longer they are willing to wait, the higher the chance
that the desired data is retrieved from within the InHome local
network, which in turn yields a larger reduction in external bandwidth
usage.

The second property states that InHome should not require clients to
store data they are not interested in, because content originating
from another client may be objectionable (such as pornography) or
malicious (such as viruses). Furthermore, unnecessary transfers of
objects should be avoided so that the local area network is not
overwhelmed by InHome traffic.

The third property is aimed at encouraging adoption of the
system. New hardware entails additional expenditure, ongoing
maintenance, and installation time. The solution traditionally used by
organizations to reduce external bandwidth is to install a centralized
caching proxy for web data. Clients in the organization first attempt
to get their desired web content from the proxy before falling back to
the origin server. The drawbacks of this approach are its maintenance
overhead, its single point of failure, and its poor scalability. In
contrast, InHome should be a distributed peer-to-peer system that
requires no additional hardware.

The last property states that the system should be flexible enough to
be adapted to different settings. Organization sizes vary from very
small, such as start-up companies with about $100$--$1{,}000$
computers, to very large, with $10{,}000$--$100{,}000$ computers, such
as a university campus or a large corporation (e.g.,~Microsoft).

% One scenario in which a system such as InHome would be useful is a
% conference. In such a setting, there are about $100$--$1{,}000$
% computers within a small area that have wireless access via the few
% wireless access points (WAPs) located nearby. The DSL link outgoing
% from the WAP is a bottleneck because it can support only a limited
% number of simultaneous connections, which causes the poor latency
% typical at conferences. However, clients can communicate with each
% other over high-bandwidth, low-latency links using their ad-hoc
% wireless mode of operation. For example, a typical infrastructure
% data rate to an external server is $\approx 10$~Kbps, while the
% ad-hoc connection provides $\approx 11$~Mbps. InHome should eliminate
% the redundant traffic that is typical at a conference, such as
% requests for the conference schedule, speakers' webpages, and the
% local weather forecast. This would relieve the bottleneck at the DSL
% link and reduce latency for common content, since that content can
% be found at another client.


 
