[HTML reformatted by RAM -- 04/01/2001]
Ultrapeers: Another Step Towards Gnutella Scalability
Working Draft
Anurag Singla
Christopher Rohrs
Lime Wire LLC
December 18, 2001
{anurag, crohrs}@limewire.com
Abstract: We propose a scheme to have a hierarchical gnutella
network by categorizing the nodes on the network as client-nodes and
super-nodes. A client-node keeps only one connection open, and that is
to a super-node. A super-node acts as a proxy to the gnutella network
for the client nodes connected to it. This has a effect of making
the gnutella network scale, by reducing the number of nodes on the
network involved in message handling and routing, as well as reducing
the actual traffic among them.
1. Introduction
The Gnutella network is a peer-to-peer network, all the hosts on the
network are treated equal regardless of their bandwidth capabilities or
computation power or other properties (like uptime, connection capability
etc). Nodes on the network communicate using either of Gnutella Protocol
0.4 [PROT04], or an enhanced version that uses new Connection Handshaking
mechanism [GC01], both of which have many interesting properties. New
hosts can join any time, and the current hosts on the network may leave
the network any time. The ability to have a dependable network, without
dependence on any particular host (or set of hosts), is a remarkable
feature of gnutella that has led to its immense popularity. Along with
this there are many other remarkable features of gnutella network
[WHYGNUT01] that distinguish it from other systems.
But along with all the good features that gnutella possesses, there have
always been doubts cast upon the scalability of gnutella network. That?s
primarily because of the overabundance of messages flowing thru the
gnutella network including broadcast pings and queries, as well as the
inability to assure content availability close enough to the searching
host. Queries are sent to large number of hosts in an attempt to secure
a hit from few of them. This has led to bandwidth problems in the past
as well as in the present.
Gnutella developers have taken a number of very promising initiatives to
reduce the number of messages flowing in the gnutella network, by making
the methods more directed as well as carry more information. Notable among
these are Ping Pong Scheme Proposal [CV01] which reduces the number of
messages by eliminating the ones not contributing much to the system,
Meta Information Search Proposal (S01) which attempts to make search
more specific, and Query Routing Proposal [C00] which reduces the query
traffic by doing selective broadcast.
Although the above initiatives are a promising step towards making the
network efficient, there definitely exist other supplementary schemes,
one of which is the scheme proposed in this paper that has a major
potential of making network scalable and efficient. We propose a scheme
which if implemented will create a two level hierarchy of nodes in the
gnutella network, where faster nodes (ones having better networking
capabilities and CPU power) take charge of much of the load on slower
nodes. In the proposed scheme this restructuring of the network takes
place automatically and dynamically, and works in unison with the other
initiatives mentioned above. The restructured network will have a effect
of significant reduction in the number of messages flowing thru the
gnutella network.
At the same point, we would like to mention that the concept of
the ultrapeers is not new, and has been in existence in variety of
applications (including proxy access to web which has static ultrapeer
selection, as well as distributed transaction processing which has
dynamic ultrapeer selection algorithm). Also it has been applied to
recent peer-to-peer applications using FastTrack [FTRACK], or other
such similar technologies. The application of the ultrapeer concept in
Peer-to-Peer application has been done using proprietary algorithms and
protocols. Our contribution in this paper is to come up with a scheme
to implement the dynamic ultrapeer selection mechanism in the Gnutella
network. No such scheme has been published as yet. Our approach may or
may not be similar to the proprietary ones in existence, with which we
have no way to compare.
We present our scheme as follows. First we describe how ultrapeers work
in an ideal network with a static topology. In doing so, we discuss
options for ultrapeers to index [xyz1]leaf nodes. Next we tackle the
more difficult problem of ?ultrapeer election?, i.e., of dynamically
constructing a tiered network. We describe a handshaking mechanism
based on the recent Gnutella 0.6 protocol in detail and the concept
of ?ultrapeer guidance?. Finally, we conclude with some performance
observations and directions for future work.
2. Routing with Ultrapeers
Nodes in this network can be divided into leaf nodes and ultrapeers.
Leaf nodes maintain only a single connection to a ultrapeer. Ultrapeers
maintain many (10-100) leaf connections, as well as a small (<10)
number of connections to other ultrapeers. Ultrapeers shield leaf nodes
from virtually all ping and query traffic. This can be done in one of
two ways:
* Reflector indexing: in the style of the Clip2 Reflector[xyz2],
ultrapeers periodically send an indexing query (four spaces,
i.e., ? ?) to leaf nodes. Leaf nodes respond with a query
reply naming all shared files. (The leaf can send multiple query
reply messages if sharing more than 255 messages.) The ultrapeer
uses these replies to build an index of its leaves? contents.
The ultrapeer then responds to queries on behalf of leaf nodes,
shielding them from all query traffic.
* QRP routing: leaf nodes periodically send ultrapeers query
routing tables using LimeWire?s proposed Query Routing Protocol
[xyz3](QRP). Ultrapeers then forward queries only to those leaf
nodes whose QRP table has a corresponding entry. Leaves respond
in the normal manner. Ultrapeers do not propogate QRP tables
amongst themselves.[1]
Hybrid schemes are of course possible. However, we recommend QRP routing
for several reasons[xyz4]:
* Better QHD support: because leaf nodes are given the chance
to respond to all queries, they can provide up-to-the-minute
information in the QHD, such as estimated download speed and
busy status.
* Update efficiency: because QRP tables eliminate redundant keywords
and add compression, they are likely to be significantly smaller
than replies to indexing queries. Furthermore, a leaf node only
needs to send incremental QRP updates when its shared files
have changed. In contrast, a reflector-style ultrapeer would
need to periodically re-index all of the leaves? files-this takes
more bandwidth.
* CPU efficiency: it is very cheap (constant time) for a ultrapeer
to decide whether to forward a query to a leaf node with the
QRP proposal. The reflector-style scheme can be implemented
efficiently, but it is considerably more difficult.
* Privacy: ultrapeers do not actually know what files are shared by
leaf nodes-only those files? hashes.
* Ease of implementation: LimeWire has already implemented QRP,
and the code is available to the public under the GNU Public
License (GPL). Maintaing a single QRP table per connection is
easier to implement than building a ?virtual file manager? for
all connections.
A more difficult question is how ultrapeers communicate amongst
themselves. Initially we recommend simple broadcast as with the current
Gnutella protocol. Eventually ultrapeers may use more complicated
protocols. One idea is for a ultrapeer to not forward a query if it
receives a large number of replies from its leaf nodes. QRP could also
be used between ultrapeers.
The proposed scheme is backward compatible with older clients. From the
perspective of newer clients, old clients are simply ultrapeers that
do not maintain any shielded leaf connections. The scheme benefits the
network even in the presence of old clients, though of course it would
be better if all clients supported ultrapeers.
3. Ultrapeer Election
Now we return to the question of how the ultrapeer-leaf hiearchy is
created in a distributed manner. Hosts regularly determine whether they
are eligible to be ultrapeers (?ultrapeer capable?) by looking at uptime,
operating system, bandwidth, etc. We recommend the following requirements
for all ultrapeers:
* Not firewalled. This can usually be approximated by looking at
whether the host has received incoming connections.
* Suitable operating system. Some operating systems handle large
numbers of sockets better than others. Linux, Windows 2000/NT/XP,
and Mac OS/X will typically make better ultrapeers than Windows
95/98/ME or Mac Classic.
* Sufficient bandwidth. We recommend at least 15KB/s downstream and
10KB/s upstream bandwidth. This can be approximated by looking
at the maximum upload and download throughput.
* Sufficient uptime. Ultrapeers should have long expected uptimes.
A reasonable heuristic is that the expected future uptime is
proportional to the current uptime. That is, nodes should not
become ultrapeers until they have been running for several minutes.
It is important to distinguish between hosts that are ultrapeer capable
and hosts that are actually ultrapeers:
* Ultrapeer: hosts that are ultrapeer capable and not a shielded leaf
node to an ultrapeer. Note that an ultrapeer does not necessarily
have leaf connections.
* Leaf: hosts that have only a single, routed connection to an
ultrapeer. Note that leaves may actually be ultrapeer capable.
When hosts connect, they express their capabilities and intents using
Gnutella 0.6 [xyz5] connection headers. In some cases, this will require
some negotiation to prevent too many nodes from becoming ultrapeers.
There are five possible cases, shown below.
3.1. Leaf to Ultrapeer
The simplest case is a leaf node connecting to a ultrapeer. The key
header here is "X-Ultrapeer", which tells whether a host plans on acting
as a ultrapeer (if true) or a shielded node (if false). When this
interaction is over, the leaf is a shielded node of the ultrapeer.
The leaf should drop any Gnutella 0.4 connection it has and send a QRP
routing table.
Client
Server
GNUTELLA CONNECT/0.6
X-Ultrapeer: False
User-Agent: LimeWire 1.9
X-Query-Routing: 0.1
X-My-Address: 10.254.0.16:6349
GNUTELLA/0.6 200 OK
X-Ultrapeer: True
X-Ultrapeer-Needed: false
User-Agent: LimeWire 1.9
X-Try-Ultrapeers: 23.35.1.146:6346,
18.207.63.25:6347
X-Try: 24.37.144:6346, 193.205.63.22:6346
X-My-Address: 10.254.0.16:6346
X-Query-Routing: 0.1
GNUTELLA/0.6 200 OK
The meaning of the other headers are as follows:
User-Agent: the identity of the vendor
X-Ultrapeer-Needed: used to balance the number of ultrapeers.
This is discussed in detail in section 5 below.
X-Query-Routing: the version of query routing to be used between leaf
and ultrapeer, IF a shielded leaf/ultrapeer connection is established
X-My-Address: the host's address and port reported in query replies
and pongs
X-Try: addresses of non-ultrapeer hosts to try if the connection
is lost
X-Try-Ultrapeers: addresses of ultrapeer to try if the connection
is lost
These headers are sent in almost all interactions. However, for clarity,
non-essential headers will be ommitted in the remaining examples.
It is important to note that headers can be sent in any order. Also,
case is ignored in "True" and "False".
3.2. Leaf to Shielded Leaf
The next case we consider is a ultrapeer-incapable node A connecting to
a leaf node B who happens to have a connection to a ultrapeer S. In this
case, B will refuse to accept the connection, redirecting A instead to S.
Note that B does not send "200 OK" in its response.
GNUTELLA CONNECT/0.6
X-Ultrapeer: False
GNUTELLA/0.6 503 I am a shielded leaf node
X-Ultrapeer: False
X-Try-Ultrapeers: 18.2.3.14:6346,
18.1.17.2:6346
[terminates connection]
At this point, A should should try to establish a connection to S,
as in case (1) above.
[Note by RAM: should send a "204" error here, since there's no content]
3.3. Leaf to Unshielded Leaf
Sometimes nodes will be ultrapeer-incapable but unable to find
a ultrapeer. In this case, they behave exactly like old, unrouted
Gnutella 0.4 connections.
GNUTELLA CONNECT/0.6
X-Ultrapeer: False
GNUTELLA/0.6 200 OK
X-Ultrapeer: False
GNUTELLA/0.6 200 OK
The original ultrapeer proposal involved two TCP connections in this case.
This version is much simpler.
3.4. Ultrapeer to Ultrapeer
When two ultrapeers meet, both set X-Ultrapeer true. If both have leaf
nodes, they will remain ultrapeers after the interaction. Note that
no QRP route table is sent between ultrapeers after the connection
is established.
GNUTELLA CONNECT/0.6
X-Ultrapeer: True
GNUTELLA/0.6 200 OK
X-Ultrapeer: True
GNUTELLA/0.6 200 OK
Note that the X-Ultrapeer-Needed header is ignored in this case. This is
discussed in the next section.
3.5. Ultrapeer to Ultrapeer, with Leaf Guidance
Sometimes there will be too many ultrapeer-capable nodes on the network.
Consider the case of an ultrapeer A connecting to an ultrapeer B.
If B doesn't have enough leaves, it may direct A to become a leaf node.
If A has no leaf connections, it stops fetching new connections, drops
any Gnutella 0.4 connections, and sends a QRP table to B. (If A has
leaf connections, it obeys the guidance, as in the above case.) Then B
will shield A from all traffic.
GNUTELLA CONNECT/0.6
X-Ultrapeer: True
GNUTELLA/0.6 200 OK
X-Ultrapeer: True
X-Ultrapeer-Needed: false
GNUTELLA/0.6 200 OK
X-Ultrapeer: False
The original ultrapeer proposal involved two TCP connections in this case.
This version is much simpler.
----------------------
[1] Readers familiar with the QRP protocol will note that only some of
the protocol is actually used. In particular, all route table entries
will be limited to INFINITY or 1.
[xyz1] I don't like using this word. It's not accurate.
[xyz2] Cite...except it doesn't exists any more.
[xyz3] Expand this section...perhaps discuss XML.
[xyz4] Bullet points over-simplifies here. For example up-to-dateness
and efficiency are related. There are tradeoffs.
[xyz5] Cite this.