The power-of-two-choices technique can be applied directly to the
problem of performing routing table lookups in high-performance
network routers. These devices store a table mapping IP address
prefixes to the next routing hop, and must be able to quickly find the
next hop associated with the longest prefix matching a IP packet's
destination. This is typically accomplished by constructing a set of
hash tables, each containing all prefixes of a certain length, and
performing a binary search in order to find the longest matching
prefix
\cite{srinivasan99:_fast_addres_lookup_using_contr_prefix_expan}. The
goal is to minimize the number of memory transfers required to perform
a lookup by ensuring that the contents of hash buckets fit in a single
cache line. In the static, off-line case, this can be done by building
a slightly relaxed version of a perfect hash table that is allowed to
have a small number of entries in the same bucket.

Broder and Mitzenmacher applied the power-of-two-choices technique to
solve this problem dynamically using multiple hash functions
\cite{broder01:_using_multip_hash_funct}. The idea is straightforward:
use $d$ different hash functions, and always store new entries in
whichever hash bucket is the least full. Assuming ideal random hash
functions, the hashed values will be uniformly distributed, and so the
fundamental power-of-two-choices result of Theorem~\ref{thm:potc}
tells us that with high probability no bucket will be larger than
$\log_d \log n + \bigO{1}$. This can generally be made small enough to
fit in a single cache line. Moreover, the need to use $d$ different
hash functions and fetch the resulting buckets is not particularly
troublesome, since the $d$ hash functions can be computed in parallel
and then the appropriate locations fetched from memory in parallel or
pipelined.

A truly random hash function is impractical to use, however, so
typically a simpler hash function such as a 2-universal hash family is
used. A natural question is whether this suffices to give the results
described above. A theoretical analysis has not yet been given, but
simulation results show that simple hash functions work well on
examples of real IP routing data sets (which are non-adversarial). The
simulation showed that the maximum bucket size is very small; for
example, when hashing 32,000 entries to 32,000 buckets using only two
choices, no bucket contained more than 4 items, and only a small
fraction contained more than 2. This is an impressive result for a
technique far simpler than the offline generation of semi-perfect
hash functions.

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: "paper"
%%% End: