\documentclass{article}
\input{6046preamble}


\begin{document}
\pset{7-4}{Dan Ports}{No collab}{Recitation \#8}

\begin{ppart}{Algorithm}
This algorithm is closely based on the close-pair algorithm from lecture. We will divide the space into a grid where each side has length $\sqrt{2}$, and hash each grid space into a bucket. Specifically, we maintain an array of $k$ buckets ($k = O(n)$), and use a universal hash function $h$ that maps $(\mathbb{R}\times\mathbb{R})\mapsto \{1\ldots k\}$. Then for every point $p_i=(x_i,y_i)$, we can compute its hash value $h(\lfloor\frac{x_i}{\sqrt{2}}\rfloor,\lfloor\frac{y_i}{\sqrt{2}}\rfloor)$. We iterate through the array of students, with $i$ from $1$ to $n$. For each $i$, we compute the hash value for the student's location $p_i=(x_i,y_i)$, and add him to the appropriate hash bucket. If the bucket contains 6 or more students, then we know that there must be one sociable student in the heavy bucket, unless it is a spurious hit caused by a hash collision. To ensure that this is not the case, we verify that there is a sociable student: for each student in the bucket, we check the distance to every other student in the bucket; if there exists some student such that the distance to at least five other students in the bucket is less than two meters, we have found a sociable student and we terminate immediately. Otherwise we continue adding students to buckets.
\paragraph{}
If we have placed every student in a hash bucket without finding a sociable student, then we iterate through the set of students, and for each student we check the distance from him to every student in his hash bucket and the hash buckets associated with the 24 grid cells that are 1 or 2 cells away from the student. If we find some student who is less than two meters away from at least five other students, then he is sociable and we terminate immediately. Otherwise we continue. If we have finished performing this check for each student, then there is no sociable student and we return false.
\end{ppart}

\begin{ppart}{Proof of correctness}
To justify the correctness of the algorithm, we first prove that if six students are in the same grid cell (which is equivalent to having six students in the same hash bucket, except in the case of a collision), there must be some sociable student. To see this, note that the grid sides have length $\sqrt{2}$. The furthest any two points can be away from each other is in the case where they are located at opposite corners diagonally, and in this case they are $\sqrt{2}\sqrt{2} = 2$ meters away from each other. Thus, each of the six students in the bucket are within two meters of each other. So a student in the bucket must be within two meters of five other students, meaning he is sociable.
\paragraph{}
Next we show that we need only consider the 25 neighboring buckets when searching for sociable students after filling the buckets. Any student who is not in one of these buckets must be at least $2 \sqrt{2}$ meters away, or they would be in one of the neighboring grid cells. Certainly $2 \sqrt{2} > 2$, so these students are more than two meters away, and we do not need to check them. Thus we need only check the students who are in the 25 hash buckets that are two grid cells or less away. (Actually, the 4 most distant of these 25 grid cells do not need to be considered, but this simplifies the description of the algorithm and does not affect the asymptotic runtime.)
\end{ppart}

\begin{ppart}{Runtime analysis}
This algorithm uses hashing, so it is randomized, and we will consider the expected runtime. Assume that computing the hash value requires constant time. Then, if we implement the buckets as a linked list, adding one student to the correct hash bucket requires constant time. If we keep track of the number of entries in the hash bucket, we can also check in constant time whether the bucket contains six students. Thus adding $n$ students to the appropriate hash buckets requires $O(n)$ time. However, if we ever find six students in the same hash bucket, we will need to verify that they are indeed sociable rather than being a spurious hit. This could potentially require many comparisons. However, the probability of a hash collision is low (the expected number is $\frac{n}{k}$). Thus, the expected runtime for this phase is still $O(n)$.
\paragraph{}
Next we show that scanning neighboring cells also requires $O(n)$ time. For each student, we want to scan 25 hash buckets. Assuming there are no collisions, each hash bucket will have no more than 5 students in it. Thus the total number of distances to be computed per student is less than 125, which is still constant. Thus applying this process of scanning neighboring cells for each of the $n$ students requires $O(n)$ time. It is of course possible that we will have hash collisions, which can increase the runtime of this step. However, again we note that the probability of a hash collision is low, so the expected number of students in each bucket is still $O(1)$, so the expected runtime is still $O(n)$. Thus the algorithm has overall expected $O(n)$ runtime.
\end{ppart}
\end{document}

