\documentclass{article}
\input{6046preamble}


\begin{document}
\pset{6-1}{Dan Ports}{No collab}{Recitation \#8}

\begin{ppart}{Part a}
\paragraph{}
We can solve the on-line string matching problem using the Rabin-Karp algorithm. We will show that no non-trivial modifications to the algorithm are required to identify matches with constant probability of correcness, using $\Theta(m)$ time for preprocessing and $\Theta(1)$ time per character. Since the analysis depends on the size of the alphabet $\Sigma$, we assume (without loss of generality) that the characters are the digits $\{0, 1, 2, \ldots , d-1\}$, with $d = \abs{\Sigma}$. We also assume that the alphabet $\Sigma$, its size $d$, the length of the string $n$, and the pattern $P$ and its length $m$ are known in advance. 
\paragraph{}
As part of the preprocessing, the algorithm must compute a suitable prime number $q$. We assume the correctness of the prime-determining algorithm given in Lecture 16, which finds $q$ in time polynomial in $\log(n)$. The other preprocessing required is computation of the hash value of the pattern, which requires $\Theta(m)$ time. So the total preprocessing time required is $\Theta(m + \log^c n)$, for some constant $c$.
\paragraph{}
We keep track of the number $i$ of characters already received; this counter is initialized to zero and incremented by one each time a character is received, requiring constant time. We also store the incoming characters in an array $T[1\ldots n]$. Before the Rabin-Karp algorithm can begin identifying matches, it must receive the first $m$ characters of the text. Thus, while $i < m$, we build the initial hash value $t_0$ using the recurrence $t_0 \leftarrow (d t_0 + T[i])$ mod $q$ ($t_0$ is initialized to 0 when the algorithm begins). This is taken directly from CLRS's Rabin-Karp-Matcher procedure; it requires constant time for each received character.
\paragraph{}
After the first $m$ characters are received and the initial hash value $t_0$ is computed, Rabin-Karp can begin to check for matches with each following character. When the $i$th character is received, we let $s$ = $n-i$. We compute the hash value $t_s$ using the relation $t_s \leftarrow (d(t_s - T[s+1]d^{m-1} + T[i]$ mod $q$ (assuming $s \neq 0$; if $s = 0$, we just use the value $t_0$ computed in the previous paragraph's procedure). Note that this computation requires constant time, and uses only the $i$th and $s+1$th elements of T; it depends only on values already received. If the previously computed hash value $t_s$ equals the hash value $p$ of the pattern, we identify the shift $s$ as a possible location of a match and output it. We repeat this process for each incoming character until the end of the input string.
\paragraph{}
This algorithm requires $\Theta(m + \log^c n)$ preprocessing time, and constant time for processing each received character. Since it requires only trivial modifications to the Rabin-Karp procedure as presented on CLRS p. 914 or in the lecture notes, the proof of correctness follows from the correctness proof given there. The output is correct except in the case of a spurious hit; we know from the lecture notes that for each $s$ the probability of such a spurious hit is small: it is $\frac{1}{2n}$.
\end{ppart}

\begin{ppart}{Part b}
\paragraph{}
If the text is being broadcast in reverse, we simply reverse the pattern and apply the same algorithm. This provides a solution with the same running time and correctness characteristics.
\paragraph{}
To justify that reversing the pattern gives the correct results when the text is broadcast in reverse, we note that there is an occurrence of $P$ in $T$ if and only if $T[i ~\ldots~ i+m-1] = P[1~ \ldots~ m]$ for some $i$. If we let $P'$ and $T'$ be the reversed pattern and text, respectively, then the Rabin-Karp algorithm above will identify matches between them: every index $i$ such that $T'[i ~\ldots~ i+m-1] = P'[1~ \ldots~ m]$. Since $T'$ is the reversal of $T'$, $T'[i] = T[n-i+1]$ and similarly $P'[i] = P[m-i+1]$. Therefore, for every index $i$, we have found locations where $T[n-i-1 ~\ldots~ n-i-m] = P[m~ \ldots~ 1] \implies T[n-i-m ~\ldots~ n-i-1] = P[1~ \ldots~ m]$. If we let $j = n-i-m$, then we have found locations such that $T[j~\ldots~j+m-1] = P[1~\ldots~m]$, which is precisely the definition of a match at index $j$. Thus we have shown that the matches identified by our algorithm using the reversed pattern are exactly the correct matches in the forward order.
\end{ppart}

\end{document}

