\documentclass{article}
\input{6046preamble}


\begin{document}
\pset{6-2}{Dan Ports}{No collab}{Recitation \#8}

\paragraph{}
Assume that we are given a pattern and text, both randomly generated such that each symbol in both is selected independently and unifromly at random from ${0,1}$. We show that the naive string-matching algorithm runs in expected time $\Theta(n)$. 
\paragraph{}
We will first show that the inner loop of the naive algorithm requires expected constant time to execute. The inner loop performs comparisons of $T[s+j]$ and $P[j]$ with $j$ from $1$ to $m$. It continues to perform comparisons until it has performed $j$ comparisons, or until it has found a pair that do not match. The sample space contains four possibilities for the pair $(T[s+j],P[j])$: $(0,0)$, $(0,1)$, $(1,0)$, and $(1,1)$. All of these are equally likely, since the pattern and text are randomly generated. Two of these four ($(0,1)$ and $(1,0)$) are not matches, so they will immediately stop the loop. If this is the only condition that can stop the loop, we know that the loop stops with probability $\frac{1}{2}$ at each iteration, so the expected number of iterations executed is $\frac{1}{\frac{1}{2}} = 2$. In fact, there is also the constraint that the loop will stop after $j$ iterations, so the expected number of iterations will be even less. Each iteration requires constant time to execute, so the inner loop requires expected $\Theta(1)$ time to execute.
\paragraph{}
Next, note that the outer loop runs a total of $n-m = \Theta(n)$ times. Each iteration of the outer loop requires constant time overhead plus whatever time is required to run the inner loop. The expected runtime of each run of the inner loop (for each iteration of the outer loop) is not independent; however, by linearity of expectation, we can add their expectations to find the expected runtime. Thus, running the procedure requires $\Theta(n)$ overhead plus $n$ times the runtime of the inner loop. This is $\Theta(n) + n\Theta(1) = \Theta(n)$.
\paragraph{}
The same result continues to hold if the pattern is fixed and the text is random. The only part of the analysis that differs is the probability that an element of the pattern and an element of the text will be the same. With both the pattern and text random, this probability is $\frac{1}{2}$. However, even with only the text random, the probability is still $\frac{1}{2}$ because exactly one of the two possible values of the text element will always match a fixed pattern element, and both values of the text element are equally likely. The rest of the analysis proceeds in exactly the same way to give the same result. The same result is possible if only the text is fixed; it only does not hold if both the text and pattern are fixed.
\end{document}

