\documentclass{article}
\input {../exercise-preamble}
\begin{document}
\psetnum{2}
\date{2005/11/22}

\begin{pset}

  We begin by implementing the value iteration algorithm\footnote{Python source code for the
    implementation is available at
    \url{http://www.ambulatoryclam.net/svn/classes/6.825/proj3exercises/ex1}}
  and running it for a few iterations. The results are shown below.

  A few implementation notes. First, if a move, whether intended or
  unintended, would take the agent into a wall (e.g., going west from
  state 1), the agent stays in the same state, just as if it had done
  nothing. Second, we do not calculate utilities for taking actions
  from state 3: since it is a terminal state, we assume its utility is
  always its reward, $R(3) = 1$, and use that value whenever a move
  leads into state 3.
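  One sweep of value iteration that follows both conventions above can
  be sketched in Python as below. This is an illustrative sketch, not
  the linked implementation: the grid layout (states 4, 5, 6 in a
  northern row above states 1, 2, 3), the slip model (the intended move
  succeeds with probability 0.8 and each perpendicular move happens
  with probability 0.1), the zero reward in non-terminal states, and
  the discount factor $\gamma = 0.9$ are all hypothetical, so its
  numbers are not expected to match the tables.

\begin{verbatim}
# Illustrative value-iteration sweep on a HYPOTHETICAL 2x3 grid.
# Assumed (not from the assignment): layout, 0.8/0.1/0.1 slip model,
# R(s) = 0 for non-terminal states, and gamma = 0.9.

LAYOUT = [[4, 5, 6],   # hypothetical northern row
          [1, 2, 3]]   # hypothetical southern row
TERMINAL = {3: 1.0}    # state 3 is terminal with R(3) = 1
GAMMA = 0.9            # hypothetical discount factor
P_INTENDED = 0.8       # hypothetical chance the intended move happens
P_SLIP = 0.1           # hypothetical chance of each perpendicular slip

DELTAS = {"North": (-1, 0), "East": (0, 1),
          "South": (1, 0), "West": (0, -1)}
PERPENDICULAR = {"North": ("East", "West"), "South": ("East", "West"),
                 "East": ("North", "South"), "West": ("North", "South")}
ACTIONS = list(DELTAS) + ["Nothing"]
POS = {s: (r, c) for r, row in enumerate(LAYOUT) for c, s in enumerate(row)}


def move(state, direction):
    """Deterministic effect of one move; hitting a wall means staying put."""
    r, c = POS[state]
    dr, dc = DELTAS[direction]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(LAYOUT) and 0 <= nc < len(LAYOUT[0]):
        return LAYOUT[nr][nc]
    return state  # wall: same as doing nothing


def outcomes(state, action):
    """(probability, next_state) pairs for taking action in state."""
    if action == "Nothing":
        return [(1.0, state)]
    left, right = PERPENDICULAR[action]
    return [(P_INTENDED, move(state, action)),
            (P_SLIP, move(state, left)),
            (P_SLIP, move(state, right))]


def reward(state):
    return TERMINAL.get(state, 0.0)


def bellman_update(U):
    """One synchronous sweep: U(s) = R(s) + gamma * max_a sum_s' P U(s')."""
    new_U = {}
    for s in POS:
        if s in TERMINAL:
            new_U[s] = TERMINAL[s]  # no actions evaluated from a terminal
            continue
        best = max(sum(p * U[s2] for p, s2 in outcomes(s, a))
                   for a in ACTIONS)
        new_U[s] = reward(s) + GAMMA * best
    return new_U


# Three sweeps starting from U_0(s) = R(s).
U = {s: reward(s) for s in POS}
for _ in range(3):
    U = bellman_update(U)
\end{verbatim}

  Each call to \texttt{bellman\_update} evaluates all five actions from
  every non-terminal state against the previous sweep's utilities,
  which is what the per-action columns in the tables record.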
  
  \begin{center}
    \begin{tabular}{|c||c||c|c|c|c|c||c|c|c|}
      \hline
      $t$ & State & North & East & South & West & Nothing & Best Action &
      Exp. Util. & Updated Util.\\
      \hline
      \input{table}
    \end{tabular}  
  \end{center}

  After three iterations, the best policy is to move south from state 6
  and east from every other non-terminal state.

  \paragraph{Extra credit.}
  Although the correct policy is reached after only three iterations,
  the state utilities do not converge to within $\epsilon = 0.01$ until
  the 12th iteration. The converged values are shown below.
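  A stopping rule of this kind can be sketched as follows, assuming
  that ``converged to within $\epsilon$'' means no state's utility
  changes by $\epsilon$ or more between successive sweeps (a max-norm
  test; the exact criterion used in the implementation is not stated
  here). The single-sweep update is passed in as a function, and the
  one-state update in the usage line is purely hypothetical.

\begin{verbatim}
def max_change(old, new):
    """Largest absolute change in utility between two sweeps (max norm)."""
    return max(abs(new[s] - old[s]) for s in old)


def iterate_until_converged(update, U0, epsilon=0.01, max_iters=1000):
    """Apply update until no utility changes by epsilon or more.

    Returns (utilities, number_of_sweeps_performed).
    """
    U = U0
    for t in range(1, max_iters + 1):
        new_U = update(U)
        if max_change(U, new_U) < epsilon:
            return new_U, t
        U = new_U
    raise RuntimeError("did not converge within max_iters sweeps")


# Hypothetical one-state update U <- 0.5 + 0.5 U (fixed point 1.0).
U, iters = iterate_until_converged(lambda u: {"s": 0.5 + 0.5 * u["s"]},
                                   {"s": 0.0})
\end{verbatim}

  In the toy update the change halves every sweep (0.5, 0.25, \ldots),
  so it first drops below 0.01 on the 7th sweep; the same slow tail is
  why utilities can keep moving long after the greedy policy has
  stopped changing.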

  \begin{center}
    \begin{tabular}{|c||c||c|c|c|c|c||c|c|c|}
      \hline
      $t$ & State & North & East & South & West & Nothing & Best Action &
      Exp. Util. & Updated Util.\\
      \hline
      12 & 1 & 0.729882 & 0.870637 & 0.775509 & 0.767229 & 0.769764 & East & 0.870637 & 0.769766 \\
      12 & 2 & 0.833938 & 0.985647 & 0.884683 & 0.778434 & 0.884661 & East & 0.985647 & 0.884661 \\
      12 & 3 & - & - & - & - & - & - & 1.000000 & 1.000000 \\
      12 & 4 & 0.724528 & 0.819891 & 0.770155 & 0.721602 & 0.719067 & East & 0.819891 & 0.719072 \\
      12 & 5 & 0.828268 & 0.929208 & 0.879013 & 0.732807 & 0.828278 & East & 0.929208 & 0.828279 \\
      12 & 6 & 0.931839 & 0.940425 & 0.988278 & 0.842314 & 0.937290 & South & 0.988278 & 0.937290 \\
      \hline
    \end{tabular}
  \end{center}
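  The Best Action and Exp.\ Util.\ columns are the argmax and the
  maximum of the five per-action columns. The short Python check below
  recovers them from the $t = 12$ rows of the table; the dictionary
  simply copies those values, and state 3 is omitted because no actions
  are evaluated from a terminal state.

\begin{verbatim}
# Per-action expected utilities copied from the t = 12 table rows.
ROWS = {
    1: {"North": 0.729882, "East": 0.870637, "South": 0.775509,
        "West": 0.767229, "Nothing": 0.769764},
    2: {"North": 0.833938, "East": 0.985647, "South": 0.884683,
        "West": 0.778434, "Nothing": 0.884661},
    4: {"North": 0.724528, "East": 0.819891, "South": 0.770155,
        "West": 0.721602, "Nothing": 0.719067},
    5: {"North": 0.828268, "East": 0.929208, "South": 0.879013,
        "West": 0.732807, "Nothing": 0.828278},
    6: {"North": 0.931839, "East": 0.940425, "South": 0.988278,
        "West": 0.842314, "Nothing": 0.937290},
}


def best_action(q):
    """Return (action, expected utility) for the highest-valued action."""
    action = max(q, key=q.get)
    return action, q[action]


policy = {s: best_action(q)[0] for s, q in ROWS.items()}
# policy: East in states 1, 2, 4, 5 and South in state 6
\end{verbatim}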

\end{pset}

\end{document}
