\documentclass{article}
\input {../exercise-preamble}
\begin{document}
\psetnum{3}
\date{2005/12/08}

\begin{pset}

  Using SVMlight and a test wrapper script\footnote{Python source code
    is available at
    \url{http://www.ambulatoryclam.net/svn/classes/6.825/proj3exercises/ex3}},
  we evaluated different kernels and classification error penalties on
  two different test sets. Specifically, we considered linear kernels,
  polynomial kernels with degrees 2, 3, and 10, and RBF kernels with
  $\sigma = \nicefrac{1}{2}$, $\nicefrac{1}{10}$, and
  $\nicefrac{1}{50}$.
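
  For reference, the three kernel families can be written as below.
  This is a sketch using the standard textbook forms, not something
  taken from the wrapper script: the symbols $x$, $z$, $s$, $c$, and
  $d$ are our own notation, and the relation between $\sigma$ and the
  $\gamma$ parameter that SVMlight exposes for its RBF kernel is
  assumed here to be $\gamma = 1/(2\sigma^2)$.
  \[
    \begin{array}{rcl}
      K_{\mathrm{lin}}(x, z) & = & x \cdot z, \\
      K_{\mathrm{poly}}(x, z) & = & (s\, x \cdot z + c)^d, \\
      K_{\mathrm{RBF}}(x, z) & = &
        \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right).
    \end{array}
  \]
  Under this form, a smaller $\sigma$ gives a narrower kernel, so each
  training point influences only its immediate neighborhood.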

  Table~\ref{tab:ex1} shows the results for the first example
  training/test set, and Table~\ref{tab:ex2} shows the results for the
  second example training/test set.

  In general, increasing $C$ causes the number of support vectors used
  to decrease. Increasing the degree of the polynomial kernel or
  decreasing the value of $\sigma$ in the RBF kernel increases the
  number of support vectors, which presumably raises the risk of
  overfitting. Accordingly, increasing $C$ generally caused the test
  accuracy to increase in Example~1; in Example~2, the test accuracy
  was usually 100\% regardless of $C$.
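
  The role of $C$ can be seen in the standard soft-margin objective,
  which we assume is what SVMlight optimizes up to scaling. The
  symbols $w$, $b$, $\xi_i$, $\phi$, and $n$ are our own notation, with
  $\phi$ the feature map implied by the chosen kernel and $n$ the
  number of training points:
  \[
    \min_{w,\, b,\, \xi} \quad \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
    \qquad \mathrm{subject\ to} \qquad
    y_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \xi_i,
    \quad \xi_i \ge 0.
  \]
  A larger $C$ penalizes margin violations ($\xi_i > 0$) more heavily,
  which tends to narrow the margin. Since the support vectors are the
  points lying on or inside the margin, fewer points typically
  qualify, which is consistent with the trend in the tables.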
  
  For the first example, the classification boundary is fairly simple,
  so all choices of kernel and $C$ provide high accuracy
  (97--99\%). The best results are obtained from polynomial kernels: a
  degree-3 kernel provides 98.90\% accuracy with $C=0.1$ and 98.87\%
  accuracy with $C=1.0$ or $C=20$, and a degree-2 kernel provides
  98.88\% accuracy with $C=20$. These accuracies are essentially the
  same. Although the degree-3, $C=0.1$ case is marginally more
  accurate, the others might be preferable because they use fewer
  support vectors (3 rather than 6).

  For the second example, all kernels except for the linear kernel
  provide 100\% accuracy and no training misclassifications, and the
  value of $C$ has little effect. It is not surprising that the linear
  kernel performs poorly, since no straight line can separate the
  circular data set. Of the others, since the accuracy is perfect for
  each, we would presumably want to choose the simplest model, i.e.,
  the one with the fewest support vectors. This is the degree-2
  polynomial kernel, with any of the values of $C$.
  
  \begin{table}[bp]
    \caption{Training/Test Set 1}
    \label{tab:ex1}
    \begin{center}
      \begin{tabular}{|c|c|c|c|c|}
        \hline
        \textbf{Kernel} & \textbf{C} & \textbf{\# SVs} & \textbf{\#
          Training Misclass.} & \textbf{\% Test Accuracy} \\
        \hline
        \input{table1}
        \hline
      \end{tabular}
    \end{center}
  \end{table}

  \begin{table}
    \caption{Training/Test Set 2}
    \label{tab:ex2}
    \begin{center}
      \begin{tabular}{|c|c|c|c|c|}
        \hline
        \textbf{Kernel} & \textbf{C} & \textbf{\# SVs} & \textbf{\#
          Training Misclass.} & \textbf{\% Test Accuracy} \\
        \hline
        \input{table2}
        \hline
      \end{tabular}
    \end{center}
  \end{table}

\end{pset}

\end{document}
