6.825
Exercise 3: SVMs
Due Thursday December 8th at 5pm 
Please turn in to the box outside Tomas's door

This exercise asks you to run some small experiments
with support vector machines (SVMs). For more information on
support vector machines, see the svm_links.html page, also
available in this directory. The main idea is to use SVMs to
learn a classifier that separates two classes based on their
feature values. For example, you might want to classify whether
an image shows a woman or a man based on the 16x16 pixels that
make up the image (here the features are the pixels and the
class is "man" or "woman"). In this exercise you will learn a
classifier from a set of training examples and then test it on
a separate set of test data. We want you to explore how different
kernels let you learn different sorts of classifiers: a linear
kernel only allows the classifier to learn a line (in more than
two dimensions, a hyperplane) that separates the input examples,
while other kernels allow more complicated, nonlinear boundaries
to separate the data.
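To see where the kernel enters, here is a minimal Python sketch of the decision function in the standard textbook SVM formulation, f(x) = sum_i alpha_i y_i K(x_i, x) + b, where the sum runs only over the support vectors x_i. It is not SVM light's internal code, and the helper names (dot, decision_value, classify) are hypothetical. With a linear kernel the sum collapses to w . x + b, which is a line in 2-D. Swapping in a nonlinear K changes the shape of the boundary, and the number of support vectors sets how many terms the sum has.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    """Linear kernel: the ordinary inner product a . b."""
    return sum(x * y for x, y in zip(a, b))


def decision_value(x: Vector, support_vectors: Sequence[Vector],
                   alpha_y: Sequence[float], b: float, kernel: Kernel) -> float:
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summed over the support vectors only."""
    return sum(ay * kernel(sv, x) for sv, ay in zip(support_vectors, alpha_y)) + b


def classify(x: Vector, support_vectors: Sequence[Vector],
             alpha_y: Sequence[float], b: float, kernel: Kernel) -> int:
    """Predicted class is the sign of f(x): +1 on one side of the boundary, -1 on the other."""
    return 1 if decision_value(x, support_vectors, alpha_y, b, kernel) > 0 else -1
```

With kernel=dot, decision_value(x, ...) equals w . x + b for w = sum_i alpha_i y_i x_i. This is why a linear kernel can only produce a straight boundary.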

Here we are using SVM light to run our experiments. See
http://svmlight.joachims.org/ for more information on SVM light.
I have set it up in my Public Athena directory, and you are welcome
to use it there or to download and install it yourself.

There are 2 example directories under my Public/6825 directory on Athena:
http://web.mit.edu/emmab/Public/6825

ex1
ex2

Each one contains a training set and a test set of data, as well as
a picture of the training data in a PDF file.
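If you want to inspect the data yourself (for example, to count the examples or re-plot them), here is a small Python sketch of a reader. It assumes the files use SVM light's standard input format, with one example per line: a target of +1 or -1, then feature:value pairs, then an optional # comment. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Example = Tuple[int, Dict[int, float]]  # (label, {feature index: value})


def parse_svmlight_line(line: str) -> Optional[Example]:
    """Parse '<target> <feature>:<value> ... # comment'; return None for blank/comment lines."""
    line = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not line:
        return None
    target, *pairs = line.split()
    features: Dict[int, float] = {}
    for pair in pairs:
        index, value = pair.split(":", 1)
        features[int(index)] = float(value)
    return int(float(target)), features  # float() first so "+1" and "1.0" both work


def load_svmlight_file(path: str) -> List[Example]:
    """Read every example in a training or test file."""
    with open(path) as handle:
        parsed = (parse_svmlight_line(line) for line in handle)
        return [example for example in parsed if example is not None]
```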

To train using SVM light, run

./svm_learn options  input-data-file output-model-filename
Useful options include
-t 0          = linear kernel
-t 1 -d 2     = polynomial kernel of degree 2
-t 2 -g 10    = RBF kernel exp(-gamma ||a-b||^2) with gamma = 10
                (sigma = 1/gamma = 1/10)
-c 1          = set C to 1 (the penalty on training misclassifications)
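The kernel flags above correspond to the following functions. This Python sketch follows the formulas in SVM light's documentation: polynomial (s a*b + c)^d and RBF exp(-gamma ||a-b||^2). The s and c parameters (set with -s and -r) are assumed to default to 1, so check that against the SVM light documentation. The defaults d=2 and gamma=10 below only mirror the example flags, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """-t 0: K(a, b) = a . b"""
    return sum(x * y for x, y in zip(a, b))


def poly_kernel(a: Vector, b: Vector, d: int = 2, s: float = 1.0, c: float = 1.0) -> float:
    """-t 1 -d d: K(a, b) = (s a.b + c)^d; s and c assumed to match -s 1 -r 1."""
    return (s * linear_kernel(a, b) + c) ** d


def rbf_kernel(a: Vector, b: Vector, gamma: float = 10.0) -> float:
    """-t 2 -g gamma: K(a, b) = exp(-gamma ||a - b||^2)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)
```

Raising gamma makes the RBF kernel fall off faster with distance, so each training point influences a smaller neighborhood and the boundary can bend more tightly around individual examples.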

Note that the number of SVs, the number of misclassifications on the
training set, and the value of C used (if you don't set it) are
written to stdout.
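To fill in the results table without copying numbers by hand, you can capture svm_learn's stdout and extract the relevant values. This Python sketch assumes the output contains lines of the form "Number of SV: N (including M at upper bound)" and "Optimization finished (N misclassified, maxdiff=...)". When you don't pass -c, it also assumes a line of the form "Setting default regularization parameter C=...". Check these patterns against your own output before relying on them. The names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed svm_learn output lines (verify against a real run):
#   Setting default regularization parameter C=<value>   (only when -c is not given)
#   Optimization finished (<N> misclassified, maxdiff=<value>).
#   Number of SV: <N> (including <M> at upper bound)
_PATTERNS = {
    "default_C": re.compile(r"Setting default regularization parameter C=([0-9.eE+-]+)"),
    "train_misclassified": re.compile(r"Optimization finished \((\d+) misclassified"),
    "num_sv": re.compile(r"Number of SV: (\d+)"),
}


def parse_learn_output(stdout: str) -> Dict[str, Optional[float]]:
    """Return the values the exercise asks for, or None for any line not found."""
    results: Dict[str, Optional[float]] = {}
    for key, pattern in _PATTERNS.items():
        match = pattern.search(stdout)
        results[key] = float(match.group(1)) if match else None
    return results
```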

To classify a test set and check its accuracy, run
./svm_classify test-filename model-filename prediction-filename

Note that the accuracy on the test set is written to stdout.
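As a cross-check on the accuracy printed to stdout, you can recompute it from the prediction file. This Python sketch makes three assumptions. First, the prediction file holds one real-valued decision value per line, in the same order as the test file. Second, the sign of each value gives the predicted class. Third, the test labels are the first field of each test-file line. A value of exactly 0 is counted as wrong. The function names are hypothetical.

```python
from typing import List, Sequence


def read_predictions(path: str) -> List[float]:
    """One decision value per non-blank line of the prediction file."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def read_labels(path: str) -> List[int]:
    """First field (+1 or -1) of each example line in the test file; skips blank/comment lines."""
    labels = []
    with open(path) as handle:
        for line in handle:
            fields = line.split("#", 1)[0].split()
            if fields:
                labels.append(int(float(fields[0])))
    return labels


def accuracy_percent(decision_values: Sequence[float], labels: Sequence[int]) -> float:
    """Percent of examples whose decision value has the same sign as the true label."""
    if len(decision_values) != len(labels):
        raise ValueError("prediction and test files have different numbers of examples")
    correct = sum(1 for value, label in zip(decision_values, labels) if value * label > 0)
    return 100.0 * correct / len(labels)
```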

** Important note: You can either run the code directly from my
Athena Public directory on the command line, or you can download
and install SVM light. If you choose to run the code in my Public
directory, your commands will look something like

./svm_learn options  input-data-file ~/...output-model-filename
./svm_classify test-filename ~/...model-filename ~/...prediction-filename

(That is, you can run the executables in my Public directory, but
you don't have write permission there, so the model and prediction
files you create must be written to your own directory.)
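One way to make sure the model and prediction files land in your own directory is to build the command lines in a script. This Python sketch relies on hypothetical names throughout. SVM_DIR should point at whichever directory holds the executables, and OUT_DIR is a made-up output directory in your own account. The data file names in the usage comment are placeholders for the actual files in ex1 and ex2.

```python
import os
import subprocess
from typing import List, Sequence, Tuple

# Hypothetical paths: point SVM_DIR at the directory holding svm_learn/svm_classify
# and OUT_DIR at a directory in your own account that you can write to.
SVM_DIR = "."
OUT_DIR = os.path.expanduser("~/6825-svm")


def build_commands(options: Sequence[str], train_file: str, test_file: str,
                   tag: str) -> Tuple[List[str], List[str]]:
    """Return argv lists for svm_learn and svm_classify, with all outputs under OUT_DIR."""
    model = os.path.join(OUT_DIR, tag + ".model")
    predictions = os.path.join(OUT_DIR, tag + ".predictions")
    learn = [os.path.join(SVM_DIR, "svm_learn"), *options, train_file, model]
    classify = [os.path.join(SVM_DIR, "svm_classify"), test_file, model, predictions]
    return learn, classify


def run(argv: List[str]) -> str:
    """Run one command and return its stdout, where SVM light reports its numbers."""
    os.makedirs(OUT_DIR, exist_ok=True)
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


# Usage (placeholder file names):
# learn, classify = build_commands(["-t", "0"], "train.dat", "test.dat", "ex1_linear")
# print(run(learn))
# print(run(classify))
```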


For each example:

Pick 3 different kernels, then train, test, and report:
	how many SVs there were
	how many misclassifications there were on the training set
	the percent accuracy on the test set
For at least 1 kernel (all 3 if you have time), choose 3
values of C (such as C=0.1, C=1, C=20, but you can pick whatever
you want; just look for interesting differences), and again
train, test, and report results as above. Please report all of
these results in a table similar to the following, so that it
is easy to compare results across kernels and C values:
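If you script the runs, you can generate the set of configurations the exercise asks for instead of typing each one. In this illustrative Python sketch, the three kernels are example choices and the C values are the ones suggested above. A C of None means SVM light picks its default. KERNELS, C_VALUES and experiment_grid are hypothetical names.

```python
from typing import Dict, List

# Example kernel choices only; any three kernels are fine for the exercise.
KERNELS: Dict[str, List[str]] = {
    "linear": ["-t", "0"],
    "poly d=2": ["-t", "1", "-d", "2"],
    "rbf g=10": ["-t", "2", "-g", "10"],
}
C_VALUES: List[float] = [0.1, 1, 20]  # the example values suggested in the handout


def experiment_grid(c_sweep_kernels: List[str]) -> List[Dict[str, object]]:
    """Every kernel once at the default C, then every C value for the chosen kernels."""
    runs: List[Dict[str, object]] = [
        {"kernel": name, "C": None, "options": list(opts)}  # None: let SVM light pick C
        for name, opts in KERNELS.items()
    ]
    for name in c_sweep_kernels:
        for c in C_VALUES:
            runs.append({"kernel": name, "C": c, "options": KERNELS[name] + ["-c", str(c)]})
    return runs
```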

example table:

Kernel        C    # SVs   # Training Misclassifications   % Test Accuracy
poly 2        1    15      0                               97%
rbf sig=.5    2    ...
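A small formatter can print your collected results in this layout. This Python sketch assumes each result is a tuple of (kernel label, C or None, number of SVs, training misclassifications, test accuracy in percent). The tuple layout and the name format_table are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# (kernel label, C or None for the default, # SVs, # training misclassifications, % test accuracy)
Row = Tuple[str, Optional[float], int, int, float]

HEADER = ("Kernel", "C", "# SVs", "# Training Misclassifications", "% Test Accuracy")


def format_table(rows: Sequence[Row]) -> str:
    """Render rows as left-aligned, space-separated columns under HEADER."""
    cells = [HEADER] + [
        (kernel, "default" if c is None else f"{c:g}", str(svs), str(errors), f"{acc:.1f}%")
        for kernel, c, svs, errors, acc in rows
    ]
    widths = [max(len(row[i]) for row in cells) for i in range(len(HEADER))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in cells
    )
```

For example, print(format_table([("poly 2", 1, 15, 0, 97.0)])) prints the header followed by a row matching the first example row above.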


Also please answer the following questions (briefly):

As you change C, does the number of support vectors change?
As you change the kernel, does the number of support vectors change?
How does changing C affect the number of training misclassifications
   and the test accuracy in these 2 examples (if at all)?
How does changing the kernel affect the test accuracy and the number
   of training misclassifications?
For each example, also state:
- Which kernel and value of C you thought were best, out of your experiments
- Why you picked that kernel (generalization as reflected by test
  accuracy, number of support vectors, etc.)
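If you collect your results as records, one simple way to make the comparison explicit is to rank by test accuracy and break ties in favor of fewer support vectors, as in this Python sketch. This is only one reasonable rule, and you may weigh the criteria differently in your answers. The record keys ("example", "test_acc", "num_sv") and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def pick_best(results: Sequence[Dict]) -> Dict:
    """Highest test accuracy wins; ties go to the model with fewer support vectors."""
    return max(results, key=lambda r: (r["test_acc"], -r["num_sv"]))


def best_per_example(results: Sequence[Dict]) -> Dict[str, Dict]:
    """Group records by their "example" key (e.g. "ex1", "ex2") and pick the best in each."""
    by_example: Dict[str, List[Dict]] = {}
    for record in results:
        by_example.setdefault(record["example"], []).append(record)
    return {name: pick_best(records) for name, records in by_example.items()}
```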