\documentclass[11pt]{article}

\bibliographystyle{plain}
\usepackage{epsf}
\usepackage{graphicx}
\usepackage{fullpage}
\usepackage{lscape}
\usepackage{float}
\usepackage{amsmath}

\title{6.825 Project 3, Part 1: \\
Parameter Estimation and Scoring}

\pagestyle{plain}
\setlength\parskip{1ex}
\setlength\parindent{0em}

\begin{document}

\author{
  Eric Mumpower    \\
  Dan RK Ports
}

\date{November 18, 2005}

\maketitle

\section*{Task 1: Parameter Estimation}

Given the provided network graph, our parameter-estimation
implementation estimated the following Bayes'-net parameters for the
provided dataset ``B'':

\input{estBcpts.tex}

\section*{Task 2: Scoring Using the Bayesian Information Criterion}

Here are the components of the Bayesian Information Criterion (BIC)
score we calculated for our parameter estimation of dataset ``B'' given
the provided network graph:

\begin{tabular}{r|c|c|}
\cline{2-3}
$\log_2$-Likelihood & $l(\hat{\theta}_G : D)$ & $-34285.24$ \\
\cline{2-3}
Structural Penalty & $\frac{\log_2 M}{2}\,\mathrm{Dim}[G]$ & $98.30$ \\
\cline{2-3}
BIC Score & $S(G : D)$ & $-34383.54$ \\
\cline{2-3}
\end{tabular}
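
As a consistency check on the table, the following sketch recombines
its entries. It assumes that the score is the log-likelihood minus the
structural penalty and that $M = 5000$ (the number of samples in
dataset ``B''). The value $\mathrm{Dim}[G] \approx 16$ is not reported
here; it is inferred from the penalty and should be read as an
assumption:
\begin{align*}
\frac{\log_2 M}{2} &= \frac{\log_2 5000}{2} \approx 6.144, \\
\mathrm{Dim}[G] &\approx \frac{98.30}{6.144} \approx 16, \\
S(G : D) &= l(\hat{\theta}_G : D) - \frac{\log_2 M}{2}\,\mathrm{Dim}[G]
  = -34285.24 - 98.30 = -34383.54.
\end{align*}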

The magnitude of the log-likelihood of our data (given the graph) is
much larger than the one computed for dataset ``A'', but any of
several factors could plausibly explain this.

If one makes the (unreasonable) assumption that all 5000 samples in
dataset ``B'' were equally likely under the fitted network, then each
has probability $2^{-34285/5000} \approx 0.0086$. Given that dataset
``B'' has 8 variables, there are $2^8 = 256$ possible combinations of
node values. If every possible outcome were equally likely, each would
have probability $2^{-8} \approx 0.0039$, which is within roughly a
factor of two of the ``average'' likelihood achieved by our fit to the
data. Thus, it is conceivable that a log-likelihood of such large
magnitude does not indicate a poor fit.
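
The comparison above can be restated in bits per sample. This is only
a sketch of the arithmetic, using the rounded figures from the table
and the same equal-likelihood simplification:
\begin{align*}
\frac{34285.24}{5000} &\approx 6.857 \text{ bits per sample under the fitted network}, \\
\log_2 256 &= 8 \text{ bits per sample under a uniform distribution}, \\
\frac{2^{-6.857}}{2^{-8}} &= 2^{1.143} \approx 2.2.
\end{align*}
On this rough measure, the fitted network assigns a typical sample
about 2.2 times the probability that a uniform distribution would.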

If this data represents actual experimental measurements (or if it was
distorted by transformation with non-Bayesian noise), then no fit is
going to be perfect. For a given quality of fit, the magnitude of the
log-likelihood grows in proportion to the number of data points. Thus,
a large amount of noisy data would make the magnitude of the
log-likelihood very large.
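
To make the proportionality explicit, the log-likelihood can be
written as a sum over samples, assuming the samples are independent.
The notation $x[m]$ for the $m$-th sample and $\bar{l}$ for the
average per-sample log-likelihood is introduced here for illustration
only:
\begin{align*}
l(\hat{\theta}_G : D) &= \sum_{m=1}^{M} \log_2 P(x[m] \mid \hat{\theta}_G) = M\,\bar{l}, \\
\bar{l} &= \frac{-34285.24}{5000} \approx -6.857.
\end{align*}
At a fixed $\bar{l}$, doubling $M$ doubles the magnitude of the
log-likelihood without any change in the quality of fit.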

Alternatively, assuming this data was generated by sampling according
to the structure and parameters of some actual Bayes' network, it's
plausible that the assumed network structure is a poor match to the
original network, and that a structure closer to the original would
yield a better log-likelihood.

This is all, however, conjecture; to better judge the relative
quality of our parameter estimation, we will need to compare its score
against those of other network structures. Fortunately, that's exactly
what we'll be doing in the next part of this project.

\end{document}
