Multi-class Classification & Neural Networks (Coursera ML class)
The third programming exercise in Coursera's Machine Learning class deals with one-vs-all logistic regression (aka multi-class classification) and introduces the use of neural networks to recognize hand-written digits. This is, by far, the most interesting assignment yet.
Wrapping my head around the setup for this assignment took almost as much time as actually implementing the solution. In short, the data we're looking at is a subset of the MNIST handwritten digit dataset. We have 5000 training examples, each of which is an "unrolled" version of a 20 x 20 pixel grayscale image (i.e. a 400-dimensional vector). Each element of that vector is the grayscale intensity of one pixel, and the "0" digit is labeled "10" because Octave vectors are indexed starting at 1. To get a sense of what we're dealing with, we first run some provided code that displays a random 10 x 10 array of training examples:
You can see that the clarity or messiness of the digits is all over the map, so to speak. This is clearly a good opportunity for a learning algorithm!
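To make the data layout concrete, here is a minimal Python sketch of the "unrolling" and of the label convention. The helper names are hypothetical, and the column-by-column ordering is an assumption chosen to mirror how Octave flattens a matrix; the course files may do this differently.

```python
# Hypothetical helpers for illustration; these names are not from the course files.

def unroll(image):
    """Flatten a 2-D image (a list of rows) into one vector, column by column.

    Column-major order is an assumption here, chosen to match Octave's A(:).
    """
    rows, cols = len(image), len(image[0])
    return [image[r][c] for c in range(cols) for r in range(rows)]


def label_to_digit(label):
    """Map the dataset's labels 1..10 back to digits, where 10 stands for 0."""
    return label % 10


# A blank 20 x 20 "image" with a single bright pixel in row 3, column 0.
image = [[0.0] * 20 for _ in range(20)]
image[3][0] = 1.0

vector = unroll(image)          # 400-dimensional feature vector
digit = label_to_digit(10)      # the label 10 means the digit 0
```

With column-major order, the bright pixel in row 3 of the first column lands at index 3 of the unrolled vector.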
We're asked to vectorize previous code, but if you were thinking that way the first time around, your code from previous exercises will already be vectorized. This saves a bunch of steps in this assignment. Next we add in regularization, but once more we leave the choice of lambda to Prof. Ng in the code that tests our solutions. I imagine (and hope!) that in exercises in the not-too-distant future, we'll be learning how to choose our own values of the regularization parameter.
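For reference, the vectorized, regularized logistic regression cost and gradient can be written roughly as follows. The notation is my own assumption rather than something taken from the assignment: X is the m x (n+1) matrix of examples with a leading column of ones, y is the vector of 0/1 labels, g is the sigmoid, and the bias term theta_0 is left out of the penalty.

```latex
h = g(X\theta), \qquad g(z) = \frac{1}{1 + e^{-z}}

J(\theta) = \frac{1}{m}\left[-y^{T}\log(h) - (1 - y)^{T}\log(1 - h)\right]
          + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}

\nabla_{\theta} J = \frac{1}{m}X^{T}(h - y) + \frac{\lambda}{m}\tilde{\theta},
\qquad \tilde{\theta} = (0, \theta_1, \dots, \theta_n)^{T}
```

Setting lambda to 0 recovers the unregularized cost and gradient.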
Then we get to the business of implementing one-vs-all classification by training one regularized logistic classifier for each of the K classes in the dataset (here, K=10). We also make use of some new techniques in this exercise (logical arrays, and the fmincg advanced optimization function). In the end, our trained algorithm is turned loose on the same dataset it was trained on (not ideal or realistic, but ok...) and, for each training example, outputs its prediction of the correct digit. On the given set of data, our algorithm correctly classifies about 95% of the training examples. Pretty good!
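The one-vs-all idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names and the toy parameters are hypothetical, the training step (fmincg in the exercise) is skipped entirely, and the parameters are hand-picked for a 3-class, 2-feature problem rather than learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_labels(y, k):
    """Rough equivalent of Octave's logical array (y == k): 1 where the label is k."""
    return [1 if label == k else 0 for label in y]


def predict_one_vs_all(all_theta, x):
    """Return the class (1..K) whose classifier reports the highest probability.

    all_theta[k - 1] holds the parameters for class k, bias term first;
    x is a single example without the bias feature.
    """
    features = [1.0] + list(x)
    scores = [
        sigmoid(sum(t * f for t, f in zip(theta, features)))
        for theta in all_theta
    ]
    return scores.index(max(scores)) + 1


# Hand-picked (not trained) parameters for a toy 3-class problem.
all_theta = [
    [0.0, 5.0, -5.0],   # class 1 fires when the first feature dominates
    [0.0, -5.0, 5.0],   # class 2 fires when the second feature dominates
    [-1.0, 0.0, 0.0],   # class 3 is a weak constant fallback
]

mask = binary_labels([1, 2, 3, 2], 2)                 # targets for the "class 2 vs rest" classifier
pred_a = predict_one_vs_all(all_theta, [1.0, 0.0])
pred_b = predict_one_vs_all(all_theta, [0.0, 1.0])
```

Each classifier is trained against its own 0/1 mask, and prediction simply takes the most confident of the K classifiers.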
The second part of the programming exercise deals with neural networks (NN). Turns out NN are pretty complicated; I watched the video lectures on this a few times and still barely followed them. Since the algorithm is complex, it's split across assignments; in this one, we're only implementing feedforward propagation using a previously-trained set of parameters/weights (theta matrices). Once we complete the forward propagation, the test code randomly chooses an entry from the MNIST 5000-example subset we have and displays its guess at the value along with the actual image. Impressively, its training accuracy is about 97%. Here are some examples of the test output (you may have to "View Image" in a new tab to read the prediction; spoiler: they're all correct):
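Feedforward propagation itself is short once the weights exist. Below is a minimal sketch in plain Python, assuming sigmoid activations and a bias unit prepended at every layer; the function names and the tiny hand-made weight matrices are hypothetical stand-ins for the exercise's pre-trained theta matrices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(theta, activations):
    """Compute one layer: prepend the bias unit, apply the weights, then the sigmoid."""
    inputs = [1.0] + list(activations)
    return [sigmoid(sum(w * a for w, a in zip(row, inputs))) for row in theta]


def feedforward_predict(thetas, x):
    """Push x through every layer and return (predicted class in 1..K, output activations)."""
    a = list(x)
    for theta in thetas:
        a = layer(theta, a)
    return a.index(max(a)) + 1, a


# Toy weights (not trained): 2 inputs -> 2 hidden units -> 3 output classes.
# Each row is one unit's weights, bias weight first.
theta1 = [[-10.0, 20.0, 0.0],
          [-10.0, 0.0, 20.0]]
theta2 = [[-5.0, 10.0, 0.0],
          [-5.0, 0.0, 10.0],
          [5.0, -10.0, -10.0]]

label_a, outputs_a = feedforward_predict([theta1, theta2], [1.0, 0.0])
label_b, _ = feedforward_predict([theta1, theta2], [0.0, 1.0])
label_c, _ = feedforward_predict([theta1, theta2], [0.0, 0.0])
```

The predicted class is just the index of the largest output activation, shifted by one to match the 1..K labeling.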
This exercise was great. Though there are still some core details that are being hidden from us (e.g. the training of this NN), this is starting to look like a legitimate machine learning exercise. The details of the code are pretty interesting, too, and I'm working on getting those files available. Stay tuned on that front!