As a starting point for the class, you should have a good enough understanding of Python and numpy to work through the basic task of classifying MNIST digits with a one-hidden-layer MLP.

**The due date for the assignment is Thursday, January 15, 2015. You’re not required to hand in anything. However, it is highly recommended that you complete it, as some of the material that will be covered in the next few weeks will build on the assumption that you’re comfortable with this assignment.**

### Refresher: MNIST

The MNIST dataset contains labeled images of handwritten digits. Images have size 28 x 28 pixels and are divided into a training set containing 60,000 examples and a test set containing 10,000 examples.

You can download the dataset as follows:

    wget http://deeplearning.net/data/mnist/mnist.pkl.gz

In this version of the dataset, the training set has further been split into a training set containing 50,000 examples and a validation set containing 10,000 examples.

Once unzipped and unpickled, the file contains a tuple with three elements for the training, validation and test set respectively. Each element is itself a tuple containing a matrix of examples (one row per example) and an $N$-dimensional vector of labels (where $N$ is the number of examples in this specific set). Features in the example matrix are floats in [0, 1] (0 being black, and 1 being white), and labels are integers ranging between 0 and 9.

You can load the data like so:

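    import gzip
    import pickle  # on Python 2, use `import cPickle as pickle` instead

    with gzip.open('mnist.pkl.gz', 'rb') as f:
        # encoding='latin1' lets Python 3 read this Python 2 pickle;
        # drop the argument when running under Python 2
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
    # Whatever else needs to be done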
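The labels are stored as integers, while the loss used in this assignment is written in terms of one-hot encoded targets, so a conversion step is usually needed. The following is a minimal sketch of one way to do it; the helper name `one_hot` and the tiny stand-in label array are illustrative assumptions, not something provided by the dataset file.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Convert integer labels of shape (N,) to a one-hot matrix of shape (N, num_classes).

    `one_hot` is a hypothetical helper name chosen for this sketch.
    """
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes))
    # Set exactly one entry per row: column `labels[n]` of row n.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Stand-in for the label vector you would get from `train_set[1]`.
toy_labels = np.array([3, 0, 9])
toy_targets = one_hot(toy_labels)
```

With the real data, the same call would be applied to the label vector of each of the three sets.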

### Refresher: MLP

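A multilayer perceptron (MLP) produces a prediction according to the following equations:

$$h = \sigma(W x + b), \qquad y = \mathrm{softmax}(V h + c)$$

Here, $x$ is the input vector, $\sigma$ is an elementwise nonlinearity (e.g. the logistic sigmoid), and $y$ is interpreted as a vector of probabilities, i.e. $y_k$ is the probability that example $x$ belongs to class $k$, and $y$ is normalized such that $\sum_k y_k = 1$.

Given a one-hot encoded target $t$ (i.e., $t \in \{0, 1\}^K$, where $K$ is the number of classes, and $\sum_k t_k = 1$), the loss function is defined as

$$L = -\sum_{k=1}^{K} t_k \log y_k$$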
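To make the forward computation concrete, here is a minimal numpy sketch for a single example. It assumes a logistic sigmoid hidden layer, a softmax output and a cross-entropy loss; the parameter names `W`, `b`, `V`, `c` and the function names `forward` and `cross_entropy` are illustrative choices for this sketch, and the toy sizes are arbitrary.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    # Subtracting the max improves numerical stability without changing the result.
    e = np.exp(a - np.max(a))
    return e / e.sum()

def forward(x, W, b, V, c):
    """Forward pass for one example x of shape (D,)."""
    h = sigmoid(W @ x + b)   # hidden activations, shape (H,)
    y = softmax(V @ h + c)   # class probabilities, shape (K,)
    return h, y

def cross_entropy(y, t):
    """Loss for predicted probabilities y and a one-hot target t."""
    return -np.sum(t * np.log(y))

# Toy problem: 10 inputs, 5 hidden units, 2 classes (sizes are assumptions).
rng = np.random.RandomState(0)
D, H, K = 10, 5, 2
W, b = 0.1 * rng.randn(H, D), np.zeros(H)
V, c = 0.1 * rng.randn(K, H), np.zeros(K)
x = rng.rand(D)
t = np.array([0.0, 1.0])

h, y = forward(x, W, b, V, c)
loss = cross_entropy(y, t)
```

Because the target is one-hot, the sum in the loss picks out the negative log-probability of the correct class.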

### The task

For this assignment, you are asked to implement a one-hidden-layer MLP and train it on MNIST using Python and numpy. You’ll need to do the following:

1. Using the backpropagation principle, find the derivatives of the loss function with respect to the individual parameters $W_{ij}$, $b_i$, $V_{ki}$ and $c_k$ (the entries of the weight matrices and bias vectors).
2. Using what you derived in step 1, find an expression for the derivative of the loss function with respect to the parameter vectors $b$ and $c$ and the matrices $W$ and $V$. This is worth doing because numpy provides abstractions for dot products between vectors, vector-matrix products, elementwise operations, etc. that rely on heavily optimized linear algebra libraries. Coding up all these operations using for-loops would be both inefficient and tedious.
3. Implement the code necessary to compute gradients using the expressions derived in step 2 for a pair of feature and target vectors $x$ and $t$.
4. Using the finite differences method, implement a numerical version of the code that computes the gradients, and compare the analytical gradients and the numerical gradients using a toy example (e.g. a random pair of feature and target vectors for an MLP with 10 input dimensions, 5 hidden units and 2 categories). A mismatch between the two indicates an error in the implementation of the analytical gradients. The **numpy.testing.assert_allclose** function will be handy for that.
5. Generalize your code to work on minibatches of data. Make sure that numerical and analytical gradients still check out on a toy example.
6. Implement a simple minibatch SGD loop and train your MLP on MNIST. You should be able to achieve under 5% error rate on the test set.
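The finite-difference check can be sketched on a smaller problem than the full MLP. The example below is only an illustration of the technique, not a solution to the assignment: it checks the well-known gradient of a softmax cross-entropy loss with respect to its pre-softmax input, using central differences. The names `numerical_gradient` and `loss_fn`, the step size `eps` and the tolerances are assumptions made for this sketch.

```python
import numpy as np
from numpy.testing import assert_allclose

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def loss_fn(a, t):
    """Cross-entropy between softmax(a) and a one-hot target t."""
    return -np.sum(t * np.log(softmax(a)))

def numerical_gradient(f, theta, eps=1e-6):
    """Central finite-difference estimate of the gradient of scalar f at theta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step.flat[i] = eps
        # Perturb one coordinate at a time in both directions.
        grad.flat[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return grad

rng = np.random.RandomState(0)
a = rng.randn(4)                       # toy pre-softmax activations
t = np.array([0.0, 0.0, 1.0, 0.0])     # toy one-hot target

analytical = softmax(a) - t            # known gradient of this loss w.r.t. a
numerical = numerical_gradient(lambda v: loss_fn(v, t), a)

# Raises an AssertionError if the two gradients disagree beyond the tolerances.
assert_allclose(analytical, numerical, rtol=1e-5, atol=1e-8)
```

The same `numerical_gradient` helper can be applied to each parameter array of the MLP in turn by wrapping the loss in a function of that array alone.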