Important upcoming dates

As we discussed in class yesterday, here are the important dates for the remainder of our course.

  • April 9th, 2015: Spotlight project presentations. Each presentation should be at most 4 minutes long (3 slides max.) and should highlight your most significant contributions.
  • April 13th, 2015: Final exam, 14h30-17h30 (3 hours), in the same room as our regular Monday class (Z210). Closed book, but you are allowed one 8.5×11 “cheat sheet” (both sides) for notes. The notes must be handwritten.
  • April 20th, 2015: Course project due date. All project blogs should be up to date and finalized by this date.

Leaderboard for the class project

You are encouraged to publicize your results on the class project through the newly created leaderboard wiki, which you can find here.

The wiki is editable by anybody with a GitHub account, so feel free to add your results yourself. Here are the instructions, taken from the wiki:

Use this wiki to publicize your results on the Dogs vs. Cats class project.

Every time you get a better result on the challenge, you can add an entry to the list below (at the appropriate place, please, so that the list stays sorted by test error rate). Make sure to include a link to a blog post detailing how you achieved that result. The format for entries is as follows:

<test error rate> (<train error rate>, <valid error rate>): <short description> (<link to blog post>)
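For instance, a purely illustrative entry (the numbers, description, and link below are made up) might look like this:

0.0850 (0.0500, 0.0900): convnet with random crops and dropout (http://example.com/my-post)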

Class project starts now!

The Pylearn2 implementation of the Dogs vs. Cats dataset is complete, which means that you now have everything you need to start training models for the class project.

This post details where to find the dataset implementation and how to use it with Pylearn2.

Requirements

  • Pylearn2 and its dependencies
  • PyTables

Getting the code

You can find the code for the dataset in Vincent Dumoulin’s repository.

Clone the repo into a directory that’s listed in your PYTHONPATH, and you’re ready to go.
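If you want to make sure everything is in place, a quick import is enough to verify that Python can see the repo (this check is just a suggestion, not part of the course instructions):

import ift6266h15.code.pylearn2.datasets.variable_image_dataset
# If the clone's parent directory is on your PYTHONPATH, this import
# succeeds without errors.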

Getting the data

N.B.: This part is only required if you’re working on your own machine. The data is already available on the LISA filesystem.

Downloading the images

First, you’ll need to make sure your PYLEARN2_DATA_PATH environment variable is set (e.g. through an export PYLEARN2_DATA_PATH=<path_of_your_choice> line in your .bashrc if you’re on Linux). This is where Pylearn2 expects your data to be found.

Create a dogs_vs_cats directory in ${PYLEARN2_DATA_PATH}.

Finally, download and unzip the train.zip file into the ${PYLEARN2_DATA_PATH}/dogs_vs_cats directory. (Many thanks to Kyle Kastner for making it available without having to go through the whole Kaggle signup process!)
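To double-check that everything landed in the right place, a short Python snippet like the following (purely a convenience check; it is not part of the course code) lists the directory’s contents:

import os

# Raises a KeyError if PYLEARN2_DATA_PATH is not set.
data_path = os.environ['PYLEARN2_DATA_PATH']
print(os.listdir(os.path.join(data_path, 'dogs_vs_cats')))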

Generating the HDF5 dataset file

Once the images have been downloaded, unzipped and placed into the ${PYLEARN2_DATA_PATH}/dogs_vs_cats directory, run

python ift6266h15/code/pylearn2/datasets/generate_dogs_vs_cats_dataset.py

This will create an HDF5 file at ${PYLEARN2_DATA_PATH}/dogs_vs_cats/train.h5 containing the whole training set. This may take some time, and the file should weigh around 11 gigabytes.
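If you want to inspect the result, you can open the file with PyTables (already listed in the requirements); printing the file object displays its node hierarchy. This is only a sanity check:

import os
import tables

h5_path = os.path.join(os.environ['PYLEARN2_DATA_PATH'],
                       'dogs_vs_cats', 'train.h5')
with tables.open_file(h5_path, mode='r') as h5_file:
    print(h5_file)  # prints the HDF5 node hierarchy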

Instantiating and iterating over the dataset

We’re going to use the ift6266h15.code.pylearn2.datasets.variable_image_dataset.DogsVsCats subclass of Dataset.

The dataset constructor takes three arguments: an instance of an ift6266h15.code.pylearn2.datasets.variable_image_dataset.BaseImageTransformer subclass and, optionally, start and stop indices specifying which slice of the whole dataset to use.

The BaseImageTransformer subclass is responsible for transforming a variable-size image into a fixed-size one through some sort of preprocessing, and is used by the dataset to construct batches of fixed-size examples.

There is currently only one subclass implemented, ift6266h15.code.pylearn2.datasets.variable_image_dataset.RandomCrop, which scales the input image so that its smallest side has length scaled_size, then takes a random square crop of dimension crop_size inside the scaled image (both scaled_size and crop_size are constructor arguments). For example, with scaled_size=256 and crop_size=221, a 375×500 image is first rescaled to roughly 256×341, and a random 221×221 patch is then cropped out of it.

Here’s how we would instantiate and iterate over the dataset:

from ift6266h15.code.pylearn2.datasets.variable_image_dataset import DogsVsCats, RandomCrop
# Use the first 20,000 examples; each image is rescaled so its smallest
# side is 256 pixels, then a random 221x221 crop is taken.
dataset = DogsVsCats(
    RandomCrop(256, 221),
    start=0, stop=20000)
# Iterate over shuffled batches of 100 examples.
iterator = dataset.iterator(
    mode='batchwise_shuffled_sequential',
    batch_size=100)
for X, y in iterator:
    print X.shape, y.shape

Note that by default the dataset iterates over both features and targets.

Here’s how you would use the dataset inside a YAML file to train a linear classifier on the dataset:

!obj:pylearn2.train.Train {
    dataset: &train !obj:ift6266h15.code.pylearn2.datasets.variable_image_dataset.DogsVsCats {
        transformer: &transformer !obj:ift6266h15.code.pylearn2.datasets.variable_image_dataset.RandomCrop {
            scaled_size: 256,
            crop_size: 221,
        },
        start: 0,
        stop: 20000,
    },
    model: !obj:pylearn2.models.mlp.MLP {
        nvis: 146523,  # 221 * 221 * 3: one input per pixel and channel
        layers: [
            !obj:pylearn2.models.mlp.Softmax {
                layer_name: 'y',
                n_classes: 2,
                irange: 0.01,
            },
        ],
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: &batch_size 100,
        train_iteration_mode: 'batchwise_shuffled_sequential',
        batches_per_iter: 10,
        monitoring_batch_size: *batch_size,
        monitoring_batches: 10,
        monitor_iteration_mode: 'batchwise_shuffled_sequential',
        learning_rate: 1e-3,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: 0.95
        },
        monitoring_dataset: {
            'train' : *train,
            'valid': !obj:ift6266h15.code.pylearn2.datasets.variable_image_dataset.DogsVsCats {
                transformer: *transformer,
                start: 20000,
                stop: 25000,
            },
        },
        cost: !obj:pylearn2.costs.cost.MethodCost {
            method: 'cost_from_X',
        },
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 10
        },
    },
}

This YAML file trains a softmax classifier for 10 epochs using the first 20,000 examples of the training set. An epoch consists of 10 batches of 100 random examples. Monitoring values are approximated with 10 batches of 100 random examples.
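To actually launch training, one option is to load the YAML through pylearn2.config.yaml_parse and call main_loop() on the resulting Train object. In this sketch, softmax_dogs_vs_cats.yaml is a hypothetical filename for the YAML above:

from pylearn2.config import yaml_parse

# 'softmax_dogs_vs_cats.yaml' is whatever name you saved the
# YAML file above under.
with open('softmax_dogs_vs_cats.yaml') as f:
    train = yaml_parse.load(f.read())
train.main_loop()

Pylearn2’s pylearn2/scripts/train.py script does the same thing from the command line.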

Implementing a BaseImageTransformer subclass

In order to implement your own preprocessing of the images, your BaseImageTransformer subclass needs to implement two methods: get_shape and preprocess.

The get_shape method needs to return the width and height of preprocessed images.

The preprocess method does the actual preprocessing. Given an input image, it returns the preprocessed version whose shape needs to be consistent with what get_shape returns.
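To make this concrete, here is a minimal sketch of such a subclass. This CenterCrop class is hypothetical (it is not part of the repository) and assumes images arrive as NumPy arrays of shape (rows, columns, channels) with both spatial dimensions at least crop_size pixels long; check RandomCrop for the actual conventions:

from ift6266h15.code.pylearn2.datasets.variable_image_dataset import BaseImageTransformer


class CenterCrop(BaseImageTransformer):
    """Hypothetical transformer that takes a centered square crop."""
    def __init__(self, crop_size):
        self.crop_size = crop_size

    def get_shape(self):
        # Width and height of the preprocessed images
        return (self.crop_size, self.crop_size)

    def preprocess(self, image):
        # Assumes image.shape == (rows, columns, channels) with both
        # spatial dimensions at least crop_size pixels.
        rows, cols = image.shape[:2]
        row_start = (rows - self.crop_size) // 2
        col_start = (cols - self.crop_size) // 2
        return image[row_start:row_start + self.crop_size,
                     col_start:col_start + self.crop_size]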

Have a look at how RandomCrop is implemented to get a better feel for how it’s done.