Lecture 1, Jan. 8, 2015

The first class will be January 8th. We will discuss the plan for the course and the pedagogical method chosen.

Lecture 01 slides (slides built mostly on Hugo Larochelle’s review slides)

The following material will also be covered. Questions on this material can optionally be posted below or brought up in class (and posted afterwards, along with answers):


4 Replies to “Lecture 1, Jan. 8, 2015”

  1. Regarding the question raised in class about visualizing units in a neural network, the paper “Understanding representations learned in deep architectures” gives a good review of the different techniques.

    Here is a summary:
    – For the first layer of representation: simply visualize the weights of the connections into each hidden unit. This is easy because the weights live in the same space as the inputs.

    – For higher-level layers, there are three well-known approaches:
    1. Sampling. In generative models such as a Deep Belief Network, we can reconstruct the input with respect to a particular unit to gain insight into its function. However, the reconstruction tends to resemble the input, so extra processing is required to get a clear picture of that unit's behavior.
    2. Activation maximization. The idea is to maximize the activation of the unit, regardless of the activation function (that is, the total input to the unit from the previous layer plus its bias). This technique boils down to finding images in the training or test set that strongly activate that unit.
    3. Linear combination of lower-layer filters. A unit is represented as a linear combination of the units in the layer below. So a 3rd-level unit is a linear combination of 2nd-level filters, and a 2nd-level unit is a linear combination of 1st-level filters.
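
    The third approach can be sketched in a few lines. This is only a minimal illustration: the filter values and weights below are made up, and the sketch uses the full weighted sum, whereas in practice one might keep only the strongest connections.

```python
# Hypothetical first-layer filters: one row per hidden unit, one column per input pixel.
first_layer_filters = [
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -2.0],
    [0.5, 0.5, 0.5, 0.5],
]
# Hypothetical weights from the three first-layer units into one second-layer unit.
second_layer_weights = [2.0, -1.0, 4.0]


def linear_combination(weights, filters):
    """Weighted sum of lower-layer filters, expressed in input (pixel) space."""
    n_pixels = len(filters[0])
    return [
        sum(w * f[i] for w, f in zip(weights, filters))
        for i in range(n_pixels)
    ]


visualization = linear_combination(second_layer_weights, first_layer_filters)
print(visualization)  # [4.0, 0.0, 0.0, 4.0]
```

    The result has the same shape as an input, so it can be displayed the same way as a first-layer filter.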

    Erhan, Dumitru, Aaron Courville, and Yoshua Bengio. Understanding representations learned in deep architectures. Technical Report 1355, Université de Montréal/DIRO, 2010.


    1. Thanks Mohammad for your answer!

      Regarding the second approach for deeper layers:
      They use backpropagation to optimise the activation with respect to the inputs, so the resulting solutions actually do not belong to the training/testing datasets.
      Finding the training/testing images that produce the highest activations can be a simple and effective way to interpret a feature learnt by a neuron (they do it in “Visualizing and Understanding Convolutional Networks”). Nevertheless, when the neuron’s feature is not obvious/big (say a watch that can be very small in the image), you may need many images to figure out that it is indeed the watch that is responsible for the high activation.
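
      To make the difference concrete, here is a minimal sketch of optimising an input by gradient ascent on a unit's pre-activation. Everything in it is an assumption for illustration: the tiny two-layer tanh network, its weights, the step size, and the choice to keep the input inside a ball of fixed radius so that the maximization stays bounded.

```python
import math

# Hypothetical toy network: the examined unit has pre-activation
#   a(x) = sum_j v[j] * tanh(sum_i W[j][i] * x[i] + b[j]) + c
W = [[0.5, -1.0, 0.3],
     [-0.7, 0.2, 0.9]]
b = [0.1, -0.2]
v = [1.0, -1.5]
c = 0.05


def pre_activation(x):
    """Total input to the examined unit (weighted hidden activities plus bias)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
              for row, bj in zip(W, b)]
    return sum(vj * hj for vj, hj in zip(v, hidden)) + c


def input_gradient(x):
    """Backpropagate d a(x) / d x through the tanh layer."""
    grad = [0.0] * len(x)
    for row, bj, vj in zip(W, b, v):
        h = math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
        delta = vj * (1.0 - h * h)  # derivative of tanh is 1 - tanh^2
        for i, w in enumerate(row):
            grad[i] += delta * w
    return grad


def project_to_ball(x, radius):
    """Rescale x so that its Euclidean norm does not exceed radius."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    return [xi * radius / norm for xi in x]


def maximize_activation(x0, radius=1.0, lr=0.1, steps=200):
    """Projected gradient ascent on the input, starting from x0."""
    x = project_to_ball(x0, radius)
    for _ in range(steps):
        g = input_gradient(x)
        x = project_to_ball([xi + lr * gi for xi, gi in zip(x, g)], radius)
    return x


x_start = [0.0, 0.0, 0.0]
x_best = maximize_activation(x_start)
print(pre_activation(x_start), pre_activation(x_best))
```

      The optimised input is a synthetic pattern rather than a dataset image, which is exactly why it need not belong to the training or test set.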

      In a recent paper (“Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps”), the VGG group from Oxford applies the same maximization technique to the output neurons of a convnet. They also introduce saliency maps, which represent the gradient of a specific (output) neuron with respect to the inputs for a particular image. These saliency maps therefore reveal which parts of the image matter for the activation of the output neuron. They actually use this information to localize the object inside the image, and obtain decent results given that the network was trained only with object labels and no localization information (i.e., weakly supervised object localisation). Saliency maps could also be computed for any neuron in the architecture.
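
      As a rough sketch of the saliency idea, the snippet below computes the gradient of a score with respect to each input "pixel" and takes its absolute value. The one-hidden-layer ReLU scorer and all its weights are hypothetical stand-ins for a real convnet.

```python
# Hypothetical scorer: score(x) = sum_j w2[j] * relu(sum_i W1[j][i] * x[i] + b1[j])
W1 = [[1.0, -2.0, 0.0, 0.5],
      [0.0, 0.5, -1.0, 0.0],
      [-1.0, 0.0, 0.0, 2.0]]
b1 = [0.0, 0.1, -0.5]
w2 = [2.0, -1.0, 0.5]


def hidden_pre_activations(x):
    """Input to each hidden unit before the ReLU."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W1, b1)]


def score(x):
    """Output score for the class of interest."""
    return sum(wj * max(0.0, z) for wj, z in zip(w2, hidden_pre_activations(x)))


def saliency(x):
    """Absolute gradient of the score with respect to each input pixel."""
    grad = [0.0] * len(x)
    for row, wj, z in zip(W1, w2, hidden_pre_activations(x)):
        if z > 0.0:  # a ReLU only passes gradient when the unit is active
            for i, w in enumerate(row):
                grad[i] += wj * w
    return [abs(g) for g in grad]


image = [1.0, 0.2, 0.3, 0.4]
print(saliency(image))  # [2.0, 4.0, 0.0, 1.0]
```

      For this particular input only the first hidden unit is active, so pixel 1 is the most salient and pixel 2 has no influence on the score at all.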


  2. In the paper “Visualizing and Understanding Convolutional Networks” (the figures in the slides are from the same paper), Matthew D. Zeiler and Rob Fergus use a deconvnet to map feature activities back to the input pixel space.

    In a nutshell, to examine a convnet, a deconvnet is attached to each of its layers, providing a continuous path back to image pixels.

    Then the architecture would be:
    The input goes through a conv layer, a non-linearity layer and a pooling layer to create feature maps, while recording the switch variables, which remember which pixels of a feature map were kept. Switch variables are introduced because pooling is not invertible: without them, the location of each maximum would be lost in higher layers.
    Then there is an unpooling layer, a non-linearity layer and a deconvnet (filtering) layer to reconstruct the feature activations.

    To start, an input image is presented to the convnet and features are computed throughout the layers. To examine a given convnet activation, all other activations in the layer are set to zero and the feature maps are passed as input to the attached deconvnet layer. Then we can successively (i) unpool, (ii) rectify and (iii) filter to reconstruct the activity in the layer beneath that gave rise to the chosen activation. This is repeated until input pixel space is reached.
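
    The role of the switches is easiest to see on a small example. The sketch below is not the authors' code: it assumes 2x2 non-overlapping max pooling on a made-up 4x4 feature map, and it covers only the unpool and rectify steps, leaving out the filtering step.

```python
def max_pool_with_switches(feature_map, size=2):
    """Non-overlapping max pooling that also records where each maximum was."""
    pooled, switches = [], []
    for r in range(0, len(feature_map), size):
        pooled_row, switch_row = [], []
        for c in range(0, len(feature_map[0]), size):
            window = [
                (feature_map[i][j], (i, j))
                for i in range(r, r + size)
                for j in range(c, c + size)
            ]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def keep_only(pooled, row, col):
    """Zero out every activation except the one being examined."""
    return [
        [v if (i, j) == (row, col) else 0.0 for j, v in enumerate(values)]
        for i, values in enumerate(pooled)
    ]


def unpool(pooled, switches, shape):
    """Put each pooled value back at the position stored in its switch."""
    rows, cols = shape
    out = [[0.0] * cols for _ in range(rows)]
    for values, positions in zip(pooled, switches):
        for value, (i, j) in zip(values, positions):
            out[i][j] = value
    return out


def rectify(feature_map):
    """Apply a ReLU so the reconstruction stays non-negative."""
    return [[max(0.0, v) for v in row] for row in feature_map]


feature_map = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [2.0, 6.0, 3.0, 1.0],
]
pooled, switches = max_pool_with_switches(feature_map)
print(pooled)    # [[4.0, 5.0], [6.0, 7.0]]
print(switches)  # [[(1, 0), (1, 3)], [(3, 1), (2, 2)]]

isolated = keep_only(pooled, 1, 1)
reconstruction = rectify(unpool(isolated, switches, (4, 4)))
print(reconstruction[2])  # [0.0, 0.0, 7.0, 0.0]
```

    Without the switches, the value 7.0 could only be placed somewhere arbitrary in its 2x2 region; with them it goes back to exactly the pixel it came from.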

    Link to the paper:

    link to slide:


  3. Very basic stuff. There is a mistake in the definition of the nullspace of a matrix on slide 9. The nullspace is \{x \in \mathbb{R}^n \mid Ax = 0\}. This is clearly different from the definition in the slides….
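
    A quick numerical check of that definition, using a made-up 2x3 matrix (so m = 2, n = 3, and the nullspace lives in R^3); the helper names and tolerance are illustrative choices only:

```python
def matvec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def in_nullspace(A, x, tol=1e-12):
    """True if A x is (numerically) the zero vector."""
    return all(abs(value) <= tol for value in matvec(A, x))


# Hypothetical rank-1 matrix: the second row is twice the first.
A = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0]]

print(in_nullspace(A, [-2.0, 1.0, 0.0]))  # True
print(in_nullspace(A, [-3.0, 0.0, 1.0]))  # True
print(in_nullspace(A, [1.0, 0.0, 0.0]))   # False
```

    Note that the vectors tested are inputs x in R^n, not outputs in R^m.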

