Lecture 14, Feb. 23rd, 2015: Autoencoders

In this lecture we will begin our discussion of unsupervised learning methods. In particular, we will study a kind of neural network known as an autoencoder.

Please study the following material in preparation for the class:

  • Lecture 6 (6.1 to 6.7) of Hugo Larochelle’s course on Neural Networks.
  • Chapter 10 of the Deep Learning Textbook.

Other relevant material:

  • Lecture 15 (15a-15f) of Geoff Hinton’s Coursera course on Neural Networks.

 

16 Replies to “Lecture 14, Feb. 23rd, 2015: Autoencoders”

  1. Hugo Larochelle mentions that adding input noise is equivalent to adding weight decay, but the denoising autoencoder performs better than an autoencoder with weight decay. What is the reason for this? Also, the Jacobian with respect to the input is proportional to the weight matrix. Why is it different from weight decay?


    1. Dima, the Jacobian used is the derivative of the loss function w.r.t. the input (e.g. the pixels of an image) and not the weights of the model. This is very different, and in fact I’m using a similar derivative for visualizing chess features (which I will talk more about Friday…).

      As we’ve already discussed in class, weight decay and adding (Gaussian) input noise are only equivalent in the case of a quadratic bowl (a second-order Taylor approximation). Probably something else is happening when we add a hidden layer with non-linear activation functions…
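      For reference, here is the linear-case calculation (a sketch, assuming a single linear output $\hat{y} = w^\top x$, squared error, and isotropic Gaussian input noise $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$):

      \[ \mathbb{E}_{\varepsilon}\big[(w^\top (x + \varepsilon) - y)^2\big] = (w^\top x - y)^2 + \sigma^2 \|w\|^2 , \]

      so the expected noisy loss is exactly the clean loss plus an L2 penalty on the weights. With a non-linear hidden layer this identity only holds through a second-order approximation, so the two regularizers can behave quite differently in practice.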


      1. Whoops, I was too fast there! The derivative (Jacobian) used for contractive auto-encoders is the derivative of the hidden units’ outputs w.r.t. the input. For a sigmoid unit in the first hidden layer this is its weight matrix times a function involving exponentials (the sigmoid’s derivative)…
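        To make that concrete (a quick sketch with made-up layer sizes, not the exact code from the paper): for a sigmoid encoder $h = \sigma(Wx + b)$, entry $(j, i)$ of the Jacobian is $h_j (1 - h_j) W_{ji}$, and the contractive penalty is its squared Frobenius norm, e.g. in numpy:

          import numpy as np

          rng = np.random.default_rng(0)
          W = rng.standard_normal((3, 5))   # 3 hidden units, 5 inputs (hypothetical sizes)
          b = rng.standard_normal(3)
          x = rng.standard_normal(5)

          h = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # sigmoid hidden units
          J = (h * (1.0 - h))[:, None] * W         # Jacobian: dh_j/dx_i = h_j (1 - h_j) W_ji
          penalty = np.sum(J ** 2)                 # contractive penalty ||J||_F^2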


        1. I think what he means is that input noise / weight decay / Jacobian regularization are all equivalent in the linear case, but this connection falls apart once you have non-linearities. They might still be weakly related in the non-linear case, but I’m not sure what that relationship is.


    1. I am also interested in this. Can we relate it to the approximation deconvnets are making?

      For example, the assumption that the inverse of the input weight matrix W is approximately its transpose, provided it is sparse enough and close to orthonormal (which it should be anyway in a completely linear auto-encoder)? Or is it simply another weight-sharing scheme to improve the sample-size statistical properties of the model?
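      (For what it’s worth, the orthonormal intuition is just that if the rows of the encoder matrix $W$ are orthonormal, i.e. $W W^\top = I$, then the pseudo-inverse of $W$ is exactly $W^\top$, so the transpose really does invert the encoding on the subspace spanned by those rows. A linear auto-encoder with tied weights, i.e. decoder $W^\top$, is exactly this situation whenever the learned $W$ ends up close to orthonormal.)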


  2. 1. Has the contractive penalty been applied to other models such as feedforward neural networks? It seems like a reasonable thing to do at first glance…

    2. The book first states that sparse coding uses a non-parametric function to encode the input. That is, it can choose any representation for each individual training example. Yet in “10.2.4 Sparse Coding as a Generative Model” it is clearly defined as a parametric model. Are the two forms different (or am I misunderstanding something?), and if so, what is the non-parametric form useful for / how could you use it to learn a representation for a problem you need to solve?
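      For concreteness, the objective I have in mind is the usual sparse coding inference problem

      \[ h^*(x) = \arg\min_h \tfrac{1}{2}\|x - W h\|_2^2 + \lambda \|h\|_1 , \]

      where only the dictionary $W$ is a learned parameter: the code $h^*(x)$ is recomputed by optimization for every example, which I take to be what the book means by a non-parametric encoder, as opposed to the parametric generative model of 10.2.4.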


    1. With respect to 1): the contractive penalty has been applied to supervised problems. In this recent paper, they propose it as a way to be more robust to adversarial examples. (The penalty really encourages smooth input-output mappings.)
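      Roughly, the supervised variant just adds the same Frobenius-norm term to the usual training loss (a sketch, not the paper’s exact formulation):

      \[ \mathcal{L}(\theta) = \mathbb{E}_{(x,y)}\Big[ \ell\big(f_\theta(x), y\big) + \lambda \, \big\| \partial f_\theta(x) / \partial x \big\|_F^2 \Big] , \]

      which directly penalizes how much the output can change under small input perturbations, hence the connection to adversarial examples.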


  3. I thought section 10.2 of the DL book about factor analysis illustrates well why we want to keep the weights W from growing outrageously. Indeed, as the model weights increase, the covariance (WW^T + some term) will increase and the model pdf will become narrower and therefore not be able to represent unseen variations of the data.
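    Concretely, assuming the standard factor analysis parameterization $x = Wh + b + \varepsilon$ with $h \sim \mathcal{N}(0, I)$ and $\varepsilon \sim \mathcal{N}(0, \Psi)$ ($\Psi$ diagonal), the marginal covariance is

    \[ \mathrm{Cov}(x) = W W^\top + \Psi , \]

    so the “some term” above is the diagonal noise covariance $\Psi$, and any growth in $W$ shows up directly in the model covariance.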


  4. In section 10.2.3 of the DL book, I don’t think I understand the argument suggesting that imposing independence among Gaussian factors does not allow one to disentangle them. I can see how this is true if the input has diagonal covariance, but not otherwise.
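    (The rotation-invariance argument, as I understand it: for any orthogonal matrix $R$, $Wh = (WR)(R^\top h)$ and $R^\top h \sim \mathcal{N}(0, I)$ whenever $h \sim \mathcal{N}(0, I)$, so the pairs $(W, h)$ and $(WR, R^\top h)$ define exactly the same distribution over the input, regardless of what the input covariance looks like. The independent Gaussian factors can therefore only be recovered up to an arbitrary rotation, which is why independence alone does not pin down a disentangled representation.)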

