Lecture 6, Jan. 26th, 2015: Training Neural Networks

In this lecture we continue our introduction to neural networks. Specifically, we will discuss how to train neural networks, i.e., the backpropagation algorithm.
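As a preview, the core of backpropagation is the chain rule applied layer by layer: do a forward pass, then propagate the loss gradient backwards through each operation. A minimal sketch in plain Python (illustrative only; the tiny 1-1-1 network, squared-error loss, and variable names are our own choices, not taken from the lecture):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, t, w1, w2):
    """One forward/backward pass for a tiny network:
    input x -> sigmoid hidden unit (weight w1) -> linear output (weight w2),
    trained with squared-error loss against target t."""
    # Forward pass
    h = sigmoid(w1 * x)           # hidden activation
    y = w2 * h                    # linear output
    loss = 0.5 * (y - t) ** 2
    # Backward pass (chain rule, one factor per operation)
    dy = y - t                    # dL/dy
    dw2 = dy * h                  # dL/dw2
    dh = dy * w2                  # dL/dh
    dw1 = dh * h * (1 - h) * x    # dL/dw1, using sigmoid'(z) = h * (1 - h)
    return loss, dw1, dw2

# Sanity check: the analytic gradient should match a finite-difference estimate.
x, t, w1, w2 = 0.5, 1.0, 0.3, -0.7
loss, dw1, dw2 = forward_backward(x, t, w1, w2)
eps = 1e-6
num_dw1 = (forward_backward(x, t, w1 + eps, w2)[0]
           - forward_backward(x, t, w1 - eps, w2)[0]) / (2 * eps)
print(abs(dw1 - num_dw1) < 1e-8)  # True: backprop agrees with numerical gradient
```

Such a gradient check against finite differences is a standard way to verify a backpropagation implementation before scaling it up.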

Please study the following material in preparation for the class:

Other relevant material:


3 thoughts on “Lecture 6, Jan. 26th, 2015: Training Neural Networks”

  1. I have two questions for us to discuss in this lecture:

    – In the book on page 108: “On the other hand, other loss functions such as the squared error applied to sigmoid outputs (which was popular in the 80’s and 90’s) will have vanishing gradient when an output unit saturates, even if the output is completely wrong (Solla et al., 1988).” My question is: why does the gradient of the squared error vanish at the output unit when the output is wrong? Maybe I misunderstood something?

    – My second question is about unsupervised pretraining of neural networks. The book doesn’t mention it at all yet (and neither did Hugo in his lectures); instead it only discusses rectified linear units. So how important is unsupervised pretraining in SOTA models? And if we don’t need unsupervised pretraining anymore, what consequences (i.e. what kind of prior) do rectified linear units impose on our problem? Can we say anything about the prior based on Guido Montufar’s work, e.g. that rectified linear units induce many more linear regions in the network’s output compared to sigmoid or tanh activation functions?

    – Julian

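Regarding the first question, a quick numeric check makes the effect concrete: with a sigmoid output, the gradient of the squared error with respect to the pre-activation contains the factor σ′(z) = y(1 − y), which goes to zero whenever the unit saturates, regardless of the target. The cross-entropy loss cancels that factor. A minimal sketch in plain Python (the formulas are the standard ones; not taken from the book):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_squared_error(z, t):
    # d/dz of 0.5 * (sigmoid(z) - t)^2 = (y - t) * y * (1 - y)
    y = sigmoid(z)
    return (y - t) * y * (1.0 - y)

def grad_cross_entropy(z, t):
    # d/dz of -[t*log(y) + (1-t)*log(1-y)] simplifies to y - t
    return sigmoid(z) - t

# A saturated, completely wrong output: z = -10 gives y ~ 4.5e-5 but target t = 1.
z, t = -10.0, 1.0
print(grad_squared_error(z, t))   # ~ -4.5e-05: the gradient has vanished
print(grad_cross_entropy(z, t))   # ~ -1.0: still a strong learning signal
```

So the output being “completely wrong” makes (y − t) large, but the σ′(z) factor in the squared-error gradient crushes it anyway; cross-entropy avoids this because its gradient is just y − t.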
