In this lecture we continue our introduction to neural networks. Specifically, we will discuss how to train neural networks, i.e. the **Backpropagation Algorithm**.

**Please study the following material in preparation for the class:**

- Hugo Larochelle’s video lectures 2.1 to 2.7.
- Chapter 6 of the Deep Learning textbook (section 6.3)

**Other relevant material:**

- Hinton’s coursera lecture 3, videos 1 to 5.


I have two questions for us to discuss in this lecture:

– In the book on page 108: “On the other hand, other loss functions such as the squared error applied to sigmoid outputs (which was popular in the 80’s and 90’s) will have vanishing gradient when an output unit saturates, even if the output is completely wrong (Solla et al., 1988).” My question is: why does the gradient of the squared error vanish when the output unit saturates, even though the output is completely wrong? Maybe I misunderstood something?
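A quick numerical sketch of the effect the book describes: for squared error on a sigmoid output, the gradient with respect to the pre-activation `z` carries a factor `sigma'(z) = sigma(z)(1 - sigma(z))`, which is nearly zero whenever the unit saturates, no matter how wrong the output is. The cross-entropy loss cancels that factor (its gradient is simply `sigma(z) - y`), which is one reason it replaced squared error for sigmoid outputs. The values below are just an illustrative saturated case, not from the book.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Target is 1, but the unit is saturated at the wrong end (z = -10),
# so the output (~4.5e-5) is completely wrong.
y, z = 1.0, -10.0
p = sigmoid(z)

# Squared error L = (y - p)^2: the chain rule brings in sigma'(z),
# so dL/dz = -2 * (y - p) * p * (1 - p), which vanishes at saturation.
grad_squared_error = -2.0 * (y - p) * p * (1.0 - p)

# Sigmoid + cross-entropy: the sigma'(z) factor cancels and
# dL/dz = p - y, which stays O(1) for a wrong output.
grad_cross_entropy = p - y

print(grad_squared_error)  # ~ -9e-5: almost no learning signal
print(grad_cross_entropy)  # ~ -1.0: a healthy gradient
```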

– My second question is about unsupervised pretraining of neural networks. The book doesn’t mention it at all yet (and neither did Hugo in his lectures); instead it only discusses rectified linear units. So how important is unsupervised pretraining in SOTA models? And if we don’t need unsupervised pretraining anymore, what consequences (i.e. what kind of prior) do rectified linear units impose on our problem? Can we say anything about the prior based on Guido Montufar’s work, e.g. that rectified linear units induce many more linear regions in the network’s output compared to sigmoid or tanh activation functions?
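To make the “linear regions” part of the question concrete: a ReLU network computes a piecewise-linear function, and each distinct on/off pattern of the hidden units corresponds to one linear region of the input space. The toy network below (random weights, one hidden layer, 1-D input) just counts those patterns along a sweep of the input; it is a hedged illustration of the concept, not a reproduction of Montufar et al.’s construction, whose point is that *stacking* layers multiplies the region count.

```python
import random

random.seed(0)

# A tiny 1-hidden-layer ReLU net on a scalar input, with H hidden units
# and random (illustrative) weights and biases.
H = 8
w = [random.uniform(-1, 1) for _ in range(H)]
b = [random.uniform(-1, 1) for _ in range(H)]

def activation_pattern(x):
    # Which hidden ReLUs are active (pre-activation > 0) at input x.
    return tuple(w[i] * x + b[i] > 0 for i in range(H))

# Sweep the input; each distinct on/off pattern is one linear region.
xs = [i / 1000.0 for i in range(-5000, 5001)]
patterns = {activation_pattern(x) for x in xs}
print(len(patterns))  # at most H + 1 = 9 regions for a single hidden layer
```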

– Julian


Is it necessary to normalize the input layer in neural networks? What about the hidden layers in deep neural nets?
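For the input layer, the usual answer is to standardize each feature to zero mean and unit variance, so that features on very different scales don’t dominate the gradients. A minimal sketch (the toy data here is made up for illustration); batch normalization later applies the same idea to hidden-layer activations per mini-batch:

```python
# Toy dataset: two features on very different scales.
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]

n = len(data)
dims = len(data[0])

# Per-feature mean and (population) standard deviation.
means = [sum(row[d] for row in data) / n for d in range(dims)]
stds = [(sum((row[d] - means[d]) ** 2 for row in data) / n) ** 0.5
        for d in range(dims)]

# Standardize: each feature now has mean 0 and std 1.
normalized = [[(row[d] - means[d]) / stds[d] for d in range(dims)]
              for row in data]
print(normalized)
```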


Is there any sense in using higher-order optimization and, in particular, second-order optimization techniques? This paper http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf discusses different approaches and concludes that second-order methods are not efficient. Is it possible to combine first-order and second-order optimization to achieve good performance and avoid saddle points?
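For intuition on what a second-order step buys you: on a quadratic, Newton’s method rescales the gradient by the curvature and lands on the minimum in one update, where gradient descent only moves a fraction of the way. A minimal one-dimensional sketch (the function and step size are arbitrary choices for illustration); note that on non-convex objectives a plain Newton step can also be attracted to saddle points, which is part of what the question is asking about:

```python
# f(x) = (x - 3)^2, minimized at x = 3.
def grad(x):
    return 2.0 * (x - 3.0)

def hess(x):
    return 2.0  # constant curvature for a quadratic

x0 = 10.0

# First-order step: move against the gradient, scaled by a learning rate.
gd_step = x0 - 0.1 * grad(x0)           # 10 - 0.1 * 14 = 8.6

# Second-order (Newton) step: divide the gradient by the curvature.
newton_step = x0 - grad(x0) / hess(x0)  # 10 - 14 / 2 = 3.0, the minimum

print(gd_step, newton_step)
```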
