6 Replies to “Lecture 19, March 23rd, 2015: Deep Boltzmann Machines”

  1. I don’t quite understand the training procedure presented in the paper. When pre-training each layer of the DBM, twice as many parameters are computed as are used in the final model. How are those doubled parameters recombined to produce only one copy of the layer? Does the Gibbs sampler just discard the modified layer and preserve the other one as a sample?


  2. People who talk about this paper often mention that the actual implementation uses an array of hacks/tricks that cannot be inferred from the paper (Ian Goodfellow mentioned this in his PhD defense).

    It’d be nice if someone could go over these tricks.



  3. In Figure 4 of the paper, they show samples obtained from the DBMs with 2 and 3 hidden layers. Should we see a difference in the quality of the digits? (They look pretty good for both models…)
    If that is right, what is the advantage or interpretation? Do we only need more than 2 layers for more complex representations, while for MNIST the 2-layer model is already a good estimator of the distribution?


    1. My intuition is that it follows the Boltzmann Machine’s energy function (Eq. 1 in the paper), but with zeros filled in for the weights between nodes that are not connected.
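
A minimal NumPy sketch of that intuition, under stated assumptions: bias terms are dropped, the layer sizes are arbitrary, and the names `bm_energy`, `dbm_energy` and `dbm_as_full_weights` are made up for illustration (they are not from the paper or any library). It builds one big symmetric weight matrix for a 2-hidden-layer DBM, leaves the blocks for unconnected pairs at zero, and checks that the general Boltzmann machine energy then matches the layer-wise DBM energy.

```python
import numpy as np

def bm_energy(s, W):
    """Energy of a general Boltzmann machine over the joint state s.
    Assumes no bias terms; W is symmetric with a zero diagonal, and the
    factor 1/2 makes each pair of units count once."""
    return -0.5 * (s @ W @ s)

def dbm_energy(v, h1, h2, W1, W2):
    """Layer-wise energy of a DBM with two hidden layers (no biases):
    only v-h1 and h1-h2 interactions appear."""
    return -(v @ W1 @ h1) - (h1 @ W2 @ h2)

def dbm_as_full_weights(W1, W2):
    """Embed the DBM weights into one full weight matrix.
    Blocks for unconnected pairs (v-v, h1-h1, h2-h2, v-h2) stay zero."""
    nv, n1 = W1.shape
    n2 = W2.shape[1]
    n = nv + n1 + n2
    upper = np.zeros((n, n))
    upper[:nv, nv:nv + n1] = W1          # visible <-> first hidden layer
    upper[nv:nv + n1, nv + n1:] = W2     # first <-> second hidden layer
    return upper + upper.T               # symmetrize

# Arbitrary illustrative sizes and random binary states.
rng = np.random.default_rng(0)
nv, n1, n2 = 3, 4, 2
W1 = rng.normal(size=(nv, n1))
W2 = rng.normal(size=(n1, n2))
v = rng.integers(0, 2, size=nv)
h1 = rng.integers(0, 2, size=n1)
h2 = rng.integers(0, 2, size=n2)

W = dbm_as_full_weights(W1, W2)
s = np.concatenate([v, h1, h2])

e_full = bm_energy(s, W)
e_dbm = dbm_energy(v, h1, h2, W1, W2)
print(np.isclose(e_full, e_dbm))
```

If this reading is right, the DBM is just a Boltzmann machine whose weight matrix is constrained to this sparse block pattern; adding bias terms would not change the argument.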

