In this lecture we will discuss Variational Autoencoders.
Please study the following material in preparation for class:
- Auto-Encoding Variational Bayes by Diederik P. Kingma and Max Welling
- Slides from class lecture.
Other relevant material:
- Semi-supervised Learning with Deep Generative Models by Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling
- Stochastic Backpropagation and Approximate Inference in Deep Generative Models by Danilo Jimenez Rezende, Shakir Mohamed, Daan Wierstra
Can we go over the semi-supervised VAE M2 model in some detail? It seems you pay a penalty (section 3.3) that scales with the number of classes. Could you get around this in practice if you wanted samples from every class?
It would also be good to talk about formulations of sigma for stability: exp(log_sigma) vs. softplus vs. others. It is hard to see what the practical differences are between applying a softplus and parameterizing log_sigma directly.
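To make the comparison concrete, here is a minimal sketch of the two parameterizations mentioned above. It is not taken from either paper; the function names and the unconstrained input `raw` (standing in for a network output) are assumptions for illustration only.

```python
import math

def sigma_exp(raw):
    """sigma = exp(raw); raw plays the role of log_sigma."""
    return math.exp(raw)

def sigma_softplus(raw):
    """sigma = log(1 + exp(raw)), written to avoid overflow for large raw."""
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))

def dsigma_exp(raw):
    """d sigma / d raw for the exp parameterization (equals sigma itself)."""
    return math.exp(raw)

def dsigma_softplus(raw):
    """d sigma / d raw for softplus: the logistic sigmoid, bounded by 1."""
    if raw >= 0:
        return 1.0 / (1.0 + math.exp(-raw))
    e = math.exp(raw)
    return e / (1.0 + e)

# Compare values and gradients at a few hypothetical network outputs.
for raw in (-10.0, 0.0, 10.0):
    print(
        f"raw={raw:+.0f}  "
        f"exp: sigma={sigma_exp(raw):.3g}, grad={dsigma_exp(raw):.3g}  "
        f"softplus: sigma={sigma_softplus(raw):.3g}, grad={dsigma_softplus(raw):.3g}"
    )
```

In this sketch the two nearly coincide for very negative inputs, while for large positive inputs exp grows exponentially (and so does its gradient), whereas softplus grows roughly linearly with a gradient bounded by 1. Whether that matters in training is exactly the question raised above.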
In the VAE paper, they compared the VAE to Monte Carlo EM (a variant from 1987!) in figure 3. There it clearly seems that Monte Carlo EM performs better than VAE for a small number of training examples, and (judging from the increasing blue lines in the right plot) perhaps also better in the case of many training examples. However, it is well known that online EM usually works even better than batch EM for large datasets… So, has the VAE ever been compared to other online variational EM algorithms?
Aaron, I guess you plan to do this anyway, but it would be nice to discuss the shortcomings of VAEs as well. The VAE paper doesn’t really touch on that.
The DeepMind paper (Rezende et al.) seems to obtain worse samples on MNIST than Kingma et al. in “Auto-Encoding Variational Bayes”. I can’t quite tell if it’s because they use the binarized version of MNIST, or if it’s because each layer of the deep latent Gaussian model includes Gaussian noise. Could it be the latter case, since their NORB samples are quite blurry?
Section 2.1 of Kingma & Welling’s paper mentions that they “do not make the common simplifying assumptions about the marginal or posterior”. Which assumptions do they mean?
It would be great if we could go over section 2.2 (Variational Bound) in detail.
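For reference, the decomposition and bound from that section can be written as below. The notation follows Kingma & Welling (θ for generative parameters, φ for variational parameters); this is only a restatement of the result, not a walkthrough of the derivation.

```latex
\begin{align}
\log p_\theta(x^{(i)}) &= D_{KL}\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\big) + \mathcal{L}(\theta, \phi; x^{(i)}) \\
\mathcal{L}(\theta, \phi; x^{(i)}) &= -D_{KL}\big(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\big) + \mathbb{E}_{q_\phi(z \mid x^{(i)})}\big[\log p_\theta(x^{(i)} \mid z)\big]
\end{align}
```

Since the first KL term is non-negative, the second line is a lower bound on the marginal log-likelihood of the datapoint.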
Could you give us some insight into how to choose g(.) when q(z|x) doesn’t fit any of the approaches listed in section 2.4? Also, in practice, how is q(z|x) chosen in cases less obvious than a Gaussian?
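As a point of comparison for the question, here is a minimal sketch of what g(.) looks like for one non-Gaussian case that section 2.4 does cover: a logistic distribution sampled through its inverse CDF. It does not answer the harder case of a q(z|x) outside those approaches. The function names and the parameters `mu` and `s` are assumptions for illustration only.

```python
import math
import random

def sample_logistic(mu, s, u):
    """g(u; mu, s): inverse CDF of Logistic(mu, s) applied to u ~ Uniform(0, 1)."""
    return mu + s * math.log(u / (1.0 - u))

def logistic_cdf(z, mu, s):
    """CDF of Logistic(mu, s), used here only to check the transform."""
    return 1.0 / (1.0 + math.exp(-(z - mu) / s))

def grad_wrt_params(u):
    """Partial derivatives of z = g(u; mu, s) with respect to (mu, s)."""
    return 1.0, math.log(u / (1.0 - u))

rng = random.Random(0)
eps = 1e-12  # keep u strictly inside (0, 1) so the log stays finite
us = [min(max(rng.random(), eps), 1.0 - eps) for _ in range(5)]

# The noise u does not depend on (mu, s), so z is a deterministic,
# differentiable function of the parameters once u is fixed.
zs = [sample_logistic(0.5, 2.0, u) for u in us]
for u, z in zip(us, zs):
    print(f"u={u:.4f}  z={z:+.4f}  dz/dmu, dz/ds = {grad_wrt_params(u)}")
```

The point of the sketch is that all randomness sits in u, so gradients with respect to mu and s pass through the sample.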