Lec19 - GANs

Generative Adversarial Networks (GANs)
From Ian Goodfellow et al.

A short tutorial by :-
Binglin, Shashank & Bhargav

Adapted for Purdue MA 598, Spring 2019 from

• Part 1: Review of GANs

• Part 2: Some challenges with GANs

• Part 3: Applications of GANs

GAN’s Architecture

[Figure: the Discriminator D takes an input x and outputs D(x).]

• Z is some random noise (Gaussian/Uniform).

• Z can be thought of as the latent representation of the image.
Training Discriminator

Training Generator

GAN’s formulation

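• It is formulated as a minimax game, $\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$, where: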

• The Discriminator is trying to maximize its reward
• The Generator is trying to minimize Discriminator’s reward (or maximize its loss)

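• The Nash equilibrium of this particular game is achieved at $p_{data}(x) = p_g(x)$ and $D(x) = \frac{1}{2}$ for all x, where $p_g$ is the distribution of generated samples.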
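
As a numeric sanity check on the value function, here is a minimal Python sketch. It assumes the standard form from the cited Goodfellow et al. paper, V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))], and estimates it from lists of discriminator outputs; the function and variable names are illustrative. A discriminator that outputs 1/2 everywhere, which is what happens once real and generated samples are indistinguishable, gives V = −log 4.

```python
import math

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G).

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 1/2 everywhere.
v_equilibrium = value_function([0.5] * 4, [0.5] * 4)
# A confident and correct discriminator earns a higher reward (closer to 0).
v_confident = value_function([0.99] * 4, [0.01] * 4)

print(v_equilibrium, -math.log(4.0))
print(v_confident)
```

The Discriminator maximizes this quantity and the Generator minimizes it, so the Generator's goal is to pull `v_confident` back down toward the equilibrium value.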


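Vanishing gradient strikes back again…

• $\nabla_{\theta_G} V(D, G) = \nabla_{\theta_G} \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$

• With $D(G(z)) = \sigma(a)$: $\nabla_a \log(1 - \sigma(a)) = \frac{-\nabla_a \sigma(a)}{1 - \sigma(a)} = \frac{-\sigma(a)(1 - \sigma(a))}{1 - \sigma(a)} = -\sigma(a) = -D(G(z))$

• Gradient goes to 0 if D is confident, i.e. $D(G(z)) \to 0$

• Minimize $-\mathbb{E}_{z \sim p_z(z)}[\log D(G(z))]$ for Generator instead (keep Discriminator as it is)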
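
The saturation can be checked numerically. This is a small sketch, assuming the Discriminator ends in a sigmoid so that D(G(z)) = σ(a) for a logit `a` (the name `a` and both helper functions are illustrative). It compares the gradient with respect to `a` of the original Generator loss, log(1 − D(G(z))), against the alternative loss, −log D(G(z)).

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    # d/da log(1 - sigmoid(a)) = -sigmoid(a)
    return -sigmoid(a)

def grad_non_saturating(a):
    # d/da [-log sigmoid(a)] = -(1 - sigmoid(a))
    return -(1.0 - sigmoid(a))

# Very negative logits mean the Discriminator is confident the sample is fake.
for a in (-10.0, -5.0, 0.0):
    print(a, sigmoid(a), grad_saturating(a), grad_non_saturating(a))
```

When the Discriminator confidently rejects a sample, the first gradient is close to 0 while the second stays close to −1, so the Generator keeps receiving a learning signal.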


Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
DCGAN: Bedroom images

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Deep Convolutional GANs (DCGANs)
Key ideas:
• Replace FC hidden layers with Convolutions
• Generator: Fractional-Strided convolutions

• Use Batch Normalization after each layer

• Inside Generator
• Use ReLU for hidden layers
• Use Tanh for the output layer

[Figure: Generator Architecture]
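
To see how fractional-strided (transposed) convolutions grow a small feature map into an image, here is a sketch of the output-size arithmetic. The layer settings (kernel 4, stride 2, padding 1, starting from a 4×4 map) are assumptions chosen for illustration and are not taken from the slides; the formula is the usual one for a transposed convolution without output padding or dilation.

```python
def transposed_conv_out(size, kernel, stride, padding):
    """Output side length of a transposed (fractional-strided) convolution
    with no output padding and no dilation."""
    return (size - 1) * stride - 2 * padding + kernel

# Assumed settings: kernel 4, stride 2, padding 1, so each layer doubles
# the spatial size. These values are illustrative only.
size = 4
sizes = [size]
for _ in range(4):
    size = transposed_conv_out(size, kernel=4, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [4, 8, 16, 32, 64]
```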

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Part 2
• Training Challenges
• Non-Convergence
• Mode-Collapse
• Proposed Solutions
• Supervision with Labels
• Mini-Batch GANs
• Modification of GAN’s losses
• Discriminator (EB-GAN)
• Generator (InfoGAN)
• Deep Learning models (in general) involve a single player
• The player tries to maximize its reward (minimize its loss).
• Use SGD (with Backpropagation) to find the optimal parameters.
• SGD has convergence guarantees (under certain conditions).
• Problem: With non-convexity, we might converge to local optima.

• GANs instead involve two (or more) players

• Discriminator is trying to maximize its reward.
• Generator is trying to minimize Discriminator’s reward.
• $\min_G \max_D V(D, G)$

• SGD was not designed to find the Nash equilibrium of a game.

• Problem: We might not converge to the Nash equilibrium at all.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.

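• Toy example from the cited tutorial: $\min_x \max_y V(x, y) = xy$, where gradient descent on x and gradient ascent on y give $\dot{x} = -y$, $\dot{y} = x$

• Differential equation’s solution has sinusoidal terms

• Even with a small learning rate, it will not converge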
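
A short simulation makes the non-convergence concrete. This sketch assumes the toy game V(x, y) = xy from the cited tutorial, with x minimizing and y maximizing through simultaneous gradient steps; the function name and the learning rate are illustrative choices.

```python
import math

def simultaneous_gradient_steps(x, y, lr, steps):
    """x does gradient descent on V(x, y) = x * y, y does gradient ascent."""
    for _ in range(steps):
        grad_x, grad_y = y, x  # dV/dx = y, dV/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y

x0, y0 = 1.0, 1.0
x1, y1 = simultaneous_gradient_steps(x0, y0, lr=0.01, steps=1000)

r0 = math.hypot(x0, y0)  # distance from the equilibrium (0, 0) at the start
r1 = math.hypot(x1, y1)  # distance after training
print(r0, r1)
```

Each simultaneous step multiplies the squared distance from the equilibrium (0, 0) by 1 + lr², so the iterates circle the equilibrium and drift slowly outward instead of settling on it.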


Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
• Generator fails to output diverse samples




Metz, Luke, et al. "Unrolled Generative Adversarial Networks." arXiv preprint arXiv:1611.02163 (2016).
How to reward sample diversity?
• At Mode Collapse,
• Generator produces good samples, but only a few distinct ones.
• Thus, Discriminator can’t tag them as fake.

• To address this problem,

• Let the Discriminator know about this edge-case.

• More formally,
• Let the Discriminator look at the entire batch instead of single examples
• If there is a lack of diversity, it will mark the examples as fake

• Thus,
• Generator will be forced to produce diverse samples.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Mini-Batch GANs
• Extract features that capture diversity in the mini-batch
• E.g., the L2 norm of the difference between all pairs from the batch

• Feed those features to the discriminator along with the image

• Feature values will differ between diverse and non-diverse batches

• Thus, Discriminator will rely on those features for classification

• This, in turn,
• Will force the Generator to match those feature values with the real data
• Will make the Generator produce diverse batches
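
A minimal sketch of such a diversity feature, using the pairwise L2 distance mentioned above. It is a simplified illustration and not the learned minibatch layer of the cited paper; the function name and the toy batches are assumptions.

```python
import math

def mean_pairwise_l2(batch):
    """For each sample, the mean L2 distance to every other sample in the batch."""
    n = len(batch)
    feats = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i != j:
                total += math.dist(batch[i], batch[j])
        feats.append(total / (n - 1))
    return feats

collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]  # mode-collapsed batch
diverse = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]    # spread-out batch

print(mean_pairwise_l2(collapsed))  # [0.0, 0.0, 0.0]
print(mean_pairwise_l2(diverse))    # [7.5, 5.0, 7.5]
```

Appending this value to each sample's features gives the Discriminator a direct signal: a batch whose features are all near zero is almost certainly generated by a collapsed Generator.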

Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Supervision with Labels
• Label information of the real data might help




• Empirically generates much better samples

Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Alternate view of GANs

• In this formulation, Discriminator’s strategy was $D(x) \to 1$ (real), $D(G(z)) \to 0$ (fake)

• Alternatively, we can flip the binary classification labels i.e. Fake = 1, Real = 0

• In this new formulation, Discriminator’s strategy will be $D(x) \to 0$ (real), $D(G(z)) \to 1$ (fake)

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Alternate view of GANs (Contd.)
• If all we want to encode is $D(x) \to 0$ for real samples and $D(G(z)) \to 1$ for fake samples, we can use this flipped formulation

• Now, we can replace cross-entropy with any loss function (Hinge Loss)

• And thus, instead of outputting probabilities, Discriminator just has to output :-

• High values for fake samples
• Low values for real samples
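
For reference, the margin-based losses of the cited EB-GAN paper can be written as follows, where m is a positive margin and D outputs an unnormalized score (energy) rather than a probability. The notation is a transcription for convenience and was not on the slide.

```latex
\begin{aligned}
\mathcal{L}_D(x, z) &= D(x) + \big[\, m - D(G(z)) \,\big]^{+}, \\
\mathcal{L}_G(z)    &= D(G(z)),
\end{aligned}
\qquad [\,\cdot\,]^{+} = \max(0, \cdot)
```

The Discriminator lowers the score of real samples and raises the score of fake samples only until it reaches m; the Generator lowers the score of its own samples.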

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Energy-Based GANs
• Modified game plans
• Generator will try to generate samples with low values
• Discriminator will try to assign high scores to fake samples

• Use AutoEncoder inside the Discriminator

• Use Mean-Squared Reconstruction error as D(x)

• High Reconstruction Error for Fake samples
• Low Reconstruction Error for Real samples
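
A toy sketch of reconstruction error used as the Discriminator's score. The `toy_autoencoder` below is a hypothetical stand-in for a trained autoencoder (it can only reconstruct constant vectors), and the margin value is an arbitrary assumption; the point is only to show how the energy and the two losses fit together.

```python
def reconstruction_energy(x, autoencoder):
    """D(x): mean squared error between x and its reconstruction."""
    x_hat = autoencoder(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def toy_autoencoder(x):
    # Hypothetical stand-in for a trained autoencoder: it outputs the mean
    # of x in every position, so only constant vectors are reconstructed.
    mean = sum(x) / len(x)
    return [mean] * len(x)

def discriminator_loss(e_real, e_fake, margin):
    # Push real energy down; push fake energy up until it reaches the margin.
    return e_real + max(0.0, margin - e_fake)

def generator_loss(e_fake):
    # The Generator wants its samples to receive low energy.
    return e_fake

e_real = reconstruction_energy([2.0, 2.0, 2.0, 2.0], toy_autoencoder)
e_fake = reconstruction_energy([0.0, 4.0, 0.0, 4.0], toy_autoencoder)

print(e_real, e_fake)                                    # 0.0 4.0
print(discriminator_loss(e_real, e_fake, margin=10.0))   # 6.0
```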

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
More Bedrooms…

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Feature parameterization
3D Faces

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, NIPS (2016).
How to reward Disentanglement?
• Disentanglement means individual dimensions independently capturing key attributes of the image

• Let’s partition the noise vector into 2 parts :-

• z vector will capture slight variations in the image
• c vector will capture the main attributes of the image
• For e.g. Digit, Angle and Thickness of images in MNIST

• If the c vector captures the key variations in the image, will c and $G(z, c)$ be highly correlated or weakly correlated?

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, NIPS (2016).
Mutual Information
• Mutual Information captures the mutual dependence between two variables

• Mutual information between two variables X and Y is defined as: $I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$

• We want to maximize the mutual information between c and $x = G(z, c)$

• Incorporate it in the value function of the minimax game:

$\min_G \max_D V_I(D, G) = V(D, G) - \lambda I(c; G(z, c))$
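
To make the quantity being maximized concrete, here is a small sketch that computes mutual information for a discrete joint distribution. The two probability tables are made-up examples: in one the generated output ignores the code, in the other it determines the code exactly.

```python
import math

def mutual_information(joint):
    """I(X; Y) in nats for a joint probability table joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

independent = [[0.25, 0.25], [0.25, 0.25]]  # output ignores the code
dependent = [[0.5, 0.0], [0.0, 0.5]]        # output determines the code

print(mutual_information(independent))
print(mutual_information(dependent), math.log(2.0))
```

The independent table gives 0 and the fully dependent table gives log 2, the entropy of a fair binary code, which is the largest value possible for a binary variable.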

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
Mutual Information’s Variational Lower Bound
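
The bound used in the cited InfoGAN paper can be written as below. Here Q(c | x) is an auxiliary distribution, typically a network sharing layers with the Discriminator, that approximates the intractable posterior P(c | x); the notation follows the paper and was not preserved on the slide.

```latex
\begin{aligned}
I(c; G(z, c)) &= H(c) - H(c \mid G(z, c)) \\
&\ge \mathbb{E}_{c \sim P(c),\, x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c)
\;=\; L_I(G, Q)
\end{aligned}
```

In practice the mutual information term in the value function is replaced by this lower bound, which can be estimated by sampling c and z and evaluating Q on the generated image.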

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, NIPS (2016).
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• Coupled GAN
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Conditional GANs
• Simple modification to the original GAN framework that conditions the model on additional information for better multi-modal learning.

• Lends itself to many practical applications of GANs when we have explicit supervision available.
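
One simple way to condition a model is to append an encoding of the extra information to its input. The sketch below concatenates a noise vector with a one-hot class label to form the Generator input; concatenation, the helper names, and the sizes are assumptions for illustration, and the cited papers may combine the two inputs differently.

```python
import random

def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def conditioned_input(z, label, num_classes):
    """Concatenate the noise vector with a one-hot encoding of the condition."""
    return z + one_hot(label, num_classes)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(4)]  # noise part
g_in = conditioned_input(z, label=3, num_classes=10)

print(len(g_in))  # 14
print(g_in[4:])   # one-hot encoding of class 3
```

The Discriminator receives the same label alongside the image, so it can reject a realistic image that does not match its condition.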

Image Credit: Figure 2 in Odena, A., Olah, C. and Shlens, J., 2016. Conditional image synthesis with auxiliary classifier GANs.  arXiv preprint arXiv:1610.09585.

Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets”. arXiv preprint arXiv:1411.1784 (2014).
Conditional GANs
MNIST digits generated conditioned on their class label.

Figure 2 in the original paper.

Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
Image-to-Image Translation

Figure 1 in the original paper.

Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Image-to-Image Translation
• Architecture: DCGAN-based

• Training is conditioned on the images from the source domain.

• Conditional GANs provide an effective way to handle many complex domains without worrying about designing structured loss functions explicitly.

Figure 2 in the original paper.

Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Text-to-Image Synthesis

Given a text description, generate closely associated images.

Uses a conditional GAN with the generator and discriminator being conditioned on a “dense” text embedding.

Figure 1 in the original paper.

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Text-to-Image Synthesis

Figure 2 in the original paper.

Positive Example:
• Real Image, Right Text

Negative Examples:
• Real Image, Wrong Text
• Fake Image, Right Text
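
The three kinds of examples can be combined into a single Discriminator objective. The sketch below follows my reading of the matching-aware objective in the cited Reed et al. paper, where the two negative terms are averaged; treat the exact weighting as an assumption, and the function and argument names as illustrative.

```python
import math

def matching_aware_d_objective(s_real_right, s_real_wrong, s_fake_right):
    """Discriminator objective (to be maximized) for one training triple.

    s_real_right: D(real image, matching text)      -> positive example
    s_real_wrong: D(real image, mismatched text)    -> negative example
    s_fake_right: D(generated image, matching text) -> negative example
    All scores are probabilities in (0, 1).
    """
    return math.log(s_real_right) + 0.5 * (
        math.log(1.0 - s_real_wrong) + math.log(1.0 - s_fake_right)
    )

good = matching_aware_d_objective(0.9, 0.1, 0.1)
fooled_by_wrong_text = matching_aware_d_objective(0.9, 0.9, 0.1)
print(good, fooled_by_wrong_text)
```

A Discriminator that accepts a real image paired with the wrong text is penalized, which forces it to check the image against the description and not just judge realism.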
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Face Aging with Conditional GANs
• Differentiating feature: an Identity Preservation Optimization that uses an auxiliary network to get a better approximation of the latent code (z*) for an input image.
• The latent code is then conditioned on a discrete (one-hot) embedding of age.

Figure 1 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Face Aging with Conditional GANs

Figure 3 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Laplacian Pyramid of Adversarial Networks

Figure 1 in the original paper. (Edited for simplicity)

• Based on the Laplacian Pyramid representation of images. (1983)

• Generate high resolution (dimension) images by using a hierarchical system of GANs
• Iteratively increase image resolution and quality.

Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Laplacian Pyramid of Adversarial Networks

Figure 1 in the original paper.

 Image Generation using a LAPGAN

• Generator $G_3$ generates the base image $\tilde{I}_3$ from random noise input $z_3$.
• Generators ($G_2$, $G_1$, $G_0$) iteratively generate the difference image ($\tilde{h}$) conditioned on the previous small image ($l$).
• This difference image is added to an up-scaled version of the previous smaller image.
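
The coarse-to-fine sampling loop described above can be sketched as follows. Both generators are hypothetical stubs that return random values in place of trained networks, and nearest-neighbour upsampling is an assumption; only the control flow (upsample, generate a difference image conditioned on the upsampled image, add) mirrors the procedure.

```python
import random

def upsample2x(img):
    """Nearest-neighbour 2x upsampling of a 2D list image."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def stub_base_generator(rng, size):
    # Hypothetical stand-in for the coarsest-level generator.
    return [[rng.random() for _ in range(size)] for _ in range(size)]

def stub_residual_generator(rng, low_res):
    # Hypothetical stand-in for a conditional generator: returns a small
    # "difference image" with the same shape as its conditioning image.
    return [[0.1 * rng.gauss(0.0, 1.0) for _ in row] for row in low_res]

def lapgan_sample(rng, base_size, levels):
    img = stub_base_generator(rng, base_size)
    for _ in range(levels):
        up = upsample2x(img)                     # up-scaled previous image
        diff = stub_residual_generator(rng, up)  # difference image given `up`
        img = [[u + d for u, d in zip(ur, dr)] for ur, dr in zip(up, diff)]
    return img

sample = lapgan_sample(random.Random(0), base_size=8, levels=3)
print(len(sample), len(sample[0]))  # 64 64
```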

Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Laplacian Pyramid of Adversarial Networks

Figure 2 in the original paper.

Training Procedure:
Models at each level are trained independently to learn the required representation.
Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Adversarially Learned Inference
• Basic idea is to learn an encoder/inference network along with the generator network.

• Consider the following joint distributions over x (image) and z (latent variables):
• $q(x, z) = q(x)\, q(z \mid x)$: encoder distribution
• $p(x, z) = p(z)\, p(x \mid z)$: generator distribution

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).

Adversarially Learned Inference

[Figure 1 in the original paper: Encoder/Inference Network, Generator Network, and Discriminator Network.]

• $\min_G \max_D V(D, G) = \mathbb{E}_{q(x)}[\log D(x, G_z(x))] + \mathbb{E}_{p(z)}[\log(1 - D(G_x(z), z))]$

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).

Adversarially Learned Inference
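• Nash equilibrium yields

• Joint: $p(x, z) = q(x, z)$
• Marginals: $p(x) = q(x)$ and $p(z) = q(z)$
• Conditionals: $p(x \mid z) = q(x \mid z)$ and $p(z \mid x) = q(z \mid x)$

• Inferred latent representation successfully reconstructed the original image.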

• Representation was useful in the downstream semi-supervised task.

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).

Summary
• GANs are generative models that are implemented using two stochastic neural network modules: Generator and Discriminator.
• Generator tries to generate samples from random noise as input.
• Discriminator tries to distinguish the samples from Generator and samples from the real data distribution.
• Both networks are trained adversarially (in tandem) to fool the other component. In this process, both models become better at their respective tasks.
Why use GANs for Generation?
• Can be trained using back-propagation for Neural Network based
Generator/Discriminator functions.
• Sharper images can be generated.
• Faster to sample from the model distribution: single forward pass
generates a single sample.
