Lec19 - GANs


Generative Adversarial

Networks (GANs)
From Ian Goodfellow et al.

A short tutorial by Binglin, Shashank & Bhargav

Adapted for Purdue MA 598, Spring 2019 from


http://slazebni.cs.illinois.edu/spring17/lec11_gan.pptx
Outline
• Part 1: Review of GANs

• Part 2: Some challenges with GANs

• Part 3: Applications of GANs


GAN’s Architecture

[Diagram: noise z is fed to the Generator G to produce G(z); the Discriminator D receives either a real sample x (output D(x)) or a generated sample G(z) (output D(G(z))).]

• z is some random noise (Gaussian/Uniform).

• z can be thought of as the latent representation of the image.
https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
Training Discriminator

Training Generator

GAN’s formulation
• min_G max_D 𝑉(𝐷, 𝐺) = 𝔼_{x∼p_data}[log 𝐷(x)] + 𝔼_{z∼p_z}[log(1 − 𝐷(𝐺(z)))]

• It is formulated as a minimax game, where:

• The Discriminator is trying to maximize its reward 𝑉(𝐷, 𝐺)
• The Generator is trying to minimize the Discriminator’s reward (or maximize its loss)

• The Nash equilibrium of this particular game is achieved at:

• p_g = p_data, where the Discriminator outputs 𝐷(x) = 1/2 for all x
[Figure: alternating Discriminator updates and Generator updates during training.]
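As a sanity check on the formulation above, the optimal discriminator 𝐷*(x) = p_data(x) / (p_data(x) + p_g(x)) can be plugged into 𝑉 numerically. A minimal NumPy sketch; the toy discrete distributions are invented for illustration:

```python
import numpy as np

# Evaluate V(D, G) on a discrete support with the optimal discriminator
# D*(x) = p_data(x) / (p_data(x) + p_g(x)) from Goodfellow et al. (2014).
p_data = np.array([0.1, 0.4, 0.3, 0.2])   # "real" distribution
p_g    = np.array([0.1, 0.4, 0.3, 0.2])   # generator matches p_data exactly

d_star = p_data / (p_data + p_g)           # optimal discriminator response

# V(D*, G) = E_{x~p_data}[log D*(x)] + E_{x~p_g}[log(1 - D*(x))]
value = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

print(d_star)   # every entry is 0.5 when p_g = p_data
print(value)    # -log 4, the value of the game at the Nash equilibrium
```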
Vanishing gradient strikes back again…

• Generator minimizes 𝔼_{z∼p_z}[log(1 − 𝐷(𝐺(z)))]

• Its gradient goes to 0 when the Discriminator is confident, i.e. 𝐷(𝐺(z)) → 0, which is exactly when the Generator most needs a training signal

• Minimize −log 𝐷(𝐺(z)) for the Generator instead (keep the Discriminator as it is); its gradient stays large when 𝐷(𝐺(z)) → 0
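The contrast between the two generator losses can be checked numerically. A small sketch; the values of D(G(z)) are chosen only for illustration:

```python
import numpy as np

# Gradient magnitudes of the two generator losses as a function of
# a = D(G(z)), the discriminator's score on a fake sample.
#   saturating loss:      log(1 - a)  ->  d/da = -1 / (1 - a)
#   non-saturating loss: -log(a)      ->  d/da = -1 / a
a = np.array([1e-4, 1e-2, 0.5])   # D very confident the sample is fake ... unsure

grad_saturating     = -1.0 / (1.0 - a)
grad_non_saturating = -1.0 / a

print(np.abs(grad_saturating))      # stays near 1 even when a is tiny
print(np.abs(grad_non_saturating))  # huge when a is tiny: strong training signal
```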


CIFAR

Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
DCGAN: Bedroom images

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Deep Convolutional GANs (DCGANs)
Key ideas:
• Replace FC hidden layers with convolutions
• Generator: fractional-strided (transposed) convolutions

• Use Batch Normalization after each layer

• Inside the Generator:
• Use ReLU for hidden layers
• Use Tanh for the output layer

[Figure: DCGAN Generator architecture.]

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434
(2015).
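The output-size arithmetic of fractional-strided convolutions can be verified directly. A sketch assuming the common 4×4-kernel, stride-2, padding-1 DCGAN recipe; the layer count and sizes are illustrative, not taken from the slides:

```python
# Output size of a fractional-strided (transposed) convolution, the
# upsampling layer a DCGAN generator stacks:
#   out = (in - 1) * stride - 2 * padding + kernel

def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel

size = 4                      # generator starts from a 4x4 spatial map
for layer in range(4):        # four upsampling layers
    size = conv_transpose_out(size)
print(size)                   # 4 -> 8 -> 16 -> 32 -> 64: a 64x64 image
```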
Part 2
• Training Challenges
• Non-Convergence
• Mode-Collapse
• Proposed Solutions
• Supervision with Labels
• Mini-Batch GANs
• Modification of GAN’s losses
• Discriminator (EB-GAN)
• Generator (InfoGAN)
Non-Convergence
• Deep Learning models (in general) involve a single player
• The player tries to maximize its reward (minimize its loss).
• Use SGD (with Backpropagation) to find the optimal parameters.
• SGD has convergence guarantees (under certain conditions).
• Problem: With non-convexity, we might converge to local optima.

• GANs instead involve two (or more) players
• The Discriminator is trying to maximize its reward.
• The Generator is trying to minimize the Discriminator’s reward:

  min_G max_D 𝑉(𝐷, 𝐺)

• SGD was not designed to find the Nash equilibrium of a game.
• Problem: We might not converge to the Nash equilibrium at all.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Non-Convergence
• Toy example (from the cited tutorial): min_x max_y 𝑉(x, y) = x·y

• Simultaneous gradient dynamics: dx/dt = −y, dy/dt = x

• The differential equation’s solution has sinusoidal terms: the players orbit the equilibrium (0, 0)

• Even with a small learning rate, it will not converge

Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
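The toy game above can be simulated in a few lines; the learning rate and starting point are arbitrary choices for illustration:

```python
import numpy as np

# Simultaneous gradient steps on the min-max toy game V(x, y) = x * y:
# x descends its gradient, y ascends. The iterates spiral outward instead
# of reaching the equilibrium (0, 0).
lr = 0.1
x, y = 1.0, 1.0
radii = [np.hypot(x, y)]                  # distance from the equilibrium
for _ in range(200):
    gx, gy = y, x                          # dV/dx = y, dV/dy = x
    x, y = x - lr * gx, y + lr * gy        # simultaneous update
    radii.append(np.hypot(x, y))

print(radii[0], radii[-1])  # the distance from (0, 0) grows every step
```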
Mode-Collapse
• Generator fails to output diverse samples

[Figure: the target distribution, the expected output, and the Generator’s actual collapsed output over training.]

Metz, Luke, et al. "Unrolled Generative Adversarial Networks." arXiv preprint arXiv:1611.02163 (2016).
How to reward sample diversity?
• At Mode Collapse,
• The Generator produces good samples, but only very few distinct ones.
• Thus, the Discriminator can’t tag them as fake.

• To address this problem,


• Let the Discriminator know about this edge-case.

• More formally,
• Let the Discriminator look at the entire batch instead of single examples
• If there is a lack of diversity, it will mark the examples as fake

• Thus,
• Generator will be forced to produce diverse samples.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Mini-Batch GANs
• Extract features that capture diversity in the mini-batch
• e.g., the L2 norm of the difference between all pairs of examples in the batch

• Feed those features to the Discriminator along with the image

• Feature values will differ between diverse and non-diverse batches

• Thus, the Discriminator will rely on those features for classification

• This, in turn,
• Will force the Generator to match those feature values with the real data
• Will generate diverse batches

Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
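A sketch of a batch-level diversity feature in the spirit described above; note this plain pairwise-L2 summary is a simplification of the paper’s learned minibatch-discrimination features:

```python
import numpy as np

# Per-sample diversity feature: sum of L2 distances to every other sample
# in the batch. A collapsed batch yields zero features; a diverse batch
# yields large ones, so a discriminator fed this feature can spot collapse.

def diversity_features(batch):
    """batch: (n, d) array -> (n,) per-sample sum of pairwise L2 distances."""
    diffs = batch[:, None, :] - batch[None, :, :]      # (n, n, d)
    dists = np.linalg.norm(diffs, axis=-1)             # (n, n)
    return dists.sum(axis=1)

rng = np.random.default_rng(0)
diverse   = rng.normal(size=(8, 16))                   # spread-out samples
collapsed = np.tile(rng.normal(size=(1, 16)), (8, 1))  # all samples identical

print(diversity_features(collapsed).max())  # 0.0: the tell-tale of collapse
print(diversity_features(diverse).min())    # strictly positive
```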
Supervision with Labels
• Label information of the real data might help

[Figure: an unsupervised D outputs Real vs. Fake; a label-aware D instead outputs a class (Car, Dog, …, Human) or Fake.]

• Empirically generates much better samples

Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Alternate view of GANs
• Standard binary classification labels: Real = 1, Fake = 0

• In this formulation, the Discriminator’s strategy was: push 𝐷(x) → 1 for real samples and 𝐷(𝐺(z)) → 0 for fake samples

• Alternatively, we can flip the binary classification labels, i.e. Fake = 1, Real = 0

• In this new formulation, the Discriminator’s strategy will be: push 𝐷(𝐺(z)) → 1 for fake samples and 𝐷(x) → 0 for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Alternate view of GANs (Contd.)
• If all we want to encode is “high output for fake, low output for real”, we can use this flipped formulation

• Now, we can replace cross-entropy with any loss function (e.g. Hinge Loss)

• And thus, instead of outputting probabilities, the Discriminator just has to output:

• High values for fake samples
• Low values for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Energy-Based GANs
• Modified game plan:
• The Generator will try to generate samples with low energy values
• The Discriminator will try to assign high energy scores to fake samples

• Use an AutoEncoder inside the Discriminator

• Use the Mean-Squared Reconstruction error as the energy:

• High reconstruction error for fake samples
• Low reconstruction error for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
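A toy illustration of the reconstruction-error energy. The rank-1 linear “autoencoder” (a PCA projection) is an assumption made for a self-contained example; the paper trains a deep encoder/decoder:

```python
import numpy as np

# EB-GAN-style energy: the discriminator is an autoencoder, and a sample's
# energy is its mean-squared reconstruction error. Real data reconstructs
# well (low energy); off-manifold fakes do not (high energy).
rng = np.random.default_rng(0)
direction = rng.normal(size=4)
direction /= np.linalg.norm(direction)
real = np.outer(rng.normal(size=200), direction)   # real data lies on a line
fake = rng.normal(size=(200, 4))                   # fakes are unstructured

# Rank-1 PCA "autoencoder": project onto the top component of real data.
_, _, vt = np.linalg.svd(real, full_matrices=False)

def energy(x):
    recon = (x @ vt[0])[:, None] * vt[0][None, :]  # encode then decode
    return np.mean((x - recon) ** 2, axis=1)       # MSE reconstruction error

print(energy(real).mean())   # near 0: real samples get low energy
print(energy(fake).mean())   # clearly larger: fakes get high energy
```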
More Bedrooms…

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Feature parameterization
3D Faces

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
How to reward Disentanglement?
• Disentanglement means individual dimensions independently capturing key attributes of the image

• Let’s partition the noise vector into 2 parts:

• The z vector will capture slight variations in the image
• The c vector will capture the main attributes of the image
• e.g. digit identity, angle, and thickness of images in MNIST

• If the c vector captures the key variations in the image,

will c and G(z, c) be highly correlated or weakly correlated?

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative Adversarial Nets
Mutual Information
• Mutual Information captures the mutual dependence between two variables

• Mutual information between two variables is defined as:

𝐼(𝑋; 𝑌) = 𝐻(𝑋) − 𝐻(𝑋|𝑌) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]

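The definition can be evaluated directly for discrete joint distributions:

```python
import numpy as np

# I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats.

def mutual_information(joint):
    """joint: 2-D array of p(x, y) summing to 1 -> I(X; Y)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0                      # 0 * log 0 = 0 by convention
    return np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask]))

independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])   # X and Y independent
dependent   = np.array([[0.5, 0.0],
                        [0.0, 0.5]])     # Y fully determined by X

print(mutual_information(independent))   # 0.0
print(mutual_information(dependent))     # log 2
```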
InfoGAN
• We want to maximize the mutual information between c and G(z, c)

• Incorporate it into the value function of the minimax game:

min_G max_D 𝑉_I(𝐷, 𝐺) = 𝑉(𝐷, 𝐺) − 𝜆 𝐼(𝑐; 𝐺(𝑧, 𝑐))

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
InfoGAN
• Mutual Information’s variational lower bound:

𝐼(𝑐; 𝐺(𝑧, 𝑐)) ≥ 𝔼_{𝑥∼𝐺(𝑧,𝑐)}[𝔼_{𝑐′∼𝑝(𝑐|𝑥)}[log 𝑄(𝑐′|𝑥)]] + 𝐻(𝑐)

• 𝑄 is an auxiliary distribution (a network head sharing layers with 𝐷) that approximates the posterior 𝑝(𝑐|𝑥)

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• Coupled GAN
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Conditional GANs
• A simple modification to the original GAN framework that conditions the model on additional information for better multi-modal learning.

• Lends itself to many practical applications of GANs when we have explicit supervision available.

Image Credit: Figure 2 in Odena, A., Olah, C. and Shlens, J., 2016. Conditional image synthesis with auxiliary classifier GANs.  arXiv preprint arXiv:1610.09585.

Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets”. arXiv preprint arXiv:1411.1784 (2014).
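A sketch of how the conditioning input can be assembled. Concatenating a one-hot label with the noise (and with the image for D) follows Mirza & Osindero’s recipe; the dimensions below are illustrative:

```python
import numpy as np

# In a conditional GAN, both G and D receive the conditioning information y,
# here a class label encoded one-hot and concatenated onto their inputs.
n_classes, noise_dim, image_dim = 10, 100, 784

def one_hot(label, n=n_classes):
    v = np.zeros(n)
    v[label] = 1.0
    return v

z = np.random.randn(noise_dim)           # noise input
y = one_hot(3)                           # condition: class "3"

g_input = np.concatenate([z, y])         # generator sees noise + label
x_fake = np.random.randn(image_dim)      # stand-in for the output G(g_input)
d_input = np.concatenate([x_fake, y])    # discriminator sees image + label

print(g_input.shape, d_input.shape)      # (110,) (794,)
```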
Conditional GANs
MNIST digits generated conditioned on their class label.

Figure 2 in the original paper.

Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
Image-to-Image Translation

Figure 1 in the original paper.


Link to an interactive demo of this paper
https://affinelayer.com/pixsrv/
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Image-to-Image Translation
• Architecture: DCGAN-based

• Training is conditioned on images from the source domain.

• Conditional GANs provide an effective way to handle many complex domains without explicitly designing structured loss functions.

Figure 2 in the original paper.


Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Text-to-Image Synthesis
Motivation

• Given a text description, generate images closely matching it.

• Uses a conditional GAN, with the generator and discriminator conditioned on a “dense” text embedding.

Figure 1 in the original paper.

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Text-to-Image Synthesis

Figure 2 in the original paper.

Positive example:
• Real image, right text

Negative examples:
• Real image, wrong text
• Fake image, right text
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Face Aging with Conditional GANs
• Differentiating feature: an Identity Preservation optimization with an auxiliary network yields a better approximation of the latent code (z*) for an input image.
• Latent code is then conditioned on a discrete (one-hot) embedding of age
categories.

Figure 1 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Face Aging with Conditional GANs

Figure 3 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Laplacian Pyramid of Adversarial
Networks

Figure 1 in the original paper. (Edited for simplicity)

• Based on the Laplacian Pyramid representation of images (Burt & Adelson, 1983).

• Generates high-resolution images using a hierarchical system of GANs.
• Iteratively increases image resolution and quality.

Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Laplacian Pyramid of Adversarial
Networks

Figure 1 in the original paper.

Image Generation using a LAPGAN

• The top generator generates the base (lowest-resolution) image from a random noise input.
• Subsequent generators iteratively generate a difference image, conditioned on an up-scaled version of the previous smaller image.
• This difference image is added to the up-scaled version of the previous smaller image to produce the next level.

Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Laplacian Pyramid of Adversarial
Networks

Figure 2 in the original paper.

Training Procedure:
Models at each level are trained independently to learn the required representation.
Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
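The Laplacian-pyramid decomposition underlying LAPGAN can be sketched in NumPy. Nearest-neighbor down/upsampling replaces the paper’s proper blurring and is an assumption for simplicity:

```python
import numpy as np

# One level of a Laplacian pyramid: downsample, store the residual
# ("difference image") lost at that step, and reconstruct by adding the
# residual back to the upsampled image. The LAPGAN generator chain mimics
# this reconstruction, with each generator producing a residual.

def down(img):   # halve resolution by 2x2 averaging
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):     # double resolution by pixel repetition
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
image = rng.random((8, 8))

small = down(image)
residual = image - up(small)          # difference image at this pyramid level

reconstructed = up(small) + residual  # what a LAPGAN level computes at test time
print(np.allclose(reconstructed, image))  # True: residuals restore the image
```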
Adversarially Learned Inference
• The basic idea is to learn an encoder/inference network along with the generator network.

• Consider the following joint distributions over x (image) and z (latent variables):

q(x, z) = q(x) q(z | x)    (encoder distribution)

p(x, z) = p(z) p(x | z)    (generator distribution)

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Adversarially Learned Inference

[Figure 1 in the original paper: the Encoder/Inference network and the Generator network each produce (x, z) pairs for the Discriminator network.]

min_G max_D 𝑉(𝐷, 𝐺) = 𝔼_{q(x)}[log 𝐷(x, ẑ)] + 𝔼_{p(z)}[log(1 − 𝐷(x̃, z))]

where ẑ is drawn from the encoder q(z | x) and x̃ from the generator p(x | z).

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Adversarially Learned Inference
• The Nash equilibrium yields matching distributions:

• Joint: q(x, z) = p(x, z)
• Marginals: q(x) = p(x) and q(z) = p(z)
• Conditionals: q(z | x) = p(z | x) and q(x | z) = p(x | z)

• The inferred latent representation successfully reconstructed the original image.
• The representation was useful in downstream semi-supervised tasks.

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Summary
• GANs are generative models implemented with two neural network modules: a Generator and a Discriminator.
• The Generator tries to generate realistic samples from random noise input.
• The Discriminator tries to distinguish the Generator’s samples from samples drawn from the real data distribution.
• Both networks are trained adversarially (in tandem), each trying to fool the other. In this process, both models become better at their respective tasks.
Why use GANs for Generation?
• Can be trained using back-propagation when the Generator and Discriminator are neural networks.
• Sharper images can be generated.
• Fast sampling from the model distribution: a single forward pass generates a single sample.
Reading List
• Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. Generative adversarial nets, NIPS (2014).
• Goodfellow, Ian. NIPS 2016 Tutorial: Generative Adversarial Networks. arXiv preprint arXiv:1701.00160 (2016).
• Radford, A., Metz, L. and Chintala, S., Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint
arXiv:1511.06434. (2015).
• Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. Improved techniques for training gans. NIPS (2016).
• Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. 
InfoGAN: Interpretable Representation Learning by Information Maximization Generative Adversarial Nets, NIPS (2016).
• Zhao, Junbo, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126 (2016).
• Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
• Liu, Ming-Yu, and Oncel Tuzel. Coupled generative adversarial networks. NIPS (2016).
• Denton, E.L., Chintala, S. and Fergus, R., 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. NIPS (2015)
• Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., & Courville, A. Adversarially learned inference. arXiv preprint
arXiv:1606.00704 (2016).

Applications:
• Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004. (2016).
• Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. Generative adversarial text to image synthesis. ICML (2016).
• Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). Face Aging With Conditional Generative Adversarial Networks. arXiv preprint arXiv:1702.01983.
Questions?
