Lec19 - GANs


Generative Adversarial

Networks (GANs)
From Ian Goodfellow et al.

A short tutorial by Binglin, Shashank & Bhargav

Adapted for Purdue MA 598, Spring 2019 from


http://slazebni.cs.illinois.edu/spring17/lec11_gan.pptx
Outline
• Part 1: Review of GANs

• Part 2: Some challenges with GANs

• Part 3: Applications of GANs


GAN’s Architecture

[Diagram: noise z is fed to the Generator G to produce G(z); the Discriminator D receives either a real sample x (output D(x)) or a generated sample G(z) (output D(G(z))).]

• z is some random noise (Gaussian/Uniform).

• z can be thought of as the latent representation of the image.
https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
Training Discriminator

Training Generator

GAN’s formulation
• min_G max_D 𝑉(𝐷, 𝐺) = 𝔼_{x∼p_data}[log 𝐷(x)] + 𝔼_{z∼p_z}[log(1 − 𝐷(𝐺(z)))]

• It is formulated as a minimax game, where:

• The Discriminator is trying to maximize its reward 𝑉(𝐷, 𝐺)
• The Generator is trying to minimize the Discriminator’s reward (or maximize its loss)

• The Nash equilibrium of this particular game is achieved at:

• p_g = p_data, where the Discriminator outputs 𝐷(x) = 1/2 for all x
[Figure: alternating Discriminator updates and Generator updates during training.]
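As a sanity check on the formulation above, the optimal discriminator 𝐷*(x) = p_data(x) / (p_data(x) + p_g(x)) can be plugged into 𝑉 numerically. A minimal NumPy sketch; the toy discrete distributions are invented for illustration:

```python
import numpy as np

# Evaluate V(D, G) on a discrete support with the optimal discriminator
# D*(x) = p_data(x) / (p_data(x) + p_g(x)) from Goodfellow et al. (2014).
p_data = np.array([0.1, 0.4, 0.3, 0.2])   # "real" distribution
p_g    = np.array([0.1, 0.4, 0.3, 0.2])   # generator matches p_data exactly

d_star = p_data / (p_data + p_g)           # optimal discriminator response

# V(D*, G) = E_{x~p_data}[log D*(x)] + E_{x~p_g}[log(1 - D*(x))]
value = np.sum(p_data * np.log(d_star)) + np.sum(p_g * np.log(1.0 - d_star))

print(d_star)   # every entry is 0.5 when p_g = p_data
print(value)    # -log 4, the value of the game at the Nash equilibrium
```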
Vanishing gradient strikes back again…

• Generator minimizes 𝔼_{z∼p_z}[log(1 − 𝐷(𝐺(z)))]

• Its gradient goes to 0 when the Discriminator is confident, i.e. 𝐷(𝐺(z)) → 0, which is exactly when the Generator most needs a training signal

• Minimize −log 𝐷(𝐺(z)) for the Generator instead (keep the Discriminator as it is); its gradient stays large when 𝐷(𝐺(z)) → 0
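The contrast between the two generator losses can be checked numerically. A small sketch; the values of D(G(z)) are chosen only for illustration:

```python
import numpy as np

# Gradient magnitudes of the two generator losses as a function of
# a = D(G(z)), the discriminator's score on a fake sample.
#   saturating loss:      log(1 - a)  ->  d/da = -1 / (1 - a)
#   non-saturating loss: -log(a)      ->  d/da = -1 / a
a = np.array([1e-4, 1e-2, 0.5])   # D very confident the sample is fake ... unsure

grad_saturating     = -1.0 / (1.0 - a)
grad_non_saturating = -1.0 / a

print(np.abs(grad_saturating))      # stays near 1 even when a is tiny
print(np.abs(grad_non_saturating))  # huge when a is tiny: strong training signal
```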


CIFAR

Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
DCGAN: Bedroom images

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Deep Convolutional GANs (DCGANs)
Key ideas:
• Replace FC hidden layers with convolutions
• Generator: fractional-strided (transposed) convolutions

• Use Batch Normalization after each layer

• Inside the Generator:
• Use ReLU for hidden layers
• Use Tanh for the output layer

[Figure: DCGAN Generator architecture.]

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434
(2015).
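The output-size arithmetic of fractional-strided convolutions can be verified directly. A sketch assuming the common 4×4-kernel, stride-2, padding-1 DCGAN recipe; the layer count and sizes are illustrative, not taken from the slides:

```python
# Output size of a fractional-strided (transposed) convolution, the
# upsampling layer a DCGAN generator stacks:
#   out = (in - 1) * stride - 2 * padding + kernel

def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel

size = 4                      # generator starts from a 4x4 spatial map
for layer in range(4):        # four upsampling layers
    size = conv_transpose_out(size)
print(size)                   # 4 -> 8 -> 16 -> 32 -> 64: a 64x64 image
```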
Part 2
• Training Challenges
• Non-Convergence
• Mode-Collapse
• Proposed Solutions
• Supervision with Labels
• Mini-Batch GANs
• Modification of GAN’s losses
• Discriminator (EB-GAN)
• Generator (InfoGAN)
Non-Convergence
• Deep Learning models (in general) involve a single player
• The player tries to maximize its reward (minimize its loss).
• Use SGD (with Backpropagation) to find the optimal parameters.
• SGD has convergence guarantees (under certain conditions).
• Problem: With non-convexity, we might converge to local optima.

• GANs instead involve two (or more) players
• The Discriminator is trying to maximize its reward.
• The Generator is trying to minimize the Discriminator’s reward:

  min_G max_D 𝑉(𝐷, 𝐺)

• SGD was not designed to find the Nash equilibrium of a game.
• Problem: We might not converge to the Nash equilibrium at all.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Non-Convergence
• Toy example (from the cited tutorial): min_x max_y 𝑉(x, y) = x·y

• Simultaneous gradient dynamics: dx/dt = −y, dy/dt = x

• The differential equation’s solution has sinusoidal terms: the players orbit the equilibrium (0, 0)

• Even with a small learning rate, it will not converge

Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
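The toy game above can be simulated in a few lines; the learning rate and starting point are arbitrary choices for illustration:

```python
import numpy as np

# Simultaneous gradient steps on the min-max toy game V(x, y) = x * y:
# x descends its gradient, y ascends. The iterates spiral outward instead
# of reaching the equilibrium (0, 0).
lr = 0.1
x, y = 1.0, 1.0
radii = [np.hypot(x, y)]                  # distance from the equilibrium
for _ in range(200):
    gx, gy = y, x                          # dV/dx = y, dV/dy = x
    x, y = x - lr * gx, y + lr * gy        # simultaneous update
    radii.append(np.hypot(x, y))

print(radii[0], radii[-1])  # the distance from (0, 0) grows every step
```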
Mode-Collapse
• Generator fails to output diverse samples

[Figure: the target distribution, the expected output, and the Generator’s actual collapsed output over training.]

Metz, Luke, et al. "Unrolled Generative Adversarial Networks." arXiv preprint arXiv:1611.02163 (2016).
How to reward sample diversity?
• At Mode Collapse,
• The Generator produces good samples, but only very few distinct ones.
• Thus, the Discriminator can’t tag them as fake.

• To address this problem,


• Let the Discriminator know about this edge-case.

• More formally,
• Let the Discriminator look at the entire batch instead of single examples
• If there is a lack of diversity, it will mark the examples as fake

• Thus,
• Generator will be forced to produce diverse samples.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Mini-Batch GANs
• Extract features that capture diversity in the mini-batch
• e.g., the L2 norm of the difference between all pairs of examples in the batch

• Feed those features to the Discriminator along with the image

• Feature values will differ between diverse and non-diverse batches

• Thus, the Discriminator will rely on those features for classification

• This, in turn,
• Will force the Generator to match those feature values with the real data
• Will generate diverse batches

Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
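A sketch of a batch-level diversity feature in the spirit described above; note this plain pairwise-L2 summary is a simplification of the paper’s learned minibatch-discrimination features:

```python
import numpy as np

# Per-sample diversity feature: sum of L2 distances to every other sample
# in the batch. A collapsed batch yields zero features; a diverse batch
# yields large ones, so a discriminator fed this feature can spot collapse.

def diversity_features(batch):
    """batch: (n, d) array -> (n,) per-sample sum of pairwise L2 distances."""
    diffs = batch[:, None, :] - batch[None, :, :]      # (n, n, d)
    dists = np.linalg.norm(diffs, axis=-1)             # (n, n)
    return dists.sum(axis=1)

rng = np.random.default_rng(0)
diverse   = rng.normal(size=(8, 16))                   # spread-out samples
collapsed = np.tile(rng.normal(size=(1, 16)), (8, 1))  # all samples identical

print(diversity_features(collapsed).max())  # 0.0: the tell-tale of collapse
print(diversity_features(diverse).min())    # strictly positive
```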
Supervision with Labels
• Label information of the real data might help

[Figure: an unsupervised D outputs Real vs. Fake; a label-aware D instead outputs a class (Car, Dog, …, Human) or Fake.]

• Empirically generates much better samples

Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Alternate view of GANs
• Standard binary classification labels: Real = 1, Fake = 0

• In this formulation, the Discriminator’s strategy was: push 𝐷(x) → 1 for real samples and 𝐷(𝐺(z)) → 0 for fake samples

• Alternatively, we can flip the binary classification labels, i.e. Fake = 1, Real = 0

• In this new formulation, the Discriminator’s strategy will be: push 𝐷(𝐺(z)) → 1 for fake samples and 𝐷(x) → 0 for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Alternate view of GANs (Contd.)
• If all we want to encode is “high output for fake, low output for real”, we can use this flipped formulation

• Now, we can replace cross-entropy with any loss function (e.g. Hinge Loss)

• And thus, instead of outputting probabilities, the Discriminator just has to output:

• High values for fake samples
• Low values for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Energy-Based GANs
• Modified game plan:
• The Generator will try to generate samples with low energy values
• The Discriminator will try to assign high energy scores to fake samples

• Use an AutoEncoder inside the Discriminator

• Use the Mean-Squared Reconstruction error as the energy:

• High reconstruction error for fake samples
• Low reconstruction error for real samples

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
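A toy illustration of the reconstruction-error energy. The rank-1 linear “autoencoder” (a PCA projection) is an assumption made for a self-contained example; the paper trains a deep encoder/decoder:

```python
import numpy as np

# EB-GAN-style energy: the discriminator is an autoencoder, and a sample's
# energy is its mean-squared reconstruction error. Real data reconstructs
# well (low energy); off-manifold fakes do not (high energy).
rng = np.random.default_rng(0)
direction = rng.normal(size=4)
direction /= np.linalg.norm(direction)
real = np.outer(rng.normal(size=200), direction)   # real data lies on a line
fake = rng.normal(size=(200, 4))                   # fakes are unstructured

# Rank-1 PCA "autoencoder": project onto the top component of real data.
_, _, vt = np.linalg.svd(real, full_matrices=False)

def energy(x):
    recon = (x @ vt[0])[:, None] * vt[0][None, :]  # encode then decode
    return np.mean((x - recon) ** 2, axis=1)       # MSE reconstruction error

print(energy(real).mean())   # near 0: real samples get low energy
print(energy(fake).mean())   # clearly larger: fakes get high energy
```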
More Bedrooms…

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Feature parameterization
3D Faces

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
How to reward Disentanglement?
• Disentanglement means individual dimensions independently capturing key attributes of the image

• Let’s partition the noise vector into 2 parts:

• The z vector will capture slight variations in the image
• The c vector will capture the main attributes of the image
• e.g. digit identity, angle, and thickness of images in MNIST

• If the c vector captures the key variations in the image,

will c and G(z, c) be highly correlated or weakly correlated?

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative Adversarial Nets
Mutual Information
• Mutual Information captures the mutual dependence between two variables

• Mutual information between two variables is defined as:

𝐼(𝑋; 𝑌) = 𝐻(𝑋) − 𝐻(𝑋|𝑌) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]

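The definition can be evaluated directly for discrete joint distributions:

```python
import numpy as np

# I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats.

def mutual_information(joint):
    """joint: 2-D array of p(x, y) summing to 1 -> I(X; Y)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0                      # 0 * log 0 = 0 by convention
    return np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask]))

independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])   # X and Y independent
dependent   = np.array([[0.5, 0.0],
                        [0.0, 0.5]])     # Y fully determined by X

print(mutual_information(independent))   # 0.0
print(mutual_information(dependent))     # log 2
```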
InfoGAN
• We want to maximize the mutual information between c and G(z, c)

• Incorporate it into the value function of the minimax game:

min_G max_D 𝑉_I(𝐷, 𝐺) = 𝑉(𝐷, 𝐺) − 𝜆 𝐼(𝑐; 𝐺(𝑧, 𝑐))

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
InfoGAN
• Mutual Information’s variational lower bound:

𝐼(𝑐; 𝐺(𝑧, 𝑐)) ≥ 𝔼_{𝑥∼𝐺(𝑧,𝑐)}[𝔼_{𝑐′∼𝑝(𝑐|𝑥)}[log 𝑄(𝑐′|𝑥)]] + 𝐻(𝑐)

• 𝑄 is an auxiliary distribution (a network head sharing layers with 𝐷) that approximates the posterior 𝑝(𝑐|𝑥)

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• Coupled GAN
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Conditional GANs
• A simple modification to the original GAN framework that conditions the model on additional information for better multi-modal learning.

• Lends itself to many practical applications of GANs when we have explicit supervision available.

Image Credit: Figure 2 in Odena, A., Olah, C. and Shlens, J., 2016. Conditional image synthesis with auxiliary classifier GANs.  arXiv preprint arXiv:1610.09585.

Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets”. arXiv preprint arXiv:1411.1784 (2014).
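A sketch of how the conditioning input can be assembled. Concatenating a one-hot label with the noise (and with the image for D) follows Mirza & Osindero’s recipe; the dimensions below are illustrative:

```python
import numpy as np

# In a conditional GAN, both G and D receive the conditioning information y,
# here a class label encoded one-hot and concatenated onto their inputs.
n_classes, noise_dim, image_dim = 10, 100, 784

def one_hot(label, n=n_classes):
    v = np.zeros(n)
    v[label] = 1.0
    return v

z = np.random.randn(noise_dim)           # noise input
y = one_hot(3)                           # condition: class "3"

g_input = np.concatenate([z, y])         # generator sees noise + label
x_fake = np.random.randn(image_dim)      # stand-in for the output G(g_input)
d_input = np.concatenate([x_fake, y])    # discriminator sees image + label

print(g_input.shape, d_input.shape)      # (110,) (794,)
```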
Conditional GANs
MNIST digits generated conditioned on their class label.

Figure 2 in the original paper.

Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
Image-to-Image Translation

Figure 1 in the original paper.


Link to an interactive demo of this paper
https://affinelayer.com/pixsrv/
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Image-to-Image Translation
• Architecture: DCGAN-based

• Training is conditioned on images from the source domain.

• Conditional GANs provide an effective way to handle many complex domains without explicitly designing structured loss functions.

Figure 2 in the original paper.


Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Text-to-Image Synthesis
Motivation

• Given a text description, generate images closely matching it.

• Uses a conditional GAN, with the generator and discriminator conditioned on a “dense” text embedding.

Figure 1 in the original paper.

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Text-to-Image Synthesis

Figure 2 in the original paper.

Positive example:
• Real image, right text

Negative examples:
• Real image, wrong text
• Fake image, right text
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Face Aging with Conditional GANs
• Differentiating feature: an Identity Preservation optimization with an auxiliary network yields a better approximation of the latent code (z*) for an input image.
• Latent code is then conditioned on a discrete (one-hot) embedding of age
categories.

Figure 1 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Face Aging with Conditional GANs

Figure 3 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Laplacian Pyramid of Adversarial
Networks

Figure 1 in the original paper. (Edited for simplicity)

• Based on the Laplacian Pyramid representation of images (Burt & Adelson, 1983).

• Generates high-resolution images using a hierarchical system of GANs.
• Iteratively increases image resolution and quality.

Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Laplacian Pyramid of Adversarial
Networks

Figure 1 in the original paper.

Image Generation using a LAPGAN

• The top generator generates the base (lowest-resolution) image from a random noise input.
• Subsequent generators iteratively generate a difference image, conditioned on an up-scaled version of the previous smaller image.
• This difference image is added to the up-scaled version of the previous smaller image to produce the next level.

Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Laplacian Pyramid of Adversarial
Networks

Figure 2 in the original paper.

Training Procedure:
Models at each level are trained independently to learn the required representation.
Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
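The Laplacian-pyramid decomposition underlying LAPGAN can be sketched in NumPy. Nearest-neighbor down/upsampling replaces the paper’s proper blurring and is an assumption for simplicity:

```python
import numpy as np

# One level of a Laplacian pyramid: downsample, store the residual
# ("difference image") lost at that step, and reconstruct by adding the
# residual back to the upsampled image. The LAPGAN generator chain mimics
# this reconstruction, with each generator producing a residual.

def down(img):   # halve resolution by 2x2 averaging
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(img):     # double resolution by pixel repetition
    return img.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
image = rng.random((8, 8))

small = down(image)
residual = image - up(small)          # difference image at this pyramid level

reconstructed = up(small) + residual  # what a LAPGAN level computes at test time
print(np.allclose(reconstructed, image))  # True: residuals restore the image
```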
Adversarially Learned Inference
• The basic idea is to learn an encoder/inference network along with the generator network.

• Consider the following joint distributions over x (image) and z (latent variables):

q(x, z) = q(x) q(z | x)    (encoder distribution)

p(x, z) = p(z) p(x | z)    (generator distribution)

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Adversarially Learned Inference

[Figure 1 in the original paper: the Encoder/Inference network and the Generator network each produce (x, z) pairs for the Discriminator network.]

min_G max_D 𝑉(𝐷, 𝐺) = 𝔼_{q(x)}[log 𝐷(x, ẑ)] + 𝔼_{p(z)}[log(1 − 𝐷(x̃, z))]

where ẑ is drawn from the encoder q(z | x) and x̃ from the generator p(x | z).

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Adversarially Learned Inference
• The Nash equilibrium yields matching distributions:

• Joint: q(x, z) = p(x, z)
• Marginals: q(x) = p(x) and q(z) = p(z)
• Conditionals: q(z | x) = p(z | x) and q(x | z) = p(x | z)

• The inferred latent representation successfully reconstructed the original image.
• The representation was useful in downstream semi-supervised tasks.

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Summary
• GANs are generative models implemented with two neural network modules: a Generator and a Discriminator.
• The Generator tries to generate realistic samples from random noise input.
• The Discriminator tries to distinguish the Generator’s samples from samples drawn from the real data distribution.
• Both networks are trained adversarially (in tandem), each trying to fool the other. In this process, both models become better at their respective tasks.
Why use GANs for Generation?
• Can be trained using back-propagation when the Generator and Discriminator are neural networks.
• Sharper images can be generated.
• Fast sampling from the model distribution: a single forward pass generates a single sample.
Reading List
• Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. Generative adversarial nets, NIPS (2014).
• Goodfellow, Ian. NIPS 2016 Tutorial: Generative Adversarial Networks. arXiv preprint arXiv:1701.00160 (2016).
• Radford, A., Metz, L. and Chintala, S., Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint
arXiv:1511.06434. (2015).
• Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. Improved techniques for training gans. NIPS (2016).
• Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. 
InfoGAN: Interpretable Representation Learning by Information Maximization Generative Adversarial Nets, NIPS (2016).
• Zhao, Junbo, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126 (2016).
• Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
• Liu, Ming-Yu, and Oncel Tuzel. Coupled generative adversarial networks. NIPS (2016).
• Denton, E.L., Chintala, S. and Fergus, R., 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. NIPS (2015)
• Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., & Courville, A. Adversarially learned inference. arXiv preprint
arXiv:1606.00704 (2016).

Applications:
• Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004. (2016).
• Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. Generative adversarial text to image synthesis. ICML (2016).
• Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). Face Aging With Conditional Generative Adversarial Networks. arXiv preprint arXiv:1702.01983.
Questions?
