Realistic Face Image Generation Based On Generative Adversarial Network


TING ZHANG¹, WEN-HONG TIAN¹, TING-YING ZHENG¹, ZU-NING LI¹, XUE-MEI DU¹, FAN LI¹

¹School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 611731

E-MAIL: [email protected], [email protected]

Abstract:

Using a computer to generate images with realistic visual effects is a new direction in current computer vision research. This paper designs an image generation model based on the Generative Adversarial Network (GAN). The model consists of a discriminator network and a generator network, obtained by eliminating the fully connected layers of the traditional network and applying batch normalization and deconvolution operations. This paper also uses a hyper-parameter to balance the diversity and quality of the generated images. The experimental results of the model on the CelebA dataset show that it has excellent performance in face image generation.

Keywords:

Generative Adversarial Network; Face image generation; Hyper-parameter

1. Introduction

There has been a surge of interest in the field of computer vision over the past few years, most notably in the area of automatically generating images with realistic visual effects. With the increasing use of deep learning in computer vision, automatic image generation based on deep models has received more and more attention. By exploiting the powerful learning ability of deep models, the inherent distribution of the data can be efficiently mined to generate images with a similar distribution. These generated images can be applied in different scenarios, such as automatic image synthesis, prediction of face images at different ages, and creation of artistic pictures. In addition, high-quality generated images can be used to expand the amount of image data in a dataset, which alleviates the need for a large number of training samples when training deep learning models, so that practical applications such as face recognition achieve higher accuracy.

In current research on the automatic generation of images, the idea of using GAN is widely adopted. The discriminator and the generator eventually reach a balance through their confrontation, capturing the distribution of the data. We obtain our model by using a Deep Convolutional Neural Network and a Deep Transposed Convolutional Neural Network as the discriminator and generator, respectively. We create an image generation model based on GAN, test it on the CelebA dataset, and compare it with the results of Deep Convolutional Generative Adversarial Networks (DCGAN) and Wasserstein Generative Adversarial Networks (WGAN). The experimental results show that our model has better performance on image generation.

2. Related Work

For deep learning algorithms, in order to understand the input data, they need to learn to create the data. The most promising approach is to use a generative model that learns to discover the rules of the data and find the best distribution to represent it. In addition, by learning a generative model, we can even draw samples that are not in the training set but follow the same distribution.

As a new generative model framework, the Generative Adversarial Network [1], proposed in 2014, can generate composite images that are better than those of previous generative models, and it has since become one of the most popular research fields. The Generative Adversarial Network consists of two neural networks, a generator and a discriminator, where the generator attempts to generate realistic samples to spoof the discriminator, and the discriminator attempts to distinguish the real samples from the samples produced by the generator.

Generating images is by far the most widely studied application of GAN. The main GAN-based methods for image generation are hierarchical methods, iterative methods and direct methods. An algorithm under the hierarchical approach uses two generators and two discriminators in its model, where the generators have different purposes and their relationship can be parallel or in series, such as Secure Steganography based on
GAN (SS-GAN) [4]. The iterative methods differ from the hierarchical methods: their models generate images from coarse to fine, and each generator refines the details of the previous result. When the generators use the same structure, an iterative method can share weights between them, which hierarchical methods usually cannot do. The classic example is StackGAN [5]. The direct methods follow the principle of using one generator and one discriminator, and the structure of both is straightforward, with no branches. Many GAN models fall into this category, such as DCGAN [3] and Spectral Normalization GAN (SN-GAN) [7]. Among them, DCGAN [3] is a classic, since many later models, such as cGANs [8], follow its network design. This approach is relatively straightforward to design and implement compared to hierarchical and iterative methods, and generally yields good results.
3. Method

GAN is inspired by the two-player game in game theory. The two players in the GAN model are a generative model and a discriminative model. The generative model G captures the distribution of the sample data and generates samples similar to the real training data from noise z obeying a certain distribution (uniform, Gaussian, etc.); its goal is to generate samples as good as the real ones. The discriminative model D is a classifier that estimates the probability that a sample comes from the training data (rather than from the generated data). If a sample comes from the real training data, D outputs a large probability; otherwise, D outputs a small probability.
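Formally, this adversarial game corresponds to the standard minimax objective introduced in [1] (restated here for reference; the notation follows that work, not an equation given in this paper):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$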
3.1. Network Architecture

In this section, we introduce the network architecture used.

Fig.1 Discriminator Architecture

Discriminator D: We use an auto-encoder with a deep encoder and decoder, with two convolutions per layer. At each subsampling step the number of convolution filters is increased linearly; subsampling is achieved with convolutions of stride two, and upsampling is done by nearest neighbors. The discriminator receives a three-channel image, obtains a feature map after multiple convolution operations, reshapes it into a vector and maps it to a scalar value, and uses the Sigmoid function to determine the loss value of the discriminator, as shown in Fig.1.
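As an illustration, the following is a minimal tf.keras sketch of such an auto-encoder discriminator. The layer counts, filter sizes, ELU activations and 64×64 input size are assumptions made for the sake of a runnable example; the paper does not state them:

```python
# A minimal sketch of the auto-encoder discriminator described above.
# Filter counts, depth, ELU activations and the 64x64 input size are
# illustrative assumptions; the paper does not give exact dimensions.
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(img_size=64, base_filters=64, latent_dim=128):
    inp = layers.Input(shape=(img_size, img_size, 3))
    x = inp
    # Encoder: two convolutions per stage, filters growing linearly,
    # subsampling via stride-2 convolution as described in the paper.
    for stage in range(1, 4):
        f = base_filters * stage
        x = layers.Conv2D(f, 3, padding="same", activation="elu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="elu")(x)
        x = layers.Conv2D(f, 3, strides=2, padding="same", activation="elu")(x)
    x = layers.Flatten()(x)
    h = layers.Dense(latent_dim)(x)  # bottleneck vector
    # Decoder: nearest-neighbor upsampling, mirroring the encoder.
    s = img_size // 8
    x = layers.Dense(s * s * base_filters)(h)
    x = layers.Reshape((s, s, base_filters))(x)
    for _ in range(3):
        x = layers.UpSampling2D(interpolation="nearest")(x)
        x = layers.Conv2D(base_filters, 3, padding="same", activation="elu")(x)
        x = layers.Conv2D(base_filters, 3, padding="same", activation="elu")(x)
    out = layers.Conv2D(3, 3, padding="same")(x)  # reconstructed image
    return tf.keras.Model(inp, out, name="discriminator")
```

The output is the reconstruction of the input image; the per-sample reconstruction error then serves as the discriminator's loss value, in line with the loss function of Section 3.3.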

Fig.2 Generator Architecture

Generator G: We use the same architecture as the discriminator decoder, but with different weights. The first layer is a fully connected layer that receives the random noise vector and multiplies it by a weight matrix to transform it into a three-dimensional tensor; the remaining layers are deconvolution layers with 2×2 strides. The ReLU activation function is used, except that the last deconvolution layer uses the tanh activation function to generate a three-channel image; the structure is shown in Fig.2.
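A minimal sketch of this generator under the same caveats (the noise dimension, kernel size and filter counts are illustrative assumptions; the batch normalization follows the abstract's description):

```python
# A minimal sketch of the generator described above: a fully connected
# layer reshaped into a 3-D tensor, stride-2 transposed convolutions
# with ReLU, and a final tanh layer emitting a 3-channel image. The
# noise dimension, kernel size 5 and filter counts are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(noise_dim=100, img_size=64, base_filters=512):
    z = layers.Input(shape=(noise_dim,))
    s = img_size // 16
    # Fully connected layer: project the noise vector and reshape it
    # into a small three-dimensional feature tensor.
    x = layers.Dense(s * s * base_filters)(z)
    x = layers.Reshape((s, s, base_filters))(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # Deconvolution (transposed convolution) layers with 2x2 strides,
    # doubling spatial size while halving the filter count each step.
    for i in range(1, 4):
        x = layers.Conv2DTranspose(base_filters // 2**i, 5, strides=2,
                                   padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    # Final deconvolution with tanh to produce the three-channel image.
    img = layers.Conv2DTranspose(3, 5, strides=2, padding="same",
                                 activation="tanh")(x)
    return tf.keras.Model(z, img, name="generator")
```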
3.2. Hyper-parameter

Ideally, the result is best when the images generated by G are the same as real images. However, there is a problem at this point: if the discriminator cannot discriminate between the generated samples and the real samples, then their error distributions, and hence their expected errors, will be the same. To solve this problem, we introduce a hyper-parameter γ that helps the generator and the discriminator balance their assigned tasks, so that the two sides stay balanced and the above problem does not occur.

Thus, the discriminator has two competing goals: auto-encoding the real images and distinguishing the generated images from the real ones. The hyper-parameter γ balances these two goals. A low γ value results in poor image diversity, because the discriminator is then too concerned with auto-encoding the real images; the best value of γ lies in [0, 1].

3.3. Loss Function

Wasserstein Distance. We use an auto-encoder as the classifier and match the loss distributions of the auto-encoder with a loss based on the Wasserstein distance. The Wasserstein distance is also called the Earth-Mover (EM) distance and is defined as follows:

$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x, y) \sim \gamma}\big[\lVert x - y \rVert\big] \tag{1}$$

⚫ $\Pi(P_r, P_g)$ is the set of all possible joint distributions of $P_r$ and $P_g$; conversely, the marginal distributions of each $\gamma \in \Pi(P_r, P_g)$ are $P_r$ and $P_g$. For each possible joint distribution $\gamma$, a real sample x and a generated sample y can be obtained by drawing a pair (x, y) from $\gamma$;
⚫ The Wasserstein distance is smooth, so it provides a meaningful gradient when the parameters are optimized with the gradient descent method;
⚫ Computing the distance $\lVert x - y \rVert$ for each pair of samples, we can obtain the expected value of this distance under the joint distribution $\gamma$; the infimum of this expectation over all possible joint distributions is the Wasserstein distance. In our model it is evaluated on the losses of the auto-encoder.

G Loss and D Loss. According to the adversarial principle of GAN, the goal of D is to enlarge the distance between the two distributions, that is, to maximize the Wasserstein distance $W(P_r, P_g)$, while G tries to minimize it. The loss functions are as follows:

$$\begin{cases} L_D = L(x) - k_t \cdot L(G(z_D)) & \text{for } \theta_D \\ L_G = L(G(z_G)) & \text{for } \theta_G \\ k_{t+1} = k_t + \lambda_k \big(\gamma L(x) - L_G\big) & \text{for each step } t \end{cases} \tag{2}$$

where $L(\cdot)$ is the reconstruction loss of the auto-encoder discriminator, $z_D$ and $z_G$ are the noise inputs for the discriminator and generator updates, $k_t \in [0, 1]$ controls how much weight is placed on $L(G(z_D))$ at step $t$, and $\lambda_k$ is the learning rate of $k_t$.
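For illustration, one training step of Eq. (2) might be computed as in the following sketch. The L1 reconstruction loss, the gain λ_k and the clipping of k_t to [0, 1] are assumptions borrowed from the standard BEGAN-style formulation these equations follow; the paper does not specify them:

```python
# A minimal sketch of the losses and k_t update of Eq. (2). The L1
# reconstruction loss, lambda_k and the clipping of k_t are assumptions
# (standard in the BEGAN-style formulation this section follows).
import tensorflow as tf

def autoencoder_loss(v, reconstruction):
    # L(v): auto-encoder reconstruction loss of the discriminator.
    return tf.reduce_mean(tf.abs(v - reconstruction))

def train_step_losses(real_images, d, g, z_d, z_g, k_t,
                      gamma=0.5, lambda_k=0.001):
    L_x = autoencoder_loss(real_images, d(real_images))   # L(x)
    L_gzd = autoencoder_loss(g(z_d), d(g(z_d)))           # L(G(z_D))
    L_g = autoencoder_loss(g(z_g), d(g(z_g)))             # L_G = L(G(z_G))
    L_d = L_x - k_t * L_gzd                               # discriminator loss
    k_next = k_t + lambda_k * (gamma * L_x - L_g)         # k_t update
    k_next = tf.clip_by_value(k_next, 0.0, 1.0)           # keep k_t in [0, 1]
    return L_d, L_g, k_next
```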

4. Experiments

In this section, we present the training and test results of our two models (64×64 and 128×128) on the CelebA dataset, and we compare them with the classic DCGAN and WGAN architectures. The specific experimental procedures and results are as follows.

4.1. Dataset

This paper conducted the training and testing of the model on the CelebA dataset. The CelebA face dataset [2] is open data from The Chinese University of Hong Kong; it contains 202,599 face images of 10,177 celebrity identities, all of which have been feature-annotated, making it a very useful dataset for face-related experiments. The identity_CelebA.txt file contains the identity label corresponding to each face, and the list_attr_celeba.txt file contains the feature tags of each face image, for example, whether the person wears glasses. We convert the txt files into csv files and sort the face images according to the csv files, so that each folder represents one person, thus obtaining a single-person picture dataset.
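A minimal sketch of this preprocessing step is shown below; the file paths and the use of pandas are illustrative assumptions:

```python
# A minimal sketch of the preprocessing described above: convert
# identity_CelebA.txt to csv and sort images into one folder per
# person. File paths and column names are illustrative assumptions.
import os
import shutil
import pandas as pd

# identity_CelebA.txt has two whitespace-separated columns:
# the image file name and the celebrity identity number.
ids = pd.read_csv("identity_CelebA.txt", sep=r"\s+",
                  names=["image", "identity"])
ids.to_csv("identity_CelebA.csv", index=False)

# Copy each image into a folder named after its identity,
# so that each folder represents a single person.
for row in ids.itertuples(index=False):
    dst_dir = os.path.join("by_identity", str(row.identity))
    os.makedirs(dst_dir, exist_ok=True)
    shutil.copy(os.path.join("img_align_celeba", row.image),
                os.path.join(dst_dir, row.image))
```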
4.2. Parameters Setting

We trained two models, one at 64×64 and the other at 128×128, and set the initial learning rate of both to 1e-4. When training the models, we found that the results are best when the learning rate is decayed by a factor of 0.9 every 2000 iterations.
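Expressed against the TensorFlow 1.x API that the paper's framework suggests, this schedule might look as follows (the staircase flag and the choice of optimizer are assumptions):

```python
# A sketch of the schedule described above: initial learning rate 1e-4,
# decayed by a factor of 0.9 every 2000 iterations. staircase=True is
# an assumption so that the decay happens in discrete steps.
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(
    learning_rate=1e-4,     # initial learning rate for both models
    global_step=global_step,
    decay_steps=2000,       # decay every 2000 iterations
    decay_rate=0.9,
    staircase=True)
optimizer = tf.train.AdamOptimizer(learning_rate)  # Adam is an assumption
```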

4.3. Results

The result of the 64×64 model is shown in Fig.3. The result of the 128×128 model is very impressive; we can even make out sets of teeth, as shown in Fig.4.

Fig.3 The 64×64 model generated image

Fig.4 The 128×128 model generated image

We also trained the classic DCGAN and WGAN models; as shown in Fig.5, our model has a better performance in generating face images.

Fig.5 Comparison of the results of the three models

We also use TensorBoard to visualize the change of the loss value during training. The left graph of Fig.6 shows the loss value of the generated data through the auto-encoder, and the right graph shows the loss value of the real data through the auto-encoder. We can see from the figure that the loss curves eventually become stable and the network model converges.

Fig.6 The loss curve

4.4. Test and Verify

High-quality generated images can be used to augment the training image data in a dataset, which can alleviate the need for a large number of training samples during deep learning model training.

Table 1 The results of face recognition

            Can't detect face   Can detect faces but can't identify who
DCGAN       281/1000            0/1000
WGAN        123/1000            0/1000
Our Model   19/1000             0/1000

As shown in Table 1, in order to verify the realism of the generated face images, we use the open source GitHub project Face_recognition¹, which is based on the deep learning library dlib, to detect and identify the face images generated by the three models (DCGAN, WGAN and our model). The results show that the face images generated by our model have the highest face detection rate, up to 98.10%.
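A minimal sketch of this verification step, using the Face_recognition project cited in the footnote (the directory layout and file extension are assumptions):

```python
# A minimal sketch of the verification step: count how many of the
# generated images contain a detectable face, using the face_recognition
# project cited in the footnote. Paths and extension are assumptions.
import glob
import face_recognition

def count_undetected(image_dir):
    undetected = 0
    for path in glob.glob(f"{image_dir}/*.png"):
        image = face_recognition.load_image_file(path)
        if not face_recognition.face_locations(image):
            undetected += 1  # no face found in this generated image
    return undetected

print("faces not detected:", count_undetected("generated/ours"))
```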
Networks," CoRR, 2018.
5. Conclusions [8] T. Miyato and M. Koyama, "cGANs with Projection
Discriminator," CoRR, 2018.
This paper proposes a new network architecture based
on the Generative Adversarial Network, and designs the
image generation model based on the TensorFlow deep
learning framework. In the experiment, we used the CelebA
dataset to train DCGAN, WGAN, and our model. We also
use the open source project on GitHub called
Face_Recognition1 to detect and recognize the faces
generated by the three models. The results show that our

1 The code for Face_recognition is publicly at: https://github.com/ageitgey/face_recognition
