REALISTIC FACE IMAGE GENERATION BASED ON GENERATIVE ADVERSARIAL NETWORK
TING ZHANG1, WEN-HONG TIAN1, TING-YING ZHENG1, ZU-NING LI1, XUE-MEI DU1, FAN LI1
1 School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 611731
E-MAIL: [email protected], [email protected]
GAN (SS-GAN) [4]. The iterative methods differ from the hierarchical methods: the models generate images from coarse to fine, and each generator refines the details of the previous result. When the generators share the same structure, the iterative methods can share weights between them, which the hierarchical methods usually cannot do. A classic example is StackGAN [5]. The direct methods follow the principle of using a single generator and a single discriminator, and the structure of both is straightforward, with no branches. Many GAN models fall into this category, such as DCGAN [3] and Spectral Normalization GAN (SN-GAN) [7]. Among them, DCGAN [3] is a classic, since many later models, such as cGANs [8], build on its structure. This approach is relatively straightforward to design and implement compared to the hierarchical and iterative methods, and generally yields good results.
3. Method

GAN is inspired by the two-player game in game theory. The two players in the GAN model are a generative model and a discriminative model. The generative model G captures the distribution of the sample data and generates samples similar to the real training data from noise z obeying a certain distribution (uniform, Gaussian, etc.); its goal is to generate samples as good as real samples. The discriminative model D is a classifier that estimates the probability that a sample comes from the training data (rather than from the generated data). If a sample comes from the real training data, D outputs a large probability; otherwise, D outputs a small probability.
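To make the two-player objective concrete, here is a minimal PyTorch sketch of one alternating update; the networks G and D, the optimizers, and the noise dimension are illustrative placeholders rather than the paper's exact models.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, z_dim=100):
    """One alternating update of the standard GAN two-player game."""
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # D's turn: output a large probability for real samples, small for fakes.
    fake = G(torch.randn(n, z_dim)).detach()   # freeze G while updating D
    loss_D = F.binary_cross_entropy(D(real), ones) + \
             F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # G's turn: fool D into assigning a large probability to generated samples.
    loss_G = F.binary_cross_entropy(D(G(torch.randn(n, z_dim))), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```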
3.1. Network Architecture

In this section, we introduce the network architecture used.

Discriminator D: The discriminator receives a three-channel image, obtains a feature map after multiple convolution operations, maps it to a scalar value after reshaping it into a vector, and uses the Sigmoid function to determine the value of the discriminator's loss function, as shown in Fig.1.
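A PyTorch sketch of this discriminator for 64×64 inputs. Fig.1 is not reproduced here, so the depth and channel widths are assumptions; only the overall pattern (convolutions to a feature map, flatten to a vector, map to a scalar, Sigmoid) follows the description above.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Convolutions -> feature map -> vector -> scalar -> Sigmoid."""
    def __init__(self, ch=64):
        super().__init__()
        self.features = nn.Sequential(                         # 3-channel in
            nn.Conv2d(3, ch, 4, stride=2, padding=1),          # 64 -> 32
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),     # 32 -> 16
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1), # 16 -> 8
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.classify = nn.Sequential(
            nn.Flatten(),                  # reshape the feature map into a vector
            nn.Linear(ch * 4 * 8 * 8, 1),  # map the vector to a scalar
            nn.Sigmoid(),                  # probability used in the loss
        )

    def forward(self, x):
        return self.classify(self.features(x))
```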
Fig.2 Generator Architecture

Generator G: We use the same architecture as the discriminator's decoder, but with different weights. The first layer receives the random noise vector in a fully connected layer, multiplying it by a weight matrix and reshaping the result into a three-dimensional tensor; the remaining layers are deconvolution layers with 2×2 strides. The ReLU activation function is used throughout, except that the last deconvolution layer uses the tanh activation function to generate a three-channel image. The structure is shown in Fig.2.
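A matching PyTorch sketch of the generator. The paper specifies the fully connected projection, 2×2 strides in the deconvolutions, ReLU, and the final tanh; the noise dimension, 4×4 kernels, and channel widths are assumptions for a 64×64 output.

```python
import torch.nn as nn

class Generator(nn.Module):
    """FC projection -> 3-D tensor -> stride-2 deconvolutions -> tanh."""
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.ch = ch
        self.fc = nn.Linear(z_dim, ch * 4 * 8 * 8)  # weight-matrix projection
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1),      # 16 -> 32
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),           # 32 -> 64
            nn.Tanh(),                       # three-channel image in [-1, 1]
        )

    def forward(self, z):
        h = self.fc(z).view(-1, self.ch * 4, 8, 8)  # reshape to a 3-D tensor
        return self.deconv(h)
```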
3.2. Hyper-parameter

Ideally, the result is best when the images generated by G match the real images. However, there is a problem at this point: if the discriminator cannot distinguish generated samples from real samples, then their error distributions, and hence their expected errors, will be the same. To solve this problem, we introduce a hyper-parameter γ that helps the generator and the discriminator balance their tasks so that the two sides stay in equilibrium and the above problem does not occur.

Thus, the discriminator has two competing goals: auto-encoding the real images and distinguishing the generated images from the real ones. The hyper-parameter γ balances these two goals. A low γ value results in poor image diversity, because the discriminator focuses too heavily on auto-encoding the real images; the best value of γ lies in [0, 1].
$$W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x,y)\sim\gamma}\left[\|x - y\|\right] \tag{1}$$

⚫ Π(P_r, P_g) is the set of all possible joint distributions of P_r and P_g; conversely, the marginal distributions of each element of Π(P_r, P_g) are P_r and P_g. From each possible joint distribution γ, a real sample x and a generated sample y can be drawn as a pair (x, y) obeying the γ distribution;
⚫ The Wasserstein distance is smooth, so it provides a meaningful gradient when the parameters are optimized with gradient descent;
⚫ Computing the distance ‖x − y‖ for each pair of samples, we obtain the expected value of the distance under the joint distribution γ; the lower bound of this expectation over all possible joint distributions is the Wasserstein distance, which in our model is computed from the losses of the auto-encoder (the discriminator).
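For intuition, the Wasserstein distance between two one-dimensional empirical distributions can be computed directly; this illustrative snippet uses SciPy's wasserstein_distance and is not part of the paper's pipeline. The two Gaussian samples stand in for P_r and P_g.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two 1-D empirical distributions standing in for P_r and P_g.
rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=10_000)
fake = rng.normal(loc=0.5, scale=1.2, size=10_000)

# W(P_r, P_g): minimal expected transport cost E[|x - y|] over couplings;
# it is 0 if and only if the two distributions coincide.
print(wasserstein_distance(real, fake))
```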
G Loss and D Loss. According to the adversarial principle of GAN, the goal of D is to enlarge the distance between the two distributions, that is, to maximize the Wasserstein distance W(P_r, P_g), while G tries to minimize it. The loss functions are as follows:

$$\begin{cases} L_D = L(x) - k_t \cdot L(G(z_D)) & \text{for } \theta_D \\ L_G = L(G(z_G)) & \text{for } \theta_G \\ k_{t+1} = k_t + \lambda_k\left[\gamma L(x) - L(G(z_G))\right] & \text{for each step } t \end{cases} \tag{2}$$

where L(·) denotes the discriminator's loss for a sample and λ_k is a small learning rate for the balancing weight k_t.
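The following is a minimal sketch of one evaluation of Eq. (2); it matches the BEGAN-style formulation, where L(·) is typically an auto-encoder reconstruction loss. The gain λ_k (lambda_k below) and the default γ are assumptions, since the printed update rule is partially garbled in the source.

```python
def began_style_step(L, G, x_real, z_D, z_G, k_t, gamma=0.5, lambda_k=0.001):
    """One evaluation of Eq. (2). L maps a batch of images to a scalar
    tensor loss (in BEGAN-style training, an auto-encoder reconstruction
    loss); gamma is the balance hyper-parameter of Section 3.2 and
    lambda_k is an assumed small gain for the running weight k_t."""
    loss_real = L(x_real)
    loss_fake = L(G(z_D).detach())          # do not backprop into G here
    loss_D = loss_real - k_t * loss_fake    # discriminator objective
    loss_G = L(G(z_G))                      # generator objective

    # Proportional control keeps L(G(z)) near gamma * L(x), balancing
    # auto-encoding real images against discriminating generated ones.
    k_next = k_t + lambda_k * (gamma * loss_real.item() - loss_G.item())
    k_next = min(max(k_next, 0.0), 1.0)     # keep k_t in [0, 1]
    return loss_D, loss_G, k_next
```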
4. Experiments
4.1. Dataset
This paper conducts training and testing of the model on the CelebA dataset. The CelebA face dataset [2] is open data from The Chinese University of Hong Kong, containing 202,599 face images of 10,177 celebrity identities, all annotated with features, which makes it a very useful dataset for face-related experiments. The identity_CelebA.txt file holds the identity label corresponding to each face, and the list_attr_celeba.txt file holds the feature tags of each face image, for example, whether the person wears glasses. We convert the txt files into csv files and sort the face images according to the csv files, with each folder representing one person, thus obtaining a dataset organized by identity.
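A possible preprocessing script for the conversion described above, as a hedged sketch: the directory paths are assumptions, and identity_CelebA.txt is assumed to list one "image_name identity_id" pair per line, which is its documented format.

```python
import csv
import shutil
from pathlib import Path

# Paths are assumptions; adjust them to where CelebA is unpacked.
ROOT = Path("celeba")
IMG_DIR = ROOT / "img_align_celeba"
OUT_DIR = ROOT / "by_identity"

# identity_CelebA.txt holds one "image_name identity_id" pair per line;
# convert it to a csv file as described above.
with open(ROOT / "identity_CelebA.txt") as txt, \
     open(ROOT / "identity_CelebA.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["image", "identity"])
    for line in txt:
        writer.writerow(line.split())

# Sort the face images into one folder per identity.
with open(ROOT / "identity_CelebA.csv") as f:
    for row in csv.DictReader(f):
        dest = OUT_DIR / row["identity"]
        dest.mkdir(parents=True, exist_ok=True)
        shutil.copy(IMG_DIR / row["image"], dest / row["image"])
```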
4.2. Parameters Setting

We trained two models, one at 64×64 and the other at 128×128 resolution, both with an initial learning rate of 1e-4. During training, we found the results are best when the learning rate decays by a factor of 0.9 every 2000 iterations.
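A minimal sketch of this schedule using PyTorch's StepLR, stepping the scheduler once per training iteration; the optimizer choice (Adam) and the dummy model are assumptions, since the paper states only the initial rate and the decay rule.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)   # stand-in for the generator or discriminator
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial rate 1e-4

# Multiply the learning rate by 0.9 every 2000 iterations (Section 4.2);
# step() is called once per training iteration rather than once per epoch.
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=2000, gamma=0.9)

for it in range(10_000):
    loss = model(torch.randn(8, 10)).mean()   # dummy forward pass
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()
    if it % 2000 == 0:
        print(it, sched.get_last_lr())        # watch the rate decay
```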
4.3. Results

The 64×64 model's results are shown in Fig.3. The 128×128 model's results are very impressive; we can even see the teeth, as shown in Fig.4.

Fig.3 The 64×64 model generated image

Fig.4 The 128×128 model generated image

We also trained the classic DCGAN and WGAN models; as shown in Fig.5, our model has a better performance in generating face images.

Fig.5 Comparison of the results of the three models
5. Conclusion

Our model has a better performance in face image generation.
Acknowledgements