
2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA)

Research on GAN-based Image Super-Resolution Method

Xiangyu Xue, Xiangnan Zhang, Haibing Li, Wenyong Wang*


College of Information Science and Technology
Northeast Normal University
Changchun, China

Abstract—Super-Resolution (SR) refers to the reconstruction of a high-resolution image from a low-resolution image, which has important application value in object detection, medical imaging, satellite remote sensing, and other fields. In recent years, with the rapid development of deep learning, image super-resolution reconstruction methods based on deep learning have made remarkable progress. In this paper, R-SRGAN (Residual Super-Resolution Generative Adversarial Networks) is used to build the model and realize image super-resolution. By adding residual blocks between adjacent convolutional layers of the GAN generator, more detailed information is retained. At the same time, the Wasserstein distance is used as the loss function to enhance the training effect and achieve image super-resolution.

Keywords—Super-Resolution, Image Processing, Generative Adversarial Networks
I. INTRODUCTION

With the rapid development of science and technology, images, as carriers of information, occupy a central position in information transmission and are widely disseminated. However, due to interference from hardware or the environment, the images obtained are often blurred and suffer from distortion and low resolution, which cannot meet people's needs. High-resolution images, with their high pixel density, provide more and more accurate information and can meet the needs of practical applications. How to improve image quality to obtain high-resolution images has therefore become a hot research issue.

Image Super-Resolution (SR), which converts existing low-resolution (LR) images into high-resolution images using signal processing and image processing methods[1], is widely used in fields such as medicine, the military, remote sensing, and video surveillance.

This paper proposes an image super-resolution method based on R-SRGAN to address the problems of instability and mode collapse in GAN training. With this method, the super-resolution effect is optimized, the diversity of samples is enhanced, and the image generation quality is improved.
II. RELATED WORK

A. Image Super-Resolution

The basic idea of image super-resolution was first proposed by J. L. Harris[2] and J. W. Goodman[3] in 1964 and 1968, respectively. It combines prior knowledge to recover the high-frequency information lost during imaging, so that a reconstructed image can be obtained from a single image. However, this method is not universal, and the effects of image reconstruction and resolution enhancement are greatly limited. Traditional image super-resolution methods are mostly interpolation methods: they find the pixels nearest to the point to be interpolated, select an appropriate interpolation basis function, and compute the gray value of that point in order to up-sample the image and rebuild its resolution. With the development of artificial intelligence, a variety of deep learning super-resolution methods have emerged, such as SRCNN (Super-Resolution Convolutional Neural Network)[4], based on convolutional neural networks, and SRGAN[5], based on Generative Adversarial Networks. Methods based on deep learning use a large number of training samples to learn the mapping relationship between low-resolution images and the corresponding high-resolution images.

B. Principles of GANs

Generative Adversarial Networks (GAN)[6] were first proposed by Ian Goodfellow in 2014 and are among the most promising deep learning models of recent years, using an adversarial game to learn deep characteristics.

A GAN consists of a generator G and a discriminator D. The generator G is used to simulate the data distribution. The discriminator D is used to estimate the probability that a sample comes from the training data rather than from G. Given input training data x, the goal of the generator G is to learn the data distribution p_data(x). Its input noise distribution is p_z(z), and G(z; θ_g) represents the mapping of noise to data space, where θ_g denotes the generator parameters. Similarly, D(x; θ_d) is the mapping function of the discriminator. The goal of D is to minimize its discrimination error rate, while the training goal of the generator G is to maximize the probability that the discriminator D makes errors. The objective function of the generative adversarial network is therefore a minimax problem:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]   (1)
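For concreteness, the following is a minimal sketch (assumed PyTorch code, not taken from the paper) of how the two sides of the minimax objective in Eq. (1) are typically evaluated for one batch; D and G are assumed to be networks defined elsewhere, with D outputting probabilities in (0, 1).

```python
import torch

def gan_losses(D, G, real, z):
    """Evaluate the two sides of the minimax game in Eq. (1) for one batch."""
    fake = G(z)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))) -> minimize its negation.
    d_loss = -(torch.log(D(real)).mean()
               + torch.log(1.0 - D(fake.detach())).mean())

    # Generator: minimize log(1 - D(G(z))).
    # (In practice the non-saturating form -log D(G(z)) is often used instead.)
    g_loss = torch.log(1.0 - D(fake)).mean()
    return d_loss, g_loss
```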
However, the initial GAN has some problems, such as a difficult training process and a lack of diversity in the generated samples. The emergence of WGAN[7] addresses these problems: it uses the Earth Mover (EM) distance instead of the JS divergence to measure the distance between the real and generated sample distributions, and mitigates the lack of diversity caused by unstable GAN training and the mode-collapse problem. The WGAN objective function becomes:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[D(x)] - \mathbb{E}_{z \sim p_z(z)}[D(G(z))]   (2)
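A minimal sketch (again assumed PyTorch, not the authors' implementation) of the WGAN form in Eq. (2): the critic tries to maximize the mean score gap between real and generated samples, and the generator tries to close it. The Lipschitz constraint required by the EM distance is enforced in the original WGAN by clipping the critic's weights after each update.

```python
def wgan_critic_loss(D, real, fake):
    # Critic maximizes E[D(x)] - E[D(G(z))]; return the negation to minimize.
    return -(D(real).mean() - D(fake.detach()).mean())

def wgan_generator_loss(D, fake):
    # Generator maximizes E[D(G(z))], i.e. minimizes -E[D(G(z))].
    return -D(fake).mean()
```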

III. RESIDUAL SUPER-RESOLUTION GENERATIVE ADVERSARIAL NETWORKS

A. R-SRGAN Algorithm Flow and Loss Function

The R-SRGAN model structure includes a generative model G and a discriminative model D. During training, a low-resolution image is fed into the generative model to obtain a generated high-resolution image, which is then fed into the discriminative model together with the standard high-resolution image, and regression training of the discriminative model is performed. The overall process is shown in Figure 1.

For the design of the generator model structure, in order to improve the perceived quality of the generated image and to train a more stable deep network, we add residual blocks between adjacent convolution layers. Following the EDSR[8] model proposed by Lim et al., we remove the unnecessary BN layers[10] from the residual block[9] and improve the connection structure between the residual blocks. The resulting structure is shown on the right side of Figure 2.

The discriminator uses the structure of VGG[11]. The final generator structure is shown in Figure 3(a), and the discriminator structure is shown in Figure 3(b).
Fig. 1  GAN super-resolution model process

Fig. 2  Residual blocks before and after improvement

Fig. 3  R-SRGAN generator and discriminator structure: (a) generator, (b) discriminator
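To make the generator design above concrete, here is a minimal sketch (assumed PyTorch; channel counts and kernel sizes are illustrative, not values reported in the paper) of an EDSR-style residual block with the BN layers removed, together with a sub-pixel (PixelShuffle) upsampling block of the kind commonly used in SRGAN-like generators.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """EDSR-style residual block: conv-ReLU-conv with no batch normalization."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # The skip connection keeps low-level detail flowing to deeper layers.
        return x + self.body(x)

class UpsampleBlock(nn.Module):
    """2x upsampling via sub-pixel convolution (PixelShuffle)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * 4, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.shuffle(self.conv(x)))
```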

The loss function consists of two parts, an MSE loss and an adversarial loss:

l = \alpha \, l_{MSE} + (1 - \alpha) \, l_{adv}   (3)
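A minimal sketch (assumed PyTorch; the weighting factor alpha is a hypothetical value, since the paper does not report one) of how the weighted combination in Eq. (3) can be assembled from the two terms defined in Eqs. (4) and (5) below.

```python
import torch.nn.functional as F

def total_generator_loss(sr, hr, critic_score, alpha=0.9):
    """Weighted sum of MSE and adversarial terms, as in Eq. (3).

    sr: generated high-resolution image G(I_LR)
    hr: standard (ground-truth) high-resolution image I_HR
    critic_score: discriminator output D(G(I_LR)) for the generated image
    """
    mse_loss = F.mse_loss(sr, hr)      # pixel-wise term, Eq. (4)
    adv_loss = -critic_score.mean()    # WGAN-style adversarial term for the generator
    return alpha * mse_loss + (1.0 - alpha) * adv_loss
```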

B. L2 Loss

For the discriminator, the most intuitive and widely used loss metric is the MSE loss. In the WGAN-based image super-resolution algorithm, we apply MSE to calculate the pixel distance between the generated image and the standard high-resolution image:

l_{MSE} = \frac{1}{WH} \sum_{x=1}^{W} \sum_{y=1}^{H} \left( I^{HR}_{x,y} - G(I^{LR})_{x,y} \right)^2   (4)

Here, I^{HR} represents the standard high-resolution image and G(I^{LR}) is the generated image. The MSE loss is a good measure of the pixel difference between the two: as the MSE loss approaches zero, the generated image approaches the original high-resolution image.
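Written out in code, Eq. (4) is simply the per-pixel mean squared error; the short check below (assumed PyTorch, illustrative tensors only) confirms that the explicit sum matches the built-in mse_loss.

```python
import torch
import torch.nn.functional as F

hr = torch.rand(1, 3, 96, 96)   # standard high-resolution image I_HR (illustrative)
sr = torch.rand(1, 3, 96, 96)   # generated image G(I_LR) (illustrative)

explicit = ((hr - sr) ** 2).sum() / hr.numel()   # squared pixel errors averaged over the image
builtin = F.mse_loss(sr, hr)                     # the same quantity via the library call
assert torch.allclose(explicit, builtin)
```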
C. Adversarial Loss

We modify the loss function of the WGAN generator so that the discriminator performs regression analysis on the generated high-resolution image. In the following formula, I^{LR} represents the low-resolution image input to the generator:

l_{adv} = \max_D \, \mathbb{E}_{I^{LR} \sim p(I^{LR})} \left[ D(I^{HR}) - D(G(I^{LR})) \right]   (5)
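As an illustration of how Eq. (5) drives the discriminator (critic) update in a WGAN-style super-resolution setup, here is a minimal sketch (assumed PyTorch; the optimizer and the clipping constant are conventional WGAN choices, not values reported in the paper).

```python
import torch

def critic_step(D, G, lr_img, hr_img, optimizer_d, clip=0.01):
    """One critic update: widen the score gap between I_HR and G(I_LR)."""
    with torch.no_grad():
        sr_img = G(lr_img)                        # generated high-resolution image

    # Maximize E[D(I_HR) - D(G(I_LR))] -> minimize the negation, cf. Eq. (5).
    loss_d = -(D(hr_img).mean() - D(sr_img).mean())

    optimizer_d.zero_grad()
    loss_d.backward()
    optimizer_d.step()

    # Weight clipping keeps the critic approximately Lipschitz (original WGAN).
    with torch.no_grad():
        for p in D.parameters():
            p.clamp_(-clip, clip)
    return loss_d.item()
```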
IV. EXPERIMENT

A. Dataset

The dataset used in this experiment is DIV2K[12]. The DIV2K dataset contains a total of 1000 2K-resolution pictures, of which 800 are used as the training set, 100 as the validation set, and 100 as the test set. In addition, in order to perform more accurate and detailed training, we segmented the training set pictures to generate 32,220 pictures for training. When comparing our results with others, we used three standard benchmark datasets: Set5[13], Set14[14], and BSD100[15]. The Set5 dataset has 5 images, including infants, women, avatars, birds, and butterflies; the Set14 dataset has 14 images, including men, zebras, etc.; BSD100 is relatively rich, including 100 images of planes, people, vases, etc.

After obtaining a 4x high-resolution image through super-resolution reconstruction, we evaluate the quality of the high-resolution image using the Peak Signal-to-Noise Ratio (PSNR)[16] and Structural Similarity (SSIM)[17]. PSNR, usually expressed in decibels, is a commonly used objective image evaluation index; it is based on the error between corresponding pixels, i.e. an error-sensitive image quality evaluation. SSIM is an index that measures the similarity of two images. These two evaluation indexes are the most widely used in the field of image processing because of their simple calculation and clear mathematical meaning. The larger the values of PSNR and SSIM, the higher the image quality and the better the super-resolution performance.

B. Results Comparison

From the objective evaluation (Table I), the three learning-based algorithms achieve good results on the Set5 and Set14 datasets. As the dataset grows, the performance of the algorithms decreases noticeably on BSD100. On all datasets, the learning-based SRCNN, SRGAN, and R-SRGAN methods are better than the bicubic interpolation method, and R-SRGAN is better than the SRCNN and SRGAN methods. Because bicubic interpolation uses the gray values of neighboring pixels to generate the gray values of the pixels to be interpolated, the image processed by bicubic interpolation does not gain sharpness compared with the original image, whereas SRCNN, SRGAN, and R-SRGAN can better simulate and restore the image by learning the data distribution of the original image.
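For reference, a small sketch (not the authors' evaluation script) of how the PSNR values reported in Table I can be computed from the mean squared error between two images; SSIM involves local statistics and is usually taken from a library such as scikit-image rather than written by hand.

```python
import numpy as np

def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((reference - reconstructed) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10((max_value ** 2) / mse)

# Illustrative call on random data; real evaluation uses the benchmark images.
a = np.random.randint(0, 256, (96, 96, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (96, 96, 3), dtype=np.uint8)
print(f"PSNR: {psnr(a, b):.2f} dB")
```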
TABLE I  METHOD RESULTS COMPARISON: BICUBIC INTERPOLATION, SRCNN, SRGAN, R-SRGAN

Dataset   Bicubic Interpolation     SRCNN              SRGAN              R-SRGAN
          PSNR      SSIM            PSNR     SSIM      PSNR     SSIM      PSNR     SSIM
Set5      28.43     0.8211          30.48    0.8628    29.40    0.8472    31.94    0.8626
Set14     25.99     0.7486          27.49    0.7513    26.02    0.7397    28.94    0.7866
BSD100    25.94     0.6935          26.84    0.7101    25.16    0.6688    27.28    0.7479
Some of the results are shown in Figure 4.

From the subjective evaluation, the interpolation method is only slightly better than the original image, while the super-resolution methods based on SRCNN, SRGAN, and R-SRGAN are significantly better than the interpolation method. R-SRGAN performs better than SRCNN and SRGAN and produces higher definition.
V. CONCLUSION

Image super-resolution has been a research hotspot in the image processing field in recent years and plays an important role in image transmission, aerial remote sensing, medical imaging, and so on. This paper uses the R-SRGAN method to realize image super-resolution reconstruction. The method builds a clear model mapping structure and fits the practical application of super-resolution: it does not need to deal with complex mathematical constraints, has strong generalization ability, and can cope with complex and changeable image features. When evaluated by the PSNR and SSIM indicators, this method is superior to bicubic interpolation, SRCNN, SRGAN, and other methods. Especially in terms of image contrast and line sharpness, it is superior to previous super-resolution learning models and achieves good results. The generated images also have a strong sense of reality.

However, this paper still has some shortcomings. For example, in the early stages of the training process, the color of the generated image is unstable, and when the resolution of the original image is too small, the level of detail recovery is insufficient. In the future, we will continue to study these issues to optimize the generation of super-resolution images.

REFERENCES

[1] Qi Zhang, Huafeng Wang, Sichen Yang. Image Super-Resolution Using a Wavelet-based Generative Adversarial Network. arXiv preprint arXiv:1907.10213, 2019.
[2] J. L. Harris. Diffraction and resolving power. JOSA, 1964, 54(7): 931-936.
[3] J. W. Goodman, G. R. Knight. Effects of film nonlinearities on wavefront-reconstruction images of diffuse objects. JOSA, 1968, 58(9): 1276-1283.
[4] C. Dong, C. C. Loy, K. He, and X. Tang. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision (ECCV), pages 184-199. Springer, 2014.
[5] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672-2680, 2014.
[7] Xin Guo, Johnny Hong, Tianyi Lin, and Nan Yang. Relaxed Wasserstein with applications to GANs. arXiv preprint arXiv:1705.07164, 2017.
[8] B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee. Enhanced deep residual networks for single image super-resolution. In CVPR Workshops (CVPRW), 2017.
[9] K. He, X. Zhang, S. Ren, J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[10] S. Ioffe, C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
[11] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2015.
[12] Yitong Yan, Chuangchuang Liu, Changyou Chen, Xianfang Sun, Longcun Jin, Xiang Zhou. Fine-grained Attention and Feature-sharing Generative Adversarial Networks for Single Image Super-Resolution. arXiv preprint arXiv:1911.10773, 2019.
[13] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, 2012.
[14] R. Zeyde, M. Elad, and M. Protter. On single image scale-up using sparse-representations. In Curves and Surfaces, pages 711-730. Springer, 2012.
[15] D. Martin, C. Fowlkes, D. Tal, J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, 2001.
[16] J. Kim, J. Kwon Lee, K. Mu Lee. Accurate image super-resolution using very deep convolutional networks. In CVPR, 2016.
[17] Nao Takano, Gita Alaghband. SRGAN: Training Dataset Matters. arXiv preprint arXiv:1903.09922, 2019.
Fig. 4  Method results comparison (left to right: original image, bicubic interpolation, SRCNN, SRGAN, R-SRGAN)

