Neurocomputing

Volume 366, 13 November 2019, Pages 140-153

G-GANISR: Gradual generative adversarial network for image super resolution

https://doi.org/10.1016/j.neucom.2019.07.094

Abstract

Adversarial methods have been demonstrated to be significant at generating realistic images. However, these approaches have a challenging training process, which is partially attributed to the performance of the discriminator. In this paper, we propose an efficient super-resolution model based on a generative adversarial network (GAN) that effectively generates representative information and improves the quality of real-world test images. To overcome the current issues, we design the discriminator of our model around the least-square loss function. The proposed network is organized as a gradual learning process from simple to advanced, i.e. from small upsampling factors to the large upsampling factor, which helps to improve the overall stability of training. In particular, to control the model parameters and mitigate training difficulties, a dense residual learning strategy is adopted. The key ideas of the proposed methodology are: (i) fully exploit all the image details without losing information by gradually increasing the task of the discriminator, where the output of each layer is gradually improved in the next layer; in this way the model efficiently generates super-resolution images even at high scaling factors (e.g. ×8); and (ii) keep the model stable during the learning process by using least-square loss instead of cross-entropy. In addition, the effects of different objective functions on training stability are compared. To evaluate the model, we conducted two sets of experiments, using the proposed gradual GAN and a regular GAN, to demonstrate the efficiency and stability of the proposed model on both quantitative and qualitative benchmarks.

Introduction

Image super-resolution is a classic problem in computer vision. It aims to recover the fine details of an image; more detail provides better resolution. Previously, this technology was not as attractive as it is today; however, with the growth of technology over time, the need for resolution enhancement cannot be overlooked in crucial applications such as remote sensing [3], object recognition [5], security surveillance [1], and medical imaging [2]. High-resolution (HR) images can easily produce their corresponding low-resolution (LR) images through resolution degradation. However, the inverse mapping, restoring HR images from LR images, is a difficult task due to the lack of image texture details and sharp edges. Recently, a large number of super-resolution methods have been proposed, and those based on deep learning are superior. Owing to its non-linearity and its ability to imitate almost any transformation and mapping, deep learning is considered a good fit for super-resolution problems. Since then, progress has been made on image super-resolution, and several methods have been proposed not only for images but also for videos and range images, mostly based on convolutional neural networks (CNNs). However, current CNN-based methods still cannot achieve fully satisfactory perceptual quality, because they do not fully exploit all the features of the original low-resolution input image, and some details may be lost during the training process; the corresponding results are therefore undesirable. Another common issue in CNN models is the objective function. CNN-based super-resolution models use pixel-wise loss functions such as l2 (least square error), which aim to reduce the MSE (mean square error) while increasing the similarity metric PSNR (peak signal-to-noise ratio) between the model estimate and the ground-truth image.
However, as discussed in [19], [26], [35], [37], those metrics do not consider the visual quality of the image; therefore, their results exhibit overall blurring and low perceptual quality. Inspired by CNNs, the generative adversarial network (GAN) [15] has recently demonstrated impressive performance and gained immense popularity in a variety of computer vision tasks. A GAN is a class of neural network that learns to generate samples from a particular input. It comprises two networks in competition with each other: a generator G and a discriminator D. The generator learns to generate new samples, and the discriminator learns to distinguish between the generated samples and the real data points. In a GAN, each network minimizes its own cost function, i.e. fD(θD, θG) for the discriminator and fG(θD, θG) for the generator. Generating super-resolution images is a difficult task, firstly due to the lack of capacity to capture small details (which are plainly visible in a super-resolution image), and secondly because the training process is unstable and lengthy. It has recently been pointed out that the main reason for these issues is the high-dimensional space, which can be handled by a proper objective function [37]. With an ill-suited loss, the discriminator recognizes forged samples (the generated samples) as real samples with the least error, because those samples lie on the correct side of the decision boundary. This wrong decision has a negative impact on the updating process of the generator. In addition, because of these complex networks, the GAN architecture is unstable, and it is crucial to set up the network in the best way possible. To effectively settle the current issues in GAN-based super-resolution models, we propose a new GAN model, based on an image-to-image architecture, that organizes a gradual learning process from small upsampling factors to large upsampling factors.
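The gradient behavior described above can be illustrated numerically. The sketch below is a toy calculation, not the paper's implementation: it compares the generator's gradient under the sigmoid cross-entropy loss with that under a least-square loss with target label 1. Once a fake sample's discriminator logit d lies far on the "real" side, the cross-entropy gradient vanishes, while the least-square gradient stays proportional to the distance from the target, so the generator keeps receiving a useful update signal.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ce_grad(d):
    """Gradient of the cross-entropy generator loss -log(sigmoid(d))
    with respect to the discriminator logit d of a generated sample."""
    return -(1.0 - sigmoid(d))

def ls_grad(d):
    """Gradient of the least-square generator loss (d - 1)^2 w.r.t. d,
    with target label 1 ("real"), as in LSGAN-style discriminators."""
    return 2.0 * (d - 1.0)

# Compare the two gradients for fakes scored as clearly fake (-6),
# ambiguous (0), and confidently "real" (+6).
for d in (-6.0, 0.0, 6.0):
    print(f"d={d:+.1f}  cross-entropy grad={ce_grad(d):+.4f}  least-square grad={ls_grad(d):+.4f}")
```

At d = +6 the cross-entropy gradient is nearly zero (the sigmoid has saturated), whereas the least-square gradient is still 10, pulling the sample back toward the margin, which matches the argument above.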
The loss function is the operative driver of the learning network; however, this key issue has not been properly considered before. Most existing methods try to improve the results by optimizing the network structure or designing new layers, and generally use the default losses [1], [3]. These local losses are poorly correlated with image quality as perceived by a human observer. If the discriminator is regarded as an energy-based function, GAN stability can be improved. Based on these observations, this paper centers largely on the loss function: we designed a new discriminator that uses the least-square loss function and is trained gradually, following the generator; the proposed least-square model is simple to implement and fast to compute. We show that our GAN model is able to deal with multiple scale factors (up to ×8), and that the proposed model adopting the least-square loss is more stable than one using the Wasserstein GAN loss. The proposed learning process (from simple to advanced) allows us to significantly improve the training result and retain all the image information. To improve image resolution and obtain realistic results, we designed our discriminator based on a least-square function. The features obtained from the discriminator are exploited to create a more robust objective function, in contrast with current GANs, which use a classification network to generate the loss. The least-square loss [42] has the ability to appropriately separate the fake samples from the real samples by marginalizing the fake samples; in fact, it controls the samples based on their distance to the margin, which helps to find more real samples for updating the generator. In this paper, we show the power of the least-square function to alleviate the current problems by generating more gradient for updating the generator.
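For reference, the least-square objectives of [42] can be written as follows (this is the standard LSGAN formulation; the label values given afterwards are the common choice, not necessarily the paper's settings):

```latex
\min_{D} V(D) = \tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}}\!\left[(D(x) - b)^2\right]
              + \tfrac{1}{2}\,\mathbb{E}_{z \sim p_z}\!\left[(D(G(z)) - a)^2\right]

\min_{G} V(G) = \tfrac{1}{2}\,\mathbb{E}_{z \sim p_z}\!\left[(D(G(z)) - c)^2\right]
```

Here a and b are the target labels for fake and real data, and c is the value the generator wants the discriminator to assign to fake data (commonly a = 0 and b = c = 1). Because the penalty grows quadratically with the distance from the label, even correctly classified fakes far from the margin contribute gradient, which is the property exploited above.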

Our contributions are four-fold: (i) we propose a new variation of the generative adversarial network adopting a least-square loss function for the discriminator, which enables stepwise quality enhancement using the output of the previous layer; (ii) as opposed to existing methods, we replace batch normalization with instance normalization [43] to retain all the vital information; (iii) we evaluate the proposed model over several datasets and conduct two sets of experiments, comparing the direct learning strategy with the gradual learning strategy; (iv) in addition, we observe that residual learning is beneficial in our model, as it speeds up convergence; we therefore adopt dense residual learning (containing both dense and skip connections) in the proposed architecture to simplify the training process. In fact, our contribution mainly focuses on this ongoing discussion (applying a densely connected residual network in adversarial networks, and adopting a gradual learning strategy instead of direct learning). To show the effect of the least-square loss in adversarial networks, we evaluate our network with different loss functions, including Wasserstein [13]. We believe the discriminator of our model can be prevented from becoming over-confident by adopting the least-square loss, which enables the generator to generate higher-quality images in comparison with other approaches. The rest of this paper is organized as follows. Section 2 discusses related works. Section 3 presents the proposed model architecture. Section 4 shows the experimental results and evaluation. Finally, Section 5 concludes the paper.
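To make the normalization change in contribution (ii) concrete, here is a minimal pure-Python sketch of instance normalization: each channel of a single image is normalized with its own mean and variance, independently of any other image in the batch (unlike batch normalization, which pools statistics across the batch). The [C][H][W] nested-list layout and the epsilon value are illustrative choices, not the paper's implementation.

```python
import math

def instance_norm(feature_map, eps=1e-5):
    """Normalize each channel of a single image independently,
    using only that channel's own mean and variance (no batch stats)."""
    out = []
    for channel in feature_map:  # feature_map layout: [C][H][W]
        vals = [v for row in channel for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        std = math.sqrt(var + eps)
        out.append([[(v - mean) / std for v in row] for row in channel])
    return out

# One 2-channel 2x2 feature map: a varying channel and a constant one.
fm = [[[1.0, 2.0], [3.0, 4.0]],
      [[10.0, 10.0], [10.0, 10.0]]]
normed = instance_norm(fm)
```

After normalization, the first channel has zero mean and unit variance using only its own four values, and the constant channel maps to zeros; no statistic from one image leaks into another, which is why this choice can preserve per-image contrast information.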

Section snippets

Related works

In this section, we present a brief description of the existing methods and the background concepts that are helpful for understanding our model. The generative adversarial network (GAN) was first introduced by Goodfellow et al. [15]; the main idea behind it is to define a mutual game between two networks, a discriminator D and a generator G. The generator takes noise as input and produces samples as output, while the discriminator receives the real and the generated samples and is optimized to

Proposed method

Recently, GANs [15] have demonstrated great performance in various tasks. However, in image super-resolution, the quality of the images generated by GANs still does not match the resolution of real images. One of the main concerns in this regard is the loss function; the loss function used in some GAN models usually works properly only at the initial steps. Consequently, the discriminator cannot provide the right information for updating the generator. In a regular GAN, while
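The gradual (simple-to-advanced) training described in the abstract and introduction can be sketched as a schedule that visits upsampling factors from small to large, so the network masters easier tasks before harder ones. The scale steps and epoch counts below are illustrative assumptions; the paper's actual schedule and training loop are not reproduced here.

```python
def gradual_schedule(scales=(2, 4, 8), epochs_per_scale=2):
    """Yield (scale, epoch) pairs from the easiest upsampling task
    to the hardest, mimicking a simple-to-advanced curriculum."""
    for scale in scales:
        for epoch in range(epochs_per_scale):
            yield scale, epoch

# Direct learning would train at the target factor (e.g. x8) from the
# start; the gradual curriculum instead walks x2 -> x4 -> x8.
plan = list(gradual_schedule())
for scale, epoch in plan:
    print(f"train at x{scale}, epoch {epoch}")
```

The appeal of this ordering is that the generator trained at a small factor already produces plausible intermediate outputs, so each later stage refines an easier starting point instead of learning the full x8 mapping at once.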

Experimental evaluation

In this section, we evaluate the performance of the proposed model and conduct a series of experiments to compare it with other prominent methods, especially WGAN, ResGAN, GP-GAN, and DCGAN. We use four benchmark datasets for the experiments: Set5, Set14, BSD100, and Urban-100. All experiments are conducted at high scale factors, 4×, 6×, and 8×, between the low- and high-resolution images. We use the following measures to fairly evaluate the performance of different
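Since PSNR is one of the standard measures used in this kind of evaluation, a minimal reference implementation may be useful. The formula PSNR = 10 · log10(MAX² / MSE) is the standard definition; the sample pixel values below are arbitrary, and real evaluations would of course run over full images.

```python
import math

def psnr(reference, estimate, max_val=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized
    images, given here as flat lists of pixel intensities."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy 4-pixel example: small reconstruction errors give a high PSNR.
ref = [50.0, 100.0, 150.0, 200.0]
est = [52.0, 98.0, 151.0, 199.0]
print(f"PSNR: {psnr(ref, est):.2f} dB")
```

As the introduction notes, a high PSNR does not guarantee good perceptual quality, which is why the paper also reports qualitative comparisons.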

Conclusion

In this paper, we addressed three well-known issues in image super-resolution approaches. The first is improving image resolution, in particular perceptual quality, because adversarial training generally produces artifacts in the outputs that can degrade the image textures. The second is improving training stability. The third is improving the model in terms of runtime. Thus, we proposed an efficient GAN model which is able to produce state-of-the-art results based on

Declaration of Competing Interest

None.


References (44)

  • H. Wu, S. Zheng, J. Zhang, K. Huang, GP-GAN: Towards realistic high-resolution image blending, arXiv preprint...
  • T. Tong et al., Image super-resolution using dense skip connections
  • J. Zhao, M. Mathieu, Y. LeCun, Energy-based generative adversarial network, arXiv preprint arXiv:1609.03126,...
  • D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, CoRR, vol. abs/1412.6980,...
  • M. Arjovsky et al., Wasserstein generative adversarial networks
  • G.-J. Qi, Loss-sensitive generative adversarial networks on Lipschitz densities, arXiv preprint arXiv:1701.06264,...
  • I. Goodfellow et al., Generative adversarial nets
  • A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial...
  • C. Villani, Optimal transport: old and new, Am. Math. Soc. (2009)
  • I. Gulrajani et al., Improved training of Wasserstein GANs
  • M. Zareapoor et al., Diverse adversarial network for image super-resolution, Signal Process. Image Commun. (2019)
  • S. Reed et al., Generative adversarial text-to-image synthesis

    Pourya Shamsolmoali received his Ph.D. degree in computer science from Jamia Hamdard University, India, and Shanghai Jiao Tong University, China. From 2016 to 2017 he was an associate researcher at the Advanced Scientific Computing Division of the Euro-Mediterranean Center on Climate Change Foundation, Italy. Currently he is a researcher at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University. In 2018 he was selected as a young talented scientist by the China Ministry of Education. His research activities focus on machine learning, image processing, computer vision, and deep learning.

    Masoumeh Zareapoor received her Ph.D. in computer science from Jamia Hamdard University, New Delhi, India, in 2015. Currently, she is working as an associate researcher at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University. Prior to that, she was an associate researcher at the Tokyo University of Technology. Her research activities focus on computer vision, image processing, and machine learning.

    Ruili Wang received the Ph.D. degree in computer science from Dublin City University, Dublin, Ireland. He is currently a Professor of Artificial Intelligence with the School of Natural and Computational Sciences, Massey University, Auckland, New Zealand, and the Director of the Centre of Language and Speech Processing. His research interests include speech processing, language processing, image processing, data mining, and intelligent systems. Dr. Wang is an Associate Editor and an Editorial Board member for international journals, such as Knowledge and Information Systems, Applied Soft Computing, etc. He was the recipient of the Marsden Fund, one of the most prestigious research grants in New Zealand.

    Deepak Kumar Jain received his Ph.D. from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China. His research interests include computer vision, artificial intelligence, and face recognition.

    Jie Yang received his Ph.D. from the Department of Computer Science, Hamburg University, Germany, in 1994. Currently, he is a professor at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, China. He has led many research projects (e.g., National Science Foundation, 863 National High Tech. Plan), had one book published in Germany, and authored more than 200 journal papers. His major research interests are object detection and recognition, data fusion and data mining, and medical image processing.
