G-GANISR: Gradual generative adversarial network for image super resolution
Introduction
Image super-resolution is a classic problem in computer vision. It aims to recover the fine details of an image; the more detail that is recovered, the higher the effective resolution. The technique once attracted less attention than it does today, but resolution enhancement is now essential in applications such as remote sensing [3], object recognition [5], security surveillance [1], and medical imaging [2]. A low-resolution (LR) image is easily produced from its high-resolution (HR) counterpart by resolution degradation. The inverse mapping, restoring an HR image from an LR one, is difficult because texture detail and sharp edges are missing from the input. Many super-resolution methods have been proposed recently, and those based on deep learning perform best. Because deep networks are non-linear and can imitate a wide range of transformations and mappings, they are a good fit for super-resolution problems. Progress has followed, with methods proposed not only for images but also for videos and range images, most of them based on convolutional neural networks (CNNs). Even so, current CNN-based methods do not reach fully satisfactory perceptual quality: they do not fully exploit the features of the original low-resolution input, and some details may be lost during training, which degrades the results. Another common issue in CNN models is the objective function. CNN-based super-resolution models use pixel-wise loss functions such as l2 (least squared error), which reduce the mean squared error (MSE) and thereby increase the peak signal-to-noise ratio (PSNR) between the model estimate and the ground-truth image.
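The link between the pixel-wise l2 loss and PSNR can be made concrete with a short sketch. It uses the standard definitions of MSE and PSNR; the function names, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration and are not taken from the paper.

```python
import math


def mean_squared_error(estimate, target):
    """Average squared pixel difference between two equally sized images.

    Images are given as flat sequences of pixel values (an assumption
    made to keep the sketch dependency-free).
    """
    if len(estimate) != len(target) or len(estimate) == 0:
        raise ValueError("images must be non-empty and of equal size")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(estimate)


def psnr(estimate, target, max_value=255.0):
    """Peak signal-to-noise ratio in decibels: 10 * log10(max^2 / MSE)."""
    mse = mean_squared_error(estimate, target)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


ground_truth = [0.0, 0.0, 0.0, 0.0]
close_estimate = [1.0, 1.0, 1.0, 1.0]
far_estimate = [5.0, 5.0, 5.0, 5.0]

# A lower MSE always gives a higher PSNR, so minimizing l2 maximizes PSNR.
print(psnr(close_estimate, ground_truth) > psnr(far_estimate, ground_truth))
```

Because PSNR is a monotonically decreasing function of MSE, a model trained on l2 is optimized for PSNR directly, whatever the visual quality of its output.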
However, as discussed in [19], [26], [35], [37], these metrics do not consider the visual quality of the image, so models trained on them tend to produce blurry results with low perceptual quality. Building on CNNs, the generative adversarial network (GAN) [15] has recently demonstrated impressive performance and gained immense popularity in a variety of computer vision tasks. A GAN is a class of neural network that learns to generate samples from a given image input. It comprises two networks, a generator G and a discriminator D, which compete with each other: the generator learns to generate new samples, and the discriminator learns to distinguish the generated samples from the real data points. Each network seeks to minimize its own cost function, fD(θD, θG) for the discriminator and fG(θD, θG) for the generator. Generating super-resolution images with a GAN is difficult, first because the network lacks the capacity to capture small details (which are plainly visible in a super-resolution image), and second because the training process is unstable and lengthy. It has recently been pointed out that the main cause of these issues is the high dimensionality of the spaces involved, which can be handled by a proper objective function [37]. With an inappropriate loss, the discriminator accepts forged (generated) samples as real with very small error, because those samples lie on the correct side of the decision boundary. This wrong decision harms the update of the generator. In addition, because the networks are complex, the GAN architecture is unstable, and it is crucial to set up the network as carefully as possible. To address these issues in GAN-based super-resolution models, we propose a new GAN model, based on an image-to-image model, that organizes learning gradually from small upsampling factors to large ones.
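The idea of moving gradually from small to large upsampling factors can be sketched as a chain of small stages, each consuming the output of the previous one. Everything in this sketch is an assumption made for illustration: the paper's actual stage schedule is not given in this section, the prime-factor split in `stage_factors` is only one possible choice, and `upsample_nearest` is a trivial placeholder standing in for a learned generator stage.

```python
def stage_factors(target_factor):
    """Split a target upsampling factor into small stages, smallest first.

    Hypothetical schedule: the prime factors of the target,
    e.g. 8 -> [2, 2, 2] and 6 -> [2, 3].
    """
    if target_factor < 1:
        raise ValueError("target_factor must be a positive integer")
    factors = []
    remaining = target_factor
    divisor = 2
    while remaining > 1:
        while remaining % divisor == 0:
            factors.append(divisor)
            remaining //= divisor
        divisor += 1
    return factors


def upsample_nearest(image, factor):
    """Placeholder for one generator stage: nearest-neighbour upsampling.

    `image` is a 2-D list of pixel values (rows of columns).
    """
    output = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        output.extend(list(wide_row) for _ in range(factor))
    return output


def gradual_upsample(image, target_factor):
    """Reach the target factor through a sequence of small stages."""
    current = image
    for factor in stage_factors(target_factor):
        current = upsample_nearest(current, factor)
    return current


low_res = [[1, 2]]
high_res = gradual_upsample(low_res, 4)
print(len(high_res), len(high_res[0]))  # rows and columns after two x2 stages
```

In a direct strategy, by contrast, a single stage would map the input straight to the target factor.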
The loss function is the driving force of a learning network, yet this key issue has not been properly considered before. Most existing methods try to improve results by optimizing the network structure or designing new layers, and they generally use default losses [1], [3]. These local losses correlate poorly with image quality as perceived by a human observer. If the discriminator is treated as an energy-based function, GAN stability can be improved. Based on these observations, this paper centers largely on the loss function: we design a new discriminator that uses the least squares loss function, followed by a generator that is trained gradually. The proposed least squares model is simple to implement and fast to compute. We show that our GAN model can handle multiple scale factors (up to ×8), and that adopting least squares makes it more stable than Wasserstein GAN. The proposed learning process, from simple to advanced, significantly improves the training result and helps retain the image information. To improve image resolution and obtain realistic results, we base our discriminator on a least squares function and exploit the features it produces to build a more robust objective function, in contrast with current GANs, which use a classification network to generate the loss. Least squares [42] can appropriately separate fake samples from real ones: it penalizes samples according to their distance to the margin, which helps to find more real samples for updating the generator. In this paper, we show that the least squares function alleviates the current problems by generating more gradient for updating the generator.
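A minimal sketch of a least squares adversarial loss illustrates why it keeps supplying gradient. It follows the commonly used least squares GAN formulation, with label 1 for real and 0 for fake. The paper's exact loss, labels, and weighting are not given in this section, so the function names and default labels here are assumptions.

```python
def ls_discriminator_loss(real_scores, fake_scores, real_label=1.0, fake_label=0.0):
    """Least squares discriminator loss over raw (unbounded) scores."""
    real_term = sum((s - real_label) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum((s - fake_label) ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * real_term + 0.5 * fake_term


def ls_generator_loss(fake_scores, target_label=1.0):
    """Least squares generator loss: pull scores of fakes toward the target."""
    return 0.5 * sum((s - target_label) ** 2 for s in fake_scores) / len(fake_scores)


def ls_generator_grad(score, target_label=1.0):
    """Derivative of 0.5 * (score - target)^2 with respect to the score."""
    return score - target_label


# A generated sample scored 3.0 is far on the "real" side of the boundary,
# yet it still yields a non-zero gradient proportional to its distance.
print(ls_generator_grad(3.0))
```

The gradient grows with the distance of the score from the target label, so a generated sample that the discriminator already accepts is still pulled toward the boundary instead of being ignored.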
Our contributions are four-fold. (i) We propose a new variant of the generative adversarial network that adopts a least squares loss function for the discriminator and enables stepwise quality enhancement by using the output of the previous layer. (ii) In contrast to existing methods, we replace batch normalization with instance normalization [43] to preserve vital information. (iii) We evaluate the proposed model on several datasets and conduct two sets of experiments, comparing the direct learning strategy with the gradual learning strategy. (iv) We observe that residual learning is beneficial in our model because it speeds up convergence, so we adopt dense residual learning (which contains both dense and skip connections) in the proposed architecture to simplify training. Our contribution thus focuses mainly on applying a densely connected residual network within adversarial networks and on adopting a gradual learning strategy instead of direct learning. To show the effect of least squares in adversarial networks, we evaluate our network with different loss functions, including Wasserstein [13]. We believe that adopting the least squares loss prevents the discriminator of our model from becoming over-confident and enables the generator to produce higher-quality images than other approaches. The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 presents the proposed model architecture. Section 4 reports the experimental and evaluation results. Finally, Section 5 concludes the paper.
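The difference between batch normalization and instance normalization mentioned in contribution (ii) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the nested-list tensor layout and function names are hypothetical, learnable scale and shift parameters are omitted, and the batch version uses plain per-batch statistics.

```python
import math


def instance_norm(sample, eps=1e-5):
    """Normalize each channel of ONE image over its own spatial positions.

    `sample` is a list of channels; each channel is a flat list of floats.
    """
    normalized = []
    for channel in sample:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.append([(v - mean) * scale for v in channel])
    return normalized


def batch_norm(batch, eps=1e-5):
    """Normalize each channel over ALL images in the batch."""
    num_channels = len(batch[0])
    output = [[None] * num_channels for _ in batch]
    for c in range(num_channels):
        values = [v for sample in batch for v in sample[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        for n, sample in enumerate(batch):
            output[n][c] = [(v - mean) * scale for v in sample[c]]
    return output


# Two single-channel images with the same structure but different brightness.
dark = [[0.0, 2.0]]
bright = [[10.0, 12.0]]

print(instance_norm(dark), instance_norm(bright))  # statistics per image
print(batch_norm([dark, bright]))                  # statistics shared across the batch
```

With instance normalization each image is normalized by its own statistics, so the result for one image does not depend on which other images share the batch.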
Related works
In this section, we briefly describe the existing methods and the background concepts that are helpful for understanding our model. The generative adversarial network (GAN) was first introduced by Goodfellow et al. [15]; the main idea behind it is to define a mutual game between two networks, a discriminator D and a generator G. The generator takes noise as input and produces samples as output. The discriminator receives both the real and the generated samples, and it is optimized to
Proposed method
Recently, GANs [15] have demonstrated great performance in various tasks. However, in image super-resolution, the quality of GAN-generated images still does not match the resolution of real images. One of the main concerns in this regard is the loss function: the loss used in some GAN models works properly only during the initial steps. Consequently, the discriminator cannot provide the right information for updating the generator. In regular GAN, while
Experimental evaluation
In this section, we evaluate the performance of the proposed model and conduct a series of experiments to compare it with other prominent methods, in particular WGAN, ResGAN, GP-GAN, and DCGAN. We use four benchmark datasets for the experiments: Set5, Set14, BSD100, and Urban-100. All experiments are carried out at high scale factors (4×, 6× and 8×) between low- and high-resolution images. We have used the following measures to fairly evaluate the performance of different
Conclusion
In this paper, we address three well-known issues in image super-resolution approaches. The first is improving image resolution, in particular perceptual quality, because adversarial training generally produces artifacts in the outputs that can degrade image textures. The second is improving training stability. The third is improving the model's runtime. Thus, we propose an efficient GAN model that is able to produce state-of-the-art results based on
Declaration of Competing Interest
None.
References (44)
- et al. A new framework for remote sensing image super resolution: sparse representation-based method by processing dictionaries with multi-type features. J. Syst. Archit. (2016)
- et al. Image super-resolution via a densely connected recursive network. Neurocomputing (2018)
- et al. Incorporating image priors with deep convolutional neural networks for image super-resolution. Neurocomputing (2016)
- et al. Recovering realistic texture in image super-resolution by deep spatial feature transform
- et al. Deep convolution network for surveillance records super-resolution. Multimed. Tools Appl. (2018)
- et al. Novel example-based method for super-resolution and denoising of medical images. IEEE Trans. Image Process. (2014)
- et al. Ultra-resolving face images by discriminative generative networks
- et al. A high-throughput 16× super resolution processor for real-time object recognition SoC
- et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification
- et al. ImageNet classification with deep convolutional neural networks. J. Commun. ACM (2017)
- Image super-resolution using dense skip connections
- Wasserstein generative adversarial networks
- Generative adversarial nets
- Optimal transport: old and new. Am. Math. Soc.
- Improved training of Wasserstein GANs
- Diverse adversarial network for image super-resolution. Signal Process. Image Commun.
- Generative adversarial text-to-image synthesis
Pourya Shamsolmoali received the PhD degree in computer science and graduated from Jamia Hamdard University, India, and Shanghai Jiao Tong University, China. From 2016 to 2017 he was an associate researcher at the Advanced Scientific Computing Division of the Euro-Mediterranean Center on Climate Change Foundation, Italy. He is currently a researcher at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University. In 2018 he was selected as a young talented scientist by the Chinese Ministry of Education. His research activities focus on machine learning, image processing, computer vision and deep learning.
Masoumeh Zareapoor received the Ph.D. in computer science from Jamia Hamdard University, New Delhi, India, in 2015. She is currently an associate researcher at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University. Prior to that, she was an associate researcher at Tokyo University of Technology. Her research activities focus on computer vision, image processing, and machine learning.
Ruili Wang received the Ph.D. degree in computer science from Dublin City University, Dublin, Ireland. He is currently a Professor of Artificial Intelligence with the School of Natural and Computational Sciences, Massey University, Auckland, New Zealand, and the Director of the Centre of Language and Speech Processing. His research interests include speech processing, language processing, image processing, data mining, and intelligent systems. Dr. Wang is an Associate Editor and an Editorial Board member for international journals such as Knowledge and Information Systems and Applied Soft Computing. He was the recipient of the Marsden Fund, one of the most prestigious research grants in New Zealand.
Deepak Kumar Jain received the Ph.D. from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China. His research interests include computer vision, artificial intelligence and face recognition.
Jie Yang received his Ph.D. from the Department of Computer Science, Hamburg University, Germany, in 1994. He is currently a professor at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, China. He has led many research projects (e.g., National Science Foundation, 863 National High Tech. Plan), had one book published in Germany, and authored more than 200 journal papers. His major research interests are object detection and recognition, data fusion and data mining, and medical image processing.