G-GANISR: Gradual generative adversarial network for image super resolution

doi:10.1016/j.neucom.2019.07.094

Neurocomputing

Volume 366, 13 November 2019, Pages 140-153

https://doi.org/10.1016/j.neucom.2019.07.094

Abstract

Adversarial methods have proven effective at generating realistic images. However, these approaches are difficult to train, which is partially attributed to the performance of the discriminator. In this paper, we propose an efficient super-resolution model based on a generative adversarial network (GAN) to effectively generate representative information and improve the test quality of real-world images. To overcome the current issues, we design the discriminator of our model based on the least square loss function. The proposed network is organized as a gradual learning process from simple to advanced, that is, from small upsampling factors to large upsampling factors, which helps to improve the overall stability of training. In particular, to control the model parameters and mitigate the training difficulties, a dense residual learning strategy is adopted. The key ideas of the proposed methodology are as follows. (i) The model fully exploits all the image details without losing information by gradually increasing the task of the discriminator, where the output of each layer is gradually improved in the next layer. In this way, the model efficiently generates a super-resolution image even at high scaling factors (e.g. ×8). (ii) The model is stable during the learning process, as we use least square loss instead of cross-entropy. In addition, the effects of different objective functions on training stability are compared. To evaluate the model, we conducted two sets of experiments, using the proposed gradual GAN and the regular GAN, to demonstrate the efficiency and stability of the proposed model on both quantitative and qualitative benchmarks.

Introduction

Image super-resolution is a classic problem in computer vision. It aims to recover the details of an image; more detail yields better resolution. Previously, this technology was not as attractive as it is today. However, as technology has advanced, the need for resolution enhancement can no longer be overlooked in crucial application areas such as remote sensing [3], object recognition [5], security surveillance [1], and medical imaging [2]. A high-resolution (HR) image can easily be converted to its corresponding low-resolution (LR) image by resolution degradation. However, the inverse mapping, restoring an HR image from an LR image, is a difficult task because the LR image lacks texture details and sharp edges. Recently, a large number of super-resolution methods have been proposed, and those that use deep learning are superior. Because deep learning is based on non-linearity and can imitate any transformation or mapping, it is considered a good fit for super-resolution problems. Since then, progress has been made on image super-resolution, and several methods have been proposed not only for images but also for videos and range images, most of which are based on convolutional neural networks (CNNs). Even so, current CNN-based methods cannot achieve fully satisfactory perceptual quality, because they do not fully exploit all the features of the original (low-resolution) input image, and some details may be lost during the training process. Thus, the corresponding results are visibly undesirable. Another common issue in CNN models is the objective function. CNN-based super-resolution models use pixel-wise loss functions such as l2 (least square errors), which aim to reduce the MSE (mean square error) while increasing the similarity metric PSNR (peak signal to noise ratio) between the model estimate and the ground-truth image.
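The link between MSE and PSNR mentioned above can be made concrete with a short sketch. It assumes 8-bit grayscale images stored as nested lists and the common definition PSNR = 10·log10(MAX² / MSE); the function names and the sample pixel values are illustrative and do not come from the paper.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Illustrative 2x2 images (made-up pixel values).
reference = [[52, 55], [61, 59]]
estimate = [[54, 55], [60, 57]]

error = mse(reference, estimate)
quality = psnr(reference, estimate)
```

Because PSNR is a monotone function of MSE, lowering the pixel-wise l2 error always raises PSNR, regardless of how the result looks to a human observer.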
However, as discussed in [19], [26], [35], [37], those metrics do not account for the visual quality of the image, so the results tend to be blurry overall and of low perceptual quality. Inspired by CNNs, the generative adversarial network (GAN) [15] has recently demonstrated impressive performance and gained immense popularity in a variety of computer vision tasks. A GAN is a class of neural network that learns to generate samples from a particular image input. It comprises two networks, a generator G and a discriminator D, which compete with each other. The generator learns to generate new samples, and the discriminator learns to distinguish between the generated samples and the real data points. In the GAN model each network seeks to minimize its own cost function, i.e. f^D(θ^D, θ^G) for the discriminator and f^G(θ^D, θ^G) for the generator. Generating super-resolution images is a difficult task, first because of the limited capacity to recover small details (which are clearly visible in a super-resolution image), and second because the training process is unstable and lengthy. Recently, it has been pointed out that the main cause of these issues is the high-dimensional space, which can be handled by a proper objective function [37]. With an inappropriate loss, the discriminator accepts forged (generated) samples as real with very small error, because the samples lie on the correct side of the decision boundary. This wrong decision has a negative impact on the generator update. In addition, because these networks are complex, the GAN architecture is unstable, and it is crucial to set up the network as carefully as possible. To address the current issues in GAN-based super-resolution models, we propose a new GAN model, based on an image-to-image model, that organizes a gradual learning process from small upsampling factors to large upsampling factors.
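The gradual process from small to large upsampling factors can be pictured as a chain of stages, each consuming the output of the previous one. The sketch below is only illustrative: `upsample_nearest` is a hypothetical stand-in for a learned generator stage, and the stage factors `[2, 2, 2]` are an assumption, since the text here does not state which intermediate factors the model uses.

```python
def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor.

    Placeholder for a learned generator stage (assumption, not the paper's network).
    """
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def gradual_upsample(image, stage_factors):
    """Apply the stages in order; each stage refines the previous stage's output."""
    intermediates = []
    current = image
    for factor in stage_factors:
        current = upsample_nearest(current, factor)
        intermediates.append(current)
    return intermediates


low_res = [[1, 2], [3, 4]]
stages = gradual_upsample(low_res, [2, 2, 2])   # assumed schedule: x2 -> x4 -> x8
sizes = [(len(s), len(s[0])) for s in stages]
```

Each intermediate output is available for supervision at its own scale, which is the property a simple-to-advanced training schedule relies on; a direct strategy would instead jump to the final factor in one step.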
The loss function is the operative driver of the learning network. However, this key issue has not been properly considered before. Most existing methods try to improve results by optimizing the network structure or designing new layers, and they generally use default losses [1], [3]. These local losses correlate poorly with image quality as perceived by a human observer. If the discriminator is considered as an energy-based function, GAN stability can be improved. Based on these observations, this paper centers largely on the loss function: we design a new discriminator that uses the least square loss function and is trained gradually, following the generator; the proposed least square model is simple to implement and fast to compute. We show that our GAN model can handle multiple scale factors (up to ×8), and that adopting the least square loss makes the model more stable than Wasserstein GAN. The proposed learning process (simple to advanced) significantly improves the training result and retains the image information. To improve image resolution and obtain realistic results, we design our discriminator based on a least square function. The features obtained from the discriminator are exploited to create a more robust objective function, in contrast with current GANs, which use a classification network to generate the loss function. Least square [42] can appropriately separate the fake samples from the real samples by marginalizing the fake samples. In fact, the least square function penalizes samples based on their distance to the margin, and so it helps to find more real samples for updating the generator. In this paper, we demonstrate the power of the least square function to alleviate the current problems by generating more gradient for updating the generator.
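The claim that a least square loss yields more gradient for the generator can be illustrated with a small numeric sketch. It assumes the commonly used least-squares GAN labelling (0 for fake, 1 for real, and a generator target of 1) and compares it with the sigmoid cross-entropy generator loss; these conventions and the function names are assumptions, not details confirmed by the text above.

```python
import math


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def ce_generator_grad(score):
    """Derivative of -log(sigmoid(score)) with respect to the discriminator score."""
    return sigmoid(score) - 1.0


def ls_generator_grad(score, target=1.0):
    """Derivative of 0.5 * (score - target)**2 with respect to the discriminator score."""
    return score - target


def ls_discriminator_loss(real_scores, fake_scores, real_label=1.0, fake_label=0.0):
    """Least-squares discriminator loss: pull real scores to 1 and fake scores to 0."""
    real_term = sum((s - real_label) ** 2 for s in real_scores) / len(real_scores)
    fake_term = sum((s - fake_label) ** 2 for s in fake_scores) / len(fake_scores)
    return 0.5 * real_term + 0.5 * fake_term


# A generated sample that the discriminator already scores far on the "real" side.
confident_score = 5.0
ce_grad = ce_generator_grad(confident_score)
ls_grad = ls_generator_grad(confident_score)
```

For such a sample the cross-entropy gradient is close to zero, so the generator receives almost no signal, while the least square gradient grows with the distance from the target and keeps pulling the sample back toward the boundary.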

Our contributions are four-fold. (i) We propose a new variant of the generative adversarial network that adopts a least square loss function for the discriminator, which enables stepwise quality enhancement by using the output of the previous layer. (ii) In contrast to existing methods, we replace batch normalization with instance normalization [43] to retain all the vital information. (iii) We evaluate the proposed model on several datasets and conduct two sets of experiments comparing the direct learning strategy with the gradual learning strategy. (iv) We observe that residual learning is beneficial in our model, as it speeds up convergence. Thus, we adopt dense residual learning (which contains both dense and skip connections) in the proposed architecture to simplify the training process. Our contribution mainly focuses on this ongoing discussion (applying a densely connected residual network in adversarial networks, and adopting a gradual learning strategy instead of direct learning). To show the effect of least square in adversarial networks, we evaluate our network with different loss functions, including Wasserstein [13]. We believe that adopting the least square loss prevents the discriminator of our model from becoming over-confident and enables the generator to produce higher-quality images than other approaches. The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 presents the proposed model architecture. Section 4 presents the experimental results and evaluation. Finally, Section 5 concludes the paper.
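Contribution (ii) above swaps batch normalization for instance normalization. As a rough illustration of the difference, instance normalization computes the mean and variance over the spatial positions of a single channel of a single sample, rather than pooling statistics across a batch. The sketch below is a minimal version under that standard definition; it omits the learnable scale and shift, and the function name and `eps` value are assumptions.

```python
import math


def instance_norm(feature_map, eps=1e-5):
    """Normalize one channel of one sample using only its own spatial statistics."""
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in feature_map]


channel = [[1.0, 2.0], [3.0, 4.0]]
normalized = instance_norm(channel)

flat = [v for row in normalized for v in row]
out_mean = sum(flat) / len(flat)
out_var = sum((v - out_mean) ** 2 for v in flat) / len(flat)
```

Since the statistics never mix different images, the contrast of one sample does not influence how another is normalized, which is one reason this layer is often preferred in image generation models.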

Section snippets

Related works

In this section, we present a brief description of the existing methods and the background concepts that are helpful for understanding our model. The generative adversarial network (GAN) was first introduced by Goodfellow et al. [15]; the main idea behind it is to define a game between two networks, a discriminator D and a generator G. The generator takes noise as input and produces samples as output. The discriminator receives both the real and the generated samples, and it is optimized to

Proposed method

Recently, GANs [15] have demonstrated great performance in various tasks. However, in image super-resolution, the quality of images generated by GANs still does not match the resolution of real images. One of the main concerns in this regard is the loss function; the loss function used in some GAN models usually works properly only in the initial steps. Consequently, the discriminator cannot provide the right information for updating the generator. In regular GAN, while

Experimental evaluation

In this section, we evaluate the performance of the proposed model and conduct a series of experiments to compare it with other prominent methods, especially WGAN, ResGAN, GP-GAN, and DCGAN. We used four benchmark datasets for the experiments: Set5, Set14, BSD100, and Urban100. All experiments are conducted at high scale factors (4×, 6× and 8×) between low- and high-resolution images. We have used the following measures to fairly evaluate the performance of different

Conclusion

In this paper, we address three well-known issues in image super-resolution approaches. The first is improving image resolution, in particular perceptual quality, because adversarial training generally produces artifacts in the outputs that can degrade image textures. The second is improving training stability. The third is improving the model in terms of runtime. Thus, we proposed an efficient GAN model which is able to produce state of the art results based on

Declaration of Competing Interest

None.

Pourya Shamsolmoali received his PhD degree in computer science and graduated from Jamia Hamdard University, India and Shanghai Jiao Tong University, China. From 2016 to 2017 he was an associate researcher at the Advanced Scientific Computing Division of the Euro-Mediterranean Center on Climate Change Foundation, Italy. Currently he is a researcher at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University. In 2018 he was selected as a young talented scientist by China ministry

References (44)

W. Wu et al., A new framework for remote sensing image super resolution: sparse representation-based method by processing dictionaries with multi-type features, J. Syst. Archit. (2016)
Z. Feng et al., Image super-resolution via a densely connected recursive network, Neurocomputing (2018)
Y. Liang et al., Incorporating image priors with deep convolutional neural networks for image super-resolution, Neurocomputing (2016)
X. Wang et al., Recovering realistic texture in image super-resolution by deep spatial feature transform
P. Shamsolmoali et al., Deep convolution network for surveillance records super-resolution, Multimed. Tools Appl. (2018)
D.H. Trinh et al., Novel example-based method for super-resolution and denoising of medical images, IEEE Trans. Image Process. (2014)
X. Yu et al., Ultra-resolving face images by discriminative generative networks
J. Park et al., A high-throughput 16× super resolution processor for real-time object recognition SoC
K. He et al., Delving deep into rectifiers: surpassing human-level performance on ImageNet classification
A. Krizhevsky et al., ImageNet classification with deep convolutional neural networks, Commun. ACM (2017)
H. Wu, S. Zheng, J. Zhang, K. Huang, GP-GAN: Towards realistic high-resolution image blending. arXiv preprint...
T. Tong et al., Image super-resolution using dense skip connections
J. Zhao, M. Mathieu, Y. LeCun, Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126,...
D.P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” CoRR, vol. abs/1412.6980,...
M. Arjovsky et al., Wasserstein generative adversarial networks
Guo-Jun Qi, Loss-sensitive generative adversarial networks on Lipschitz densities. arXiv preprint arXiv:1701.06264,...
I. Goodfellow et al., Generative adversarial nets
A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial...
C. Villani, Optimal transport: old and new, Am. Math. Soc. (2009)
I. Gulrajani et al., Improved training of Wasserstein GANs
M. Zareapoor et al., Diverse adversarial network for image super-resolution, Signal Process. Image Commun. (2019)
S. Reed et al., Generative adversarial text-to-image synthesis


Masoumeh Zareapoor received the Ph.D. degree in computer science from Jamia Hamdard University, New Delhi, India, in 2015. Currently, she is an associate researcher at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University. Prior to that, she was an associate researcher at Tokyo University of Technology. Her research activities focus on computer vision, image processing, and machine learning.

Ruili Wang received the Ph.D. degree in computer science from Dublin City University, Dublin, Ireland. He is currently a Professor of Artificial Intelligence with the School of Natural and Computational Sciences, Massey University, Auckland, New Zealand, and the Director of the Centre of Language and Speech Processing. His research interests include speech processing, language processing, image processing, data mining, and intelligent systems. Dr. Wang is an Associate Editor and an Editorial Board member for international journals, such as Knowledge and Information Systems, Applied Soft Computing, etc. He was the recipient of the Marsden Fund, one of the most prestigious research grants in New Zealand.

Deepak Kumar Jain received the Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China. His research interests include computer vision, artificial intelligence, and face recognition.

Jie Yang received his Ph.D. from the Department of Computer Science, Hamburg University, Germany, in 1994. Currently, he is a professor at the Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, China. He has led many research projects (e.g., National Science Foundation, 863 National High Tech. Plan), has had one book published in Germany, and has authored more than 200 journal papers. His major research interests are object detection and recognition, data fusion and data mining, and medical image processing.
