Neurocomputing

Volume 427, 28 February 2021, Pages 201-211

Residual scale attention network for arbitrary scale image super-resolution

https://doi.org/10.1016/j.neucom.2020.11.010

Abstract

Research on super-resolution has achieved great success on synthetic data with deep convolutional neural networks. Some recent works aim to apply super-resolution to practical scenarios. Learning an accurate and flexible model for super-resolution of arbitrary scale factor is important for realistic applications, while most existing works focus only on integer scale factors. In this work, we present a residual scale attention network for super-resolution of arbitrary scale factor. Specifically, we design a scale attention module that learns discriminative features of low-resolution images by introducing the scale factor as prior knowledge. Then, we use a quadratic polynomial of the coordinate information and the scale factor to predict pixel-wise reconstruction kernels and achieve super-resolution of arbitrary scale factor. In addition, the predicted reconstruction kernels in the image domain first interpolate the low-resolution image into a coarse high-resolution image, and the main network then learns the high-frequency residual image in the feature domain. Extensive experiments on both synthetic and real data show that the proposed method outperforms state-of-the-art super-resolution methods of arbitrary scale factor in terms of both objective metrics and subjective visual quality.

Introduction

Single image super-resolution (SISR) is a fundamental low-level vision task that aims to restore a high-resolution (HR) image IHR from a single low-resolution (LR) image ILR. It is a severely ill-posed inverse problem and has been studied for decades. In the past few years, convolutional neural network (CNN) based methods have brought remarkable improvements to SISR, starting from the three-layer convolutional neural network called SRCNN proposed by Dong et al. [1].

Recently, research has turned toward realistic applications, which require an accurate and flexible model. For accuracy, some CNN-based super-resolution works [2], [3], [4] make efforts to capture real data for training the network. For flexibility, some work [5] focuses on developing more general upsampling methods.

To upsample an image, some CNN-based super-resolution methods [1], [6], [7] interpolate the LR image to the target resolution before feeding it into the network. This kind of method can achieve super-resolution of arbitrary scale factor using an interpolation algorithm such as bicubic [8], but it may introduce side effects, e.g., noise amplification and blurring. More importantly, these methods increase the network's computational cost and make it hard to run in real time in realistic applications.
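
As a concrete reference for this pre-upsampling step, the cubic convolution kernel of Keys [8], which underlies bicubic interpolation, can be sketched in a few lines. The a = -0.5 parameter and the clamped-edge 1-D interpolation below are the standard textbook formulation, not code from the paper.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Cubic convolution interpolation kernel (Keys, 1981) with a = -0.5,
    the variant commonly called 'bicubic' when applied separably in 2-D."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interpolate_1d(samples, t):
    """Interpolate a uniformly spaced 1-D signal at fractional position t
    using the four nearest samples (edges clamped)."""
    base = math.floor(t)
    value = 0.0
    for k in range(-1, 3):
        idx = min(max(base + k, 0), len(samples) - 1)
        value += samples[idx] * cubic_kernel(t - (base + k))
    return value
```

Applied separably along rows and columns, this kernel upsamples to any target resolution, which is exactly why interpolation-first pipelines support arbitrary scale factors.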

To avoid these problems, some CNN-based methods [9], [10], [11] feed the LR image into a network and upsample it within the network, usually with a deconvolution module [12] or a sub-pixel convolution module [13]. These methods mostly upscale images by integer scale factors, which limits the flexibility of the models for zooming with arbitrary scale factors in realistic applications. To tackle this limitation, Hu et al. [5] introduced the meta-upscale module to achieve super-resolution of arbitrary scale factor, but did not explicitly consider the effect of different scale factors during feature learning.
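
For reference, the sub-pixel convolution module [13] ends with a channel-to-space rearrangement that turns r² feature channels into an r×-upscaled map, and this rearrangement only works for integer r. A minimal pure-Python sketch of that shuffle for a single output channel (the channel-to-offset layout below is one common convention, not necessarily the one used in the cited works):

```python
def pixel_shuffle(feat, r):
    """Rearrange a feature map of shape (r*r, H, W), given as nested lists,
    into a single-channel (H*r, W*r) image -- the channel-to-space step of
    sub-pixel convolution. Channel index c maps to sub-pixel offset
    (c // r, c % r), one common layout convention."""
    channels = len(feat)
    assert channels == r * r, "expected r*r channels for one output channel"
    h, w = len(feat[0]), len(feat[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for c in range(channels):
        dy, dx = c // r, c % r
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = feat[c][y][x]
    return out
```

Because each channel fills one fixed integer sub-pixel offset, a non-integer scale factor has no corresponding channel layout, which is the flexibility limitation discussed above.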

In this paper, we present a residual scale attention network for image super-resolution of arbitrary scale factor, where the scale factor is introduced as prior knowledge so that the network can distinguish different data distributions and automatically generate appropriate parameters to assist feature learning. In our scale attention module, the scale factor is encoded together with statistics of the feature maps to adaptively rescale the convolution filters with attention coefficients. This mechanism dynamically generates attention coefficients that guide the convolution filters to extract more informative features according to the current LR data distribution. Then, we use a quadratic polynomial of the coordinate offset and the scale factor as encoding vectors to dynamically predict pixel-wise reconstruction kernels, which benefits the learning of the non-linear mapping between the reconstruction kernels and the coordinate information. In addition, a residual learning strategy is used in our reconstruction process: the predicted kernels in the image domain directly interpolate the LR image into a coarse super-resolution image, and the predicted kernels in the feature domain transform the learned feature maps into a residual image. This makes the main network focus on the recovery of high-frequency information. Experiments on both synthetic and real data show that our network outperforms state-of-the-art super-resolution methods of arbitrary scale factor in terms of both comprehensive quantitative metrics and perceptual quality.
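
To make the kernel-prediction input concrete: Meta-SR [5] encodes each output pixel by its relative coordinate offsets and the reciprocal of the scale factor, and extending that encoding with all degree-2 monomials is one plausible reading of the quadratic polynomial described above. The sketch below is illustrative only, not the paper's exact formulation.

```python
import math

def quadratic_offset_encoding(i, j, scale):
    """Encoding vector for output pixel (i, j) at a (possibly non-integer)
    scale factor. The relative offsets follow the Meta-SR convention
    (i/r - floor(i/r), j/r - floor(j/r), 1/r); appending all degree-2
    monomials is an illustrative reading of the 'quadratic polynomial'
    encoding, not the exact formulation from the paper."""
    u = i / scale - math.floor(i / scale)  # vertical sub-pixel offset
    v = j / scale - math.floor(j / scale)  # horizontal sub-pixel offset
    s = 1.0 / scale                        # scale term
    linear = [u, v, s]
    quadratic = [a * b for idx, a in enumerate(linear) for b in linear[idx:]]
    return linear + quadratic  # 3 linear + 6 quadratic terms
```

A small network that maps this 9-dimensional vector to a pixel-wise reconstruction kernel then works for any positive scale factor, since the encoding is defined continuously in the scale.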

In summary, our main contributions are as follows:

  • 1.

    We present a scale attention module that helps the network distinguish LR images produced with different downsampling scale factors, adaptively rescales the convolution filters according to the scale factor, and convolves them with the feature maps. It improves the ability to learn discriminative features for arbitrary scale factors.

  • 2.

    We use a quadratic polynomial of the coordinate information and the scale factor to dynamically predict pixel-wise reconstruction kernels. The quadratic polynomial contributes to learning the non-linear relationship between the reconstruction kernels and the coordinate information, which leads to more accurate reconstruction.

  • 3.

    We combine the predicted pixel-wise reconstruction kernels with a residual learning strategy to boost reconstruction accuracy. Experimental results demonstrate the effectiveness of the proposed residual scale attention network on both synthetic and real datasets for image super-resolution of arbitrary scale factor.
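
The data flow of a scale attention module of the kind described in contribution 1 might be sketched as follows; the tiny MLP, its placeholder weights w1/w2, and the per-filter sigmoid gating are hypothetical stand-ins for the learned components, intended only to show how the scale factor can enter as prior knowledge.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def scale_attention(filters, channel_means, scale, w1, w2):
    """Illustrative scale attention data flow: encode per-channel feature
    statistics together with the scale factor, pass them through a tiny
    two-layer MLP (w1, w2 are hypothetical placeholder weight matrices),
    and use the resulting sigmoid coefficients to rescale each convolution
    filter. The real RSAN module is learned end-to-end; this only sketches
    the mechanism."""
    code = channel_means + [1.0 / scale]  # feature statistics + scale prior
    hidden = [max(0.0, sum(w * c for w, c in zip(row, code))) for row in w1]  # ReLU
    coeff = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # One attention coefficient per filter: rescale filter weights adaptively.
    return [[w * a for w in f] for f, a in zip(filters, coeff)]
```

Because the scale factor is part of the input code, the same set of filters is modulated differently for each scale, which is how a single model can serve all scale factors.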

Related work

Traditional SISR methods in the literature can be divided into three categories, i.e., interpolation-based methods [8], [14], reconstruction-based methods [15], [16], and learning-based methods [17], [18], [19]. Interpolation-based methods are simple and fast, but their performance is usually limited. Reconstruction-based methods generally have many hyper-parameters that require manual tuning, and their solving process is complicated. Traditional learning-based methods require a database of paired LR and HR exemplars from which to learn the mapping.

Residual scale attention network

In this section, we introduce details of the proposed residual scale attention network (RSAN) and analyze how this model boosts the performance in SISR with arbitrary scale factor. The problem of super-resolution of arbitrary scale factor is first formulated. Then we introduce the whole architecture, scale attention module, and the residual meta-upscale reconstruction. Finally, the network setting and learning details are provided.

Experiments

In this section, a series of experiments is conducted to evaluate the performance of the proposed method. We first describe the experimental details and then compare our method with several state-of-the-art methods that can be applied to SISR of arbitrary scale factor.

Conclusion

In this paper, we propose a residual scale attention network (RSAN) to enhance the capability for super-resolution of arbitrary scale factor in a single model. We develop a scale attention module that can adaptively rescale the convolution filters to help learn more informative features from low-resolution images according to the specified scale factor. This module can also replace the traditional convolution module to improve the learning ability of the network for super-resolution of arbitrary scale factor.

CRediT authorship contribution statement

Ying Fu: Conceptualization, Validation, Writing - review & editing. Jian Chen: Software, Investigation, Writing - original draft. Tao Zhang: Methodology, Investigation. Yonggang Lin: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant No. 61672096.

Ying Fu received the B.S. degree in Electronic Engineering from Xidian University, China, in 2009, the M.S. degree in Automation from Tsinghua University, China, in 2012, and the Ph.D. degree in Information Science and Technology from the University of Tokyo, Japan, in 2015. She is currently a Professor in the School of Computer Science and Technology, Beijing Institute of Technology. Her research interests include physics-based vision, image and video processing, and computational photography.

References (41)

  • C. Dong et al., Learning a deep convolutional network for image super-resolution, Proceedings of the European Conference on Computer Vision (2014)
  • C. Chen et al., Camera lens super-resolution
  • X. Zhang et al., Zoom to learn, learn to zoom
  • J. Cai et al., Toward real-world single image super-resolution: a new benchmark and a new model
  • X. Hu et al., Meta-SR: a magnification-arbitrary network for super-resolution
  • J. Kim et al., Accurate image super-resolution using very deep convolutional networks
  • Y. Tai et al., Image super-resolution via deep recursive residual network
  • R. Keys, Cubic convolution interpolation for digital image processing, IEEE Trans. Acoust. Speech Signal Process. (1981)
  • B. Lim et al., Enhanced deep residual networks for single image super-resolution
  • Y. Zhang et al., Residual dense network for image super-resolution
  • Y. Qiu et al., Embedded block residual network: a recursive restoration model for single-image super-resolution
  • C. Dong et al., Accelerating the super-resolution convolutional neural network, Proceedings of the European Conference on Computer Vision (2016)
  • W. Shi et al., Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network
  • C.E. Duchon, Lanczos filtering in one and two dimensions, J. Appl. Meteorol. (1979)
  • J. Sun et al., Image super-resolution using gradient profile prior
  • A. Marquina et al., Image super-resolution by TV-regularization and Bregman iteration, J. Sci. Comput. (2008)
  • H. Chang, D.-Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: Proceedings of the IEEE Conference on...
  • J. Yang et al., Image super-resolution via sparse representation, IEEE Trans. Image Process. (2010)
  • C. Dang et al., Fast single-image super-resolution via tangent space learning of high-resolution-patch manifold, IEEE Trans. Comput. Imag. (2017)
  • C. Ledig et al., Photo-realistic single image super-resolution using a generative adversarial network


Jian Chen received the B.S. degree from the School of Computer Science and Technology, Beijing Institute of Technology, China, in 2018. He is currently pursuing the M.S. degree in the School of Computer Science and Technology, Beijing Institute of Technology, China. His main research interests include image processing, computer vision, and computational photography.

Tao Zhang received the B.S. degree from the School of Computer Science and Technology, Beijing Institute of Technology, China, in 2017. He is currently pursuing the Ph.D. degree in the School of Computer Science and Technology, Beijing Institute of Technology, China. His research interests include deep learning, image processing, and computational photography.

Yonggang Lin received the B.S. degree in Mechatronic Engineering from Harbin Institute of Technology, China, in 1997, and the M.S. degree in Operation Science and Control Theory from Harbin Institute of Technology, China, in 1999. He is currently a lecturer in the School of Computer Science and Technology, Beijing Institute of Technology. His research interests include image processing and the International Collegiate Programming Contest.
