Neurocomputing

Volume 427, 28 February 2021, Pages 201-211

Residual scale attention network for arbitrary scale image super-resolution

https://doi.org/10.1016/j.neucom.2020.11.010

Abstract

Research on super-resolution has achieved great success on synthetic data with deep convolutional neural networks. Some recent works aim to apply super-resolution to practical scenarios. Learning an accurate and flexible model for super-resolution of arbitrary scale factor is important for realistic applications, while most existing works focus only on integer scale factors. In this work, we present a residual scale attention network for super-resolution of arbitrary scale factor. Specifically, we design a scale attention module that learns discriminative features of low-resolution images by introducing the scale factor as prior knowledge. Then, we use a quadratic polynomial of the coordinate information and the scale factor to predict pixel-wise reconstruction kernels and achieve super-resolution of arbitrary scale factor. In addition, the predicted reconstruction kernels in the image domain first interpolate the low-resolution image into a coarse high-resolution image, and the main network then learns the high-frequency residual image in the feature domain. Extensive experiments on both synthetic and real data show that the proposed method outperforms state-of-the-art super-resolution methods of arbitrary scale factor in terms of both objective metrics and subjective visual quality.

Introduction

Single image super-resolution (SISR) is a fundamental low-level vision task that aims to restore a high-resolution (HR) image IHR from a single low-resolution (LR) image ILR. It is a severely ill-posed inverse problem and has been studied for decades. In the past few years, convolutional neural network (CNN) based methods have brought remarkable improvements to SISR, starting from the three-layer convolutional neural network called SRCNN proposed by Dong et al. [1].

Recently, research has turned toward realistic applications, which require an accurate and flexible model. For accuracy, some CNN-based super-resolution works [2], [3], [4] make efforts to capture real data for training the network. For flexibility, some work [5] focuses on developing more general upsampling methods.

To upsample an image, some CNN-based super-resolution methods [1], [6], [7] interpolate the LR image to the target resolution before feeding it into the network. This kind of method can achieve super-resolution of arbitrary scale factor using an interpolation algorithm such as bicubic [8], but it may introduce side effects, e.g., noise amplification and blurring. More importantly, these methods increase the network's computational cost and make it hard to run in real time in realistic applications.
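
As a concrete reference for this pre-upsampling step, the cubic convolution kernel of Keys [8], which underlies bicubic interpolation, can be sketched in a few lines. The a = -0.5 parameter and the clamped-edge 1-D interpolation below are the standard textbook formulation, not code from the paper.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Cubic convolution interpolation kernel (Keys, 1981) with a = -0.5,
    the variant commonly called 'bicubic' when applied separably in 2-D."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interpolate_1d(samples, t):
    """Interpolate a uniformly spaced 1-D signal at fractional position t
    using the four nearest samples (edges clamped)."""
    base = math.floor(t)
    value = 0.0
    for k in range(-1, 3):
        idx = min(max(base + k, 0), len(samples) - 1)
        value += samples[idx] * cubic_kernel(t - (base + k))
    return value
```

Applied separably along rows and columns, this kernel upsamples to any target resolution, which is exactly why interpolation-first pipelines support arbitrary scale factors.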

To avoid these problems, some CNN-based methods [9], [10], [11] feed the LR image into a network and upsample it within the network, usually with a deconvolution module [12] or a sub-pixel convolution module [13]. These methods mostly upscale images by integer scale factors, which limits the flexibility of the models for zooming with arbitrary scale factors in realistic applications. To tackle this limitation, Hu et al. [5] introduced the meta-upscale module to achieve super-resolution of arbitrary scale factor, but did not explicitly consider the effect of different scale factors during feature learning.
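
For reference, the sub-pixel convolution module [13] ends with a channel-to-space rearrangement that turns r² feature channels into an r×-upscaled map, and this rearrangement only works for integer r. A minimal pure-Python sketch of that shuffle for a single output channel (the channel-to-offset layout below is one common convention, not necessarily the one used in the cited works):

```python
def pixel_shuffle(feat, r):
    """Rearrange a feature map of shape (r*r, H, W), given as nested lists,
    into a single-channel (H*r, W*r) image -- the channel-to-space step of
    sub-pixel convolution. Channel index c maps to sub-pixel offset
    (c // r, c % r), one common layout convention."""
    channels = len(feat)
    assert channels == r * r, "expected r*r channels for one output channel"
    h, w = len(feat[0]), len(feat[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for c in range(channels):
        dy, dx = c // r, c % r
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = feat[c][y][x]
    return out
```

Because each channel fills one fixed integer sub-pixel offset, a non-integer scale factor has no corresponding channel layout, which is the flexibility limitation discussed above.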

In this paper, we present a residual scale attention network for image super-resolution of arbitrary scale factor, where the scale factor is introduced as prior knowledge so that the network can distinguish different data distributions and automatically generate appropriate parameters to assist feature learning. In our scale attention module, the scale factor is encoded together with statistics of the feature maps to adaptively rescale the convolution filters with attention coefficients. This mechanism dynamically generates attention coefficients that guide the convolution filters to extract more informative features according to the current LR data distribution. Then, we use a quadratic polynomial of the coordinate offset and the scale factor as encoding vectors to dynamically predict pixel-wise reconstruction kernels, which benefits the learning of the non-linear mapping between the reconstruction kernels and the coordinate information. In addition, a residual learning strategy is used in our reconstruction process: the predicted kernels in the image domain directly interpolate the LR image into a coarse super-resolution image, and the predicted kernels in the feature domain transform the learned feature maps into a residual image. This makes the main network focus on the recovery of high-frequency information. Experiments on both synthetic and real data show that our network outperforms state-of-the-art super-resolution methods of arbitrary scale factor in terms of both comprehensive quantitative metrics and perceptual quality.
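
To make the kernel-prediction input concrete: Meta-SR [5] encodes each output pixel by its relative coordinate offsets and the reciprocal of the scale factor, and extending that encoding with all degree-2 monomials is one plausible reading of the quadratic polynomial described above. The sketch below is illustrative only, not the paper's exact formulation.

```python
import math

def quadratic_offset_encoding(i, j, scale):
    """Encoding vector for output pixel (i, j) at a (possibly non-integer)
    scale factor. The relative offsets follow the Meta-SR convention
    (i/r - floor(i/r), j/r - floor(j/r), 1/r); appending all degree-2
    monomials is an illustrative reading of the 'quadratic polynomial'
    encoding, not the exact formulation from the paper."""
    u = i / scale - math.floor(i / scale)  # vertical sub-pixel offset
    v = j / scale - math.floor(j / scale)  # horizontal sub-pixel offset
    s = 1.0 / scale                        # scale term
    linear = [u, v, s]
    quadratic = [a * b for idx, a in enumerate(linear) for b in linear[idx:]]
    return linear + quadratic  # 3 linear + 6 quadratic terms
```

A small network that maps this 9-dimensional vector to a pixel-wise reconstruction kernel then works for any positive scale factor, since the encoding is defined continuously in the scale.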

In summary, our main contributions are as follows:

  • 1.

    We present a scale attention module that helps the network distinguish LR images produced with different downsampling scale factors, adaptively rescales the convolution filters according to the scale factor, and convolves them with the feature maps. It improves the ability to learn discriminative features for arbitrary scale factors.

  • 2.

    We use a quadratic polynomial of the coordinate information and the scale factor to dynamically predict pixel-wise reconstruction kernels. The quadratic polynomial contributes to learning the non-linear relationship between the reconstruction kernels and the coordinate information, which leads to more accurate reconstruction.

  • 3.

    We combine the predicted pixel-wise reconstruction kernels with a residual learning strategy to boost reconstruction accuracy. Experimental results demonstrate the effectiveness of the proposed residual scale attention network on both synthetic and real datasets for image super-resolution of arbitrary scale factor.
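
The data flow of a scale attention module of the kind described in contribution 1 might be sketched as follows; the tiny MLP, its placeholder weights w1/w2, and the per-filter sigmoid gating are hypothetical stand-ins for the learned components, intended only to show how the scale factor can enter as prior knowledge.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def scale_attention(filters, channel_means, scale, w1, w2):
    """Illustrative scale attention data flow: encode per-channel feature
    statistics together with the scale factor, pass them through a tiny
    two-layer MLP (w1, w2 are hypothetical placeholder weight matrices),
    and use the resulting sigmoid coefficients to rescale each convolution
    filter. The real RSAN module is learned end-to-end; this only sketches
    the mechanism."""
    code = channel_means + [1.0 / scale]  # feature statistics + scale prior
    hidden = [max(0.0, sum(w * c for w, c in zip(row, code))) for row in w1]  # ReLU
    coeff = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # One attention coefficient per filter: rescale filter weights adaptively.
    return [[w * a for w in f] for f, a in zip(filters, coeff)]
```

Because the scale factor is part of the input code, the same set of filters is modulated differently for each scale, which is how a single model can serve all scale factors.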

Related work

Traditional SISR methods in the literature can be divided into three categories, i.e., interpolation-based methods [8], [14], reconstruction-based methods [15], [16], and learning-based methods [17], [18], [19]. Interpolation-based methods are simple and fast, but their performance is usually limited. Reconstruction-based methods generally have many hyper-parameters that require manual tuning, and their solving process is complicated. Traditional learning-based methods require a database of paired LR and HR exemplars from which to learn the mapping.

Residual scale attention network

In this section, we introduce details of the proposed residual scale attention network (RSAN) and analyze how this model boosts the performance in SISR with arbitrary scale factor. The problem of super-resolution of arbitrary scale factor is first formulated. Then we introduce the whole architecture, scale attention module, and the residual meta-upscale reconstruction. Finally, the network setting and learning details are provided.

Experiments

In this section, a series of experiments is conducted to evaluate the performance of the proposed method. We first describe the experimental details and then compare our method with several state-of-the-art methods that can be applied to SISR of arbitrary scale factor.

Conclusion

In this paper, we propose a residual scale attention network (RSAN) to enhance the capability for super-resolution of arbitrary scale factor in a single model. We develop a scale attention module that can adaptively rescale the convolution filters to help learn more informative features from low-resolution images according to the specified scale factor. This module can also replace the traditional convolution module to improve the learning ability of the network for super-resolution of arbitrary scale factor.

CRediT authorship contribution statement

Ying Fu: Conceptualization, Validation, Writing - review & editing. Jian Chen: Software, Investigation, Writing - original draft. Tao Zhang: Methodology, Investigation. Yonggang Lin: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China under Grant No. 61672096.

Ying Fu received the B.S. degree in Electronic Engineering from Xidian University, China, in 2009, the M.S. degree in Automation from Tsinghua University, China, in 2012, and the Ph.D. degree in Information Science and Technology from the University of Tokyo, Japan, in 2015. She is currently a Professor in the School of Computer Science and Technology, Beijing Institute of Technology. Her research interests include physics-based vision, image and video processing, and computational photography.

References (41)

  • C. Dong et al., Learning a deep convolutional network for image super-resolution, Proceedings of the European Conference on Computer Vision (2014)
  • C. Chen et al., Camera lens super-resolution
  • X. Zhang et al., Zoom to learn, learn to zoom
  • J. Cai et al., Toward real-world single image super-resolution: a new benchmark and a new model
  • X. Hu et al., Meta-SR: a magnification-arbitrary network for super-resolution
  • J. Kim et al., Accurate image super-resolution using very deep convolutional networks
  • Y. Tai et al., Image super-resolution via deep recursive residual network
  • R. Keys, Cubic convolution interpolation for digital image processing, IEEE Trans. Acoust. Speech Signal Process. (1981)
  • B. Lim et al., Enhanced deep residual networks for single image super-resolution
  • Y. Zhang et al., Residual dense network for image super-resolution
  • Y. Qiu et al., Embedded block residual network: a recursive restoration model for single-image super-resolution
  • C. Dong et al., Accelerating the super-resolution convolutional neural network, Proceedings of the European Conference on Computer Vision (2016)
  • W. Shi et al., Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network
  • C.E. Duchon, Lanczos filtering in one and two dimensions, J. Appl. Meteorol. (1979)
  • J. Sun et al., Image super-resolution using gradient profile prior
  • A. Marquina et al., Image super-resolution by TV-regularization and Bregman iteration, J. Sci. Comput. (2008)
  • H. Chang, D.-Y. Yeung, Y. Xiong, Super-resolution through neighbor embedding, in: Proceedings of the IEEE Conference on...
  • J. Yang et al., Image super-resolution via sparse representation, IEEE Trans. Image Process. (2010)
  • C. Dang et al., Fast single-image super-resolution via tangent space learning of high-resolution-patch manifold, IEEE Trans. Comput. Imag. (2017)
  • C. Ledig et al., Photo-realistic single image super-resolution using a generative adversarial network


Jian Chen received the B.S. degree from the School of Computer Science and Technology, Beijing Institute of Technology, China, in 2018. He is currently pursuing the M.S. degree in the School of Computer Science and Technology, Beijing Institute of Technology, China. His main research interests include image processing, computer vision, and computational photography.

Tao Zhang received the B.S. degree from the School of Computer Science and Technology, Beijing Institute of Technology, China, in 2017. He is currently pursuing the Ph.D. degree in the School of Computer Science and Technology, Beijing Institute of Technology, China. His research interests include deep learning, image processing, and computational photography.

Yonggang Lin received the B.S. degree in Mechatronic Engineering from Harbin Institute of Technology, China, in 1997, and the M.S. degree in Operation Science and Control Theory from Harbin Institute of Technology, China, in 1999. He is currently a lecturer in the School of Computer Science and Technology, Beijing Institute of Technology. His research interests include image processing and the International Collegiate Programming Contest.
