Knowledge-Based Systems, Volume 203, 5 September 2020, 106103

Lightweight multi-scale residual networks with attention for image super-resolution

https://doi.org/10.1016/j.knosys.2020.106103

Abstract

In recent years, constructing various deep convolutional neural networks (CNNs) for single-image super-resolution (SISR) tasks has made significant progress. Despite their high performance, numerous CNNs are limited in practical applications owing to their heavy computational requirements. This paper proposes a lightweight network for SISR, known as the attention-based multi-scale residual network (AMSRN). In detail, a residual atrous spatial pyramid pooling (ASPP) block and a spatial and channel-wise attention residual (SCAR) block are stacked alternately to form the main framework of the entire network. The residual ASPP block utilizes parallel dilated convolutions with different dilation rates to capture multi-scale features. The SCAR block adds channel attention (CA) and spatial attention (SA) mechanisms to a double-layer convolutional residual block. In addition, group convolution is introduced in the SCAR block to further reduce the parameters while preventing over-fitting. Moreover, a multi-scale feature attention module is designed to provide instructive multi-scale attention information for shallow features. In particular, we propose a novel upscale module that adopts dual paths to upscale the features by jointly using sub-pixel convolution and nearest interpolation layers, instead of using a deconvolution or sub-pixel convolution layer alone. The experimental results demonstrate that our method achieves performance comparable to state-of-the-art methods, both quantitatively and qualitatively.

Introduction

Image super-resolution (SR), aimed at inferring high-resolution (HR) images from their low-resolution (LR) versions, is one of the key research issues in the field of computer vision. With the rapid development of deep learning, convolutional neural network (CNN)-based methods for SR exhibit better performance and potential than the earlier classic methods.

In recent years, CNN-based approaches for SR have received increasing attention owing to the powerful approximation abilities of feedforward neural networks [1], [2]. Since the first successful introduction of a CNN framework into SR tasks by Dong et al. [3], a series of CNN-based models have emerged in an attempt to increase the depth of the network, such as the residual network (ResNet) [4], the very deep convolutional network for SR (VDSR) [5], and the deeply recursive convolutional network (DRCN) [6]. Ledig et al. [7] proposed a deeper network called SRResNet based on ResNet, which achieved satisfactory results via a generative adversarial network (GAN). Inspired by SRResNet, Lim et al. [8] designed two sophisticated networks with further improvements: the enhanced deep ResNet for SR (EDSR) and the multi-scale deep SR network (MDSR). To emphasize the importance of previous low-level features for the overall reconstruction quality, shortcut connection-based approaches were proposed, such as the densely connected convolutional network (DenseNet) [9], SR using dense skip connections (SRDenseNet) [10], the persistent memory network (MemNet) [11], and the residual dense network (RDN) [12]. All these methods took advantage of features at different levels. Since then, numerous CNN-based networks, such as the wide activation SR network (WDSR) [13] and the adaptive weighted SR network (AWSRN) [14], have continued to emerge, yielding improved performance.

Apart from increasing network depth, a few models are also dedicated to broadening the network width to increase the receptive field. For example, the cascaded multi-scale cross network (CMSC) [15] and the multi-scale residual network (MSRN) [16] were built to expand the width of the network by combining convolution kernels at different scales to extract rich multi-scale information. The combination of complementary multi-scale information effectively improves cross-layer information flow and reconstruction performance. However, this type of structure has a notable drawback: although it works well, the numerous parameters required greatly increase the complexity of the model.

Recently, the attention mechanism has exhibited notable performance in various computer vision problems [17], [18], [19], [20], [21]. Inspired by squeeze-and-excitation networks [22], Zhang et al. [23] proposed a very deep residual channel attention network (RCAN) to obtain a very deep trainable network and to adaptively rescale channel-wise features. It is worth mentioning that Cheng et al. [24] designed a triple attention mixed link network (TAN), consisting of kernel, spatial, and channel attention, to significantly enhance the feature representation. Li et al. [25] presented an attention-based DenseNet with residual deconvolution (ADRD), in which a novel spatial attention (SA) module was proposed. Although the above-mentioned attention mechanisms differ slightly in structure, their core concept is consistent. Therefore, identifying both the type of structure suitable for SR tasks and the method of combining channel attention with SA to increase the effectiveness of a model is of great significance.
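To make the two attention primitives discussed above concrete, the following is a minimal PyTorch sketch of squeeze-and-excitation-style channel attention [22] and a simple spatial attention mask. The module names, the reduction ratio of 16, and the 3×3 convolution in the spatial branch are illustrative assumptions, not the exact designs used in the cited works or in this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average per channel
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),  # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # rescale each channel

class SpatialAttention(nn.Module):
    """Single-channel spatial mask applied to every feature map (illustrative)."""
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, 1, 3, padding=1),
            nn.Sigmoid(),  # per-pixel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.mask(x)  # rescale each spatial location
```

Channel attention reweights what the network looks at (feature channels), whereas spatial attention reweights where it looks (pixel locations); the SCAR block described below combines both.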

To practically address these issues, this paper proposes a new lightweight attention-based multi-scale residual network (AMSRN). As shown in Fig. 7, our model is primarily composed of two basic blocks: a residual atrous spatial pyramid pooling (ASPP) block and a spatial and channel-wise attention residual (SCAR) block. The residual ASPP block is adapted from ASPP to generate multi-scale features: three branches expand the receptive field by adjusting the dilation rate of the convolution kernel. Thus, our residual ASPP block can not only extract multi-scale information but also reduce the network parameters to a certain extent. The SCAR block integrates two types of attention mechanisms, SA and channel attention (CA), on top of the original two-layer residual block. The addition of these two attention mechanisms effectively adjusts the spatial and channel-wise information. In particular, group convolution is adopted in the SCAR block to further reduce the parameters. Additionally, a novel multi-scale feature attention (MSFA) module is designed to provide pixel-wise attention for the preceding feature maps: it follows the multi-branch structure of the ASPP and uses valuable information from a deeper depth to guide the features of the front layer. It is particularly important to mention that we construct a dual-path upscale module, whose two paths are responsible for upsampling the low- and high-frequency information, respectively. Note that four SCAR blocks are attached to the end of our network to increase the network depth and further improve the reconstruction performance.
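The residual ASPP block can be sketched compactly. Below is a minimal PyTorch illustration of the idea: parallel 3×3 convolutions with different dilation rates read the same input, and the fused multi-scale output is added back through a residual connection. The dilation rates (1, 2, 4) and the 1×1 fusion convolution are assumptions made for this sketch; the paper specifies its own branch configuration.

```python
import torch
import torch.nn as nn

class ResidualASPP(nn.Module):
    """Illustrative residual ASPP block: parallel dilated convolutions
    capture multi-scale context without enlarging the kernel size."""
    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        # one 3x3 branch per dilation rate; padding=rate preserves spatial size
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(len(rates) * channels, channels, 1)  # merge branches
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        multi_scale = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return x + self.fuse(multi_scale)  # residual connection
```

Because a dilated 3×3 kernel covers a (2r+1)×(2r+1) window with only nine weights, the parallel branches enlarge the receptive field at a fraction of the parameter cost of stacking larger kernels, which is exactly the lightweight motivation above.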

In summary, in this study, we establish a novel lightweight attention-based multi-scale residual network, which yields a satisfactory performance for SR with far fewer parameters. The contributions of this paper can be summarized as follows.

  • A lightweight block called the SCAR block is proposed, in which spatial and channel attention are effectively integrated, and group convolution is utilized to relieve the heavy parameter burden. Consequently, the proposed SCAR block can extract valuable features in a lightweight manner.

  • A multi-scale feature attention module is presented to assign pixel-wise attention weights to the previous features. Unlike traditional self-attention mechanisms, we continue to convolve with dilated convolutions so that deeper feature information guides the relatively lower-level features.

  • An innovative dual-path upscale module is proposed to exploit both high- and low-frequency information for upsampling, so that the feature information can be retained as fully as possible (see the sketch after this list).

  • A lightweight model, AMSRN, is proposed, which alternately utilizes an SCAR block and a residual ASPP block to recover high-quality images without requiring numerous parameters.
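As referenced in the third contribution above, the dual-path upscale idea can be sketched as follows in PyTorch: one path upsamples with sub-pixel convolution (PixelShuffle), the other with nearest-neighbour interpolation, and the two results are fused. Fusing by element-wise addition and refining the interpolation path with a 3×3 convolution are assumptions made for this sketch; the paper only states that the two paths are used jointly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathUpscale(nn.Module):
    """Illustrative dual-path upscale: a learned sub-pixel path (better at
    high frequencies) plus a nearest-interpolation path (preserves the
    low-frequency content)."""
    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        # sub-pixel path: expand channels, then rearrange them into space
        self.subpixel = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        # interpolation path: upsample first, then refine lightly
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        high = self.subpixel(x)  # learned, detail-oriented path
        low = self.refine(
            F.interpolate(x, scale_factor=self.scale, mode='nearest')
        )  # smooth, content-preserving path
        return high + low  # fuse the two frequency paths
```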

The remainder of this paper is organized as follows. Section 2 reviews the existing methods and the recent developments in SR tasks. In Section 3, we describe in detail the proposed network components and the entire framework of the model. Comparative experiments with other approaches and model analysis are presented in Section 4, and Section 5 concludes the paper with observations and discussions.

Section snippets

Related work and background

We will briefly review the recent developments in SR tasks that motivate our study from four different aspects.

Method

In this section, we first introduce the various components of the proposed network, and then describe the entire network architecture.

Experimental results

In this section, the datasets and implementation details are first introduced; then, an ablation study investigating the role of each network component is described. Finally, evaluations in terms of quantitative performance, visual quality, efficiency, and running time are presented.

Conclusion

In summary, a lightweight and efficient SR model, the attention-based multi-scale residual network (AMSRN), is proposed in this paper. We use a residual ASPP block to fully extract the multi-scale information; it adopts dilated convolutions with different dilation rates to expand the receptive field while reducing the parameters. Simultaneously, a spatial and channel-wise attention residual (SCAR) block is constructed based on a traditional two-layer residual block. It is worth mentioning that group convolution is introduced in the SCAR block to further reduce the parameters while preventing over-fitting.

CRediT authorship contribution statement

Huan Liu: Writing - original draft, Software, Data curation. Feilong Cao: Supervision, Conceptualization, Methodology, Writing - review & editing. Chenglin Wen: Investigation. Qinghua Zhang: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported by the National Natural Science Foundation of China under grant 61933013.

References (55)

  • Huang, G. et al., Densely connected convolutional networks
  • Tong, T. et al., Image super-resolution using dense skip connections
  • Tai, Y. et al., MemNet: A persistent memory network for image restoration
  • Zhang, Y. et al., Residual dense network for image super-resolution
  • Yu, J. et al., Wide activation for efficient and accurate image super-resolution (2018)
  • Wang, C. et al., Lightweight image super-resolution with adaptive weighted learning network (2019)
  • Hu, Y. et al., Single image super-resolution via cascaded multi-scale cross network (2018)
  • Li, J. et al., Multi-scale residual network for image super-resolution
  • Chen, L. et al., SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning
  • Kim, J. et al., RAM: Residual attention module for single image super-resolution
  • Gu, J. et al., Wider channel attention network for remote sensing image super-resolution
  • Gao, P. et al., Siamese attentional keypoint network for high performance visual tracking, Knowl.-Based Syst. (2019)
  • Hu, J. et al., Squeeze-and-excitation networks
  • Zhang, Y. et al., Image super-resolution using very deep residual channel attention networks
  • Cheng, X. et al., Triple attention mixed link network for single image super resolution (2018)
  • Li, Z., Image super-resolution using attention based DenseNet with residual deconvolution (2019)
  • Dong, C. et al., Accelerating the super-resolution convolutional neural network