Lightweight multi-scale residual networks with attention for image super-resolution
Introduction
Image super-resolution (SR), aimed at inferring high-resolution (HR) images from their low-resolution (LR) versions, is one of the key research issues in the field of computer vision. With the rapid development of deep learning, convolutional neural network (CNN)-based methods for SR exhibit better performance and potential than the earlier classic methods.
In recent years, CNN-based approaches for SR have received increasing attention, owing to the powerful approximation abilities of feedforward neural networks [1], [2]. Since the first successful introduction of a CNN framework into SR tasks by Dong et al. [3], a series of CNN-based models have emerged in an attempt to increase the depth of the network; examples include the residual network (ResNet) [4], the very deep convolutional network for SR (VDSR) [5], and the deeply recursive convolutional network (DRCN) [6]. Ledig et al. [7] proposed a deeper network called SRResNet based on ResNet, which achieved satisfactory results via a generative adversarial network (GAN). Inspired by SRResNet, Lim et al. [8] designed two further-improved networks: the enhanced deep ResNet for SR (EDSR) and the multi-scale deep SR network (MDSR). To emphasize the importance of previous low-level features for the overall reconstruction, shortcut connection-based approaches were proposed, such as the densely connected convolutional network (DenseNet) [9], SRDenseNet [10], the persistent memory network (MemNet) [11], and the residual dense network (RDN) [12]; all of these exploit features at different levels. Since then, numerous CNN-based networks, such as the wide activation SR network (WDSR) [13] and the adaptive weighted SR network (AWSRN) [14], have continued to emerge, yielding improved performance.
Apart from deepening networks, a few models are also dedicated to broadening the network width to increase the receptive field. For example, the cascaded multi-scale cross network (CMSC) [15] and the multi-scale residual network (MSRN) [16] expand the network width by combining convolution kernels at different scales to extract rich multi-scale information. Combining complementary multi-scale information effectively improves the cross-layer information flow and the reconstruction performance. However, this type of structure has a clear drawback: although it works well, the numerous parameters required greatly increase the model's complexity.
Recently, the attention mechanism has exhibited notable performance in various computer vision problems [17], [18], [19], [20], [21]. Inspired by squeeze-and-excitation networks [22], Zhang et al. [23] proposed a very deep residual channel attention network (RCAN) to obtain a very deep trainable network and to adaptively rescale channel-wise features. Cheng et al. [24] designed a triple attention mixed link network (TAN), combining three aspects of attention (kernel, spatial, and channel) to significantly enhance the feature representation. More recently, Li et al. [25] presented an attention-based DenseNet with residual deconvolution (ADRD), which introduced a novel spatial attention (SA) module. Although these attention mechanisms differ slightly in structure, their core concept is consistent. Therefore, identifying the type of attention structure suitable for SR tasks, and determining how to combine channel attention with SA to increase a model's effectiveness, are of tremendous significance.
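The channel attention mechanisms cited above share the squeeze-and-excitation core [22]: global pooling squeezes each feature map to a scalar descriptor, and a small bottleneck then learns per-channel rescaling weights. A minimal PyTorch sketch of this idea (module and parameter names are ours, not from any cited implementation):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze spatial information into a
    per-channel descriptor, then excite with learned channel weights."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                    # excitation weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.pool(x))            # one weight per channel
        return x * w                         # channel-wise rescaling

x = torch.randn(1, 64, 32, 32)
y = ChannelAttention(64)(x)
print(y.shape)  # torch.Size([1, 64, 32, 32])
```

The `reduction` ratio (16 here, following SENet's default) trades attention capacity against parameter count; the input shape is preserved, so the module can be dropped into any residual block.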
To address these issues, this paper proposes a new lightweight attention-based multi-scale residual network (AMSRN). As shown in Fig. 7, our model primarily comprises two basic blocks: a residual atrous spatial pyramid pooling (ASPP) block and a spatial and channel-wise attention residual (SCAR) block. The residual ASPP block is adapted from ASPP to generate multi-scale features: three branches expand the receptive field by adjusting the dilation rate of the convolution kernel. Thus, the residual ASPP block not only extracts multi-scale information but also reduces the network parameters to a certain extent. The SCAR block integrates two attention mechanisms, spatial attention (SA) and channel-wise attention (CA), into the original two-layer residual block; together they effectively adjust the spatial and channel-wise information. In particular, group convolution is adopted in the SCAR block to further reduce the parameters. Additionally, a novel multi-scale feature attention (MSFA) module is designed to provide pixel-wise attention for the preceding feature maps; it follows the multiple-branch structure of the ASPP and uses valuable deeper-level information to guide the features of the earlier layers. We also construct a dual-path upscale module, whose two paths are responsible for upsampling the low- and high-frequency information, respectively. Finally, four SCAR blocks are attached to the end of the network to increase its depth and further improve the reconstruction performance.
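To illustrate the idea behind the residual ASPP block, the following is a minimal PyTorch sketch of parallel dilated 3×3 branches fused by a 1×1 convolution and wrapped in a residual connection; the dilation rates (1, 2, 4), channel width, and fusion layer are our assumptions, not the paper's exact configuration. The parameter saving comes from dilation: a 3×3 kernel dilated by 4 covers a 9×9 receptive field with 9C² weights, versus 81C² for a dense 9×9 kernel.

```python
import torch
import torch.nn as nn

class ResidualASPP(nn.Module):
    """Sketch of a residual ASPP-style block: three parallel 3x3
    convolutions with different dilation rates cover different receptive
    fields at the parameter cost of plain 3x3 kernels; a 1x1 convolution
    fuses the branches before the residual connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branches = nn.ModuleList([
            # padding = dilation keeps the spatial size unchanged
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 4)  # effective receptive fields: 3x3, 5x5, 9x9
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return x + self.fuse(multi)  # residual connection

block = ResidualASPP(64)
x = torch.randn(1, 64, 24, 24)
print(block(x).shape)  # torch.Size([1, 64, 24, 24])
```

Because every branch preserves the spatial resolution, the block can be stacked freely with other residual blocks in the trunk of the network.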
In summary, this study establishes a novel lightweight attention-based multi-scale residual network that yields satisfactory SR performance with far fewer parameters. The contributions of this paper can be summarized as follows.
- A lightweight block, the spatial and channel-wise attention residual (SCAR) block, is proposed, which effectively integrates spatial and channel attention. Group convolution is utilized to relieve the heavy parameter burden. Consequently, the proposed SCAR block can extract valuable features in a lightweight manner.
- A multi-scale feature attention (MSFA) module is presented to assign pixel-wise attention weights to the previous features. Unlike traditional self-attention mechanisms, it continues to convolve with dilated convolutions so that deeper feature information guides the relatively lower-level features.
- An innovative dual-path upscale module is proposed that upsamples the high- and low-frequency information separately, so that the feature information is retained as fully as possible.
- A lightweight model, AMSRN, is proposed, which alternately stacks SCAR blocks and residual ASPP blocks to recover high-quality images without requiring numerous parameters.
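As a rough illustration of the first contribution, the sketch below combines a grouped two-layer residual body with SE-style channel attention and a simple pooled-statistics spatial attention map. The group count, attention ordering, and spatial-attention design are our assumptions rather than the paper's specification; the point is the parameter saving, since with g groups each 3×3 layer needs 9C²/g weights instead of 9C².

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Single-map spatial attention from channel-pooled statistics
    (average and max over channels, then a 7x7 convolution)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class SCARBlock(nn.Module):
    """Sketch of a SCAR-style block: a grouped two-layer residual body
    followed by channel-wise, then spatial, attention."""
    def __init__(self, channels: int = 64, groups: int = 4, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
        )
        self.ca = nn.Sequential(            # SE-style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.sa = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.body(x)
        f = f * self.ca(f)  # channel-wise rescaling
        f = self.sa(f)      # spatial rescaling
        return x + f        # residual connection

x = torch.randn(1, 64, 16, 16)
print(SCARBlock()(x).shape)  # torch.Size([1, 64, 16, 16])
```

With channels=64 and groups=4, each 3×3 layer drops from roughly 36.9K to 9.2K weights, which is the kind of saving the lightweight design targets.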
The remainder of this paper is organized as follows. Section 2 reviews the existing methods and the recent developments in SR tasks. In Section 3, we describe in detail the proposed network components and the entire framework of the model. Comparative experiments with other approaches and model analysis are presented in Section 4, and Section 5 concludes the paper with observations and discussions.
Related work and background
We will briefly review the recent developments in SR tasks that motivate our study from four different aspects.
Method
In this section, we first introduce the various components comprising the proposed network, and then describe the entire network architecture.
Experimental results
In this section, we first introduce the datasets and implementation details, and then describe an ablation study investigating the role of each network component. Finally, evaluations in terms of quantitative metrics, visual quality, efficiency, and running time are presented.
Conclusion
In summary, a lightweight and efficient SR model, the attention-based multi-scale residual network (AMSRN), is proposed in this paper. We use a residual ASPP block to fully extract multi-scale information; it adopts dilated convolutions at different dilation rates to expand the receptive field while reducing the parameters. Simultaneously, a spatial and channel-wise attention residual (SCAR) block is constructed based on a traditional two-layer residual block. It is worth mentioning that group convolution is adopted in the SCAR block to further reduce the parameters.
CRediT authorship contribution statement
Huan Liu: Writing - original draft, Software, Data curation. Feilong Cao: Supervision, Conceptualization, Methodology, Writing - review & editing. Chenglin Wen: Investigation. Qinghua Zhang: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work was supported by the National Natural Science Foundation of China under grant 61933013.
References (55)
- Multilayer feedforward networks are universal approximators, Neural Netw. (1989)
- Learning reinforced attentional representation for end-to-end visual tracking, Inform. Sci. (2020)
- New architecture of deep recursive convolution networks for super-resolution, Knowl.-Based Syst. (2019)
- Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems (1989)
- Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
- Deep residual learning for image recognition
- Accurate image super-resolution using very deep convolutional networks
- Deeply-recursive convolutional network for image super-resolution
- Photo-realistic single image super-resolution using a generative adversarial network
- Enhanced deep residual networks for single image super-resolution
- Densely connected convolutional networks
- Image super-resolution using dense skip connections
- MemNet: a persistent memory network for image restoration
- Residual dense network for image super-resolution
- Wide activation for efficient and accurate image super-resolution
- Lightweight image super-resolution with adaptive weighted learning network
- Single image super-resolution via cascaded multi-scale cross network
- Multi-scale residual network for image super-resolution
- SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning
- RAM: residual attention module for single image super-resolution
- Wider channel attention network for remote sensing image super-resolution
- Siamese attentional keypoint network for high performance visual tracking, Knowl.-Based Syst.
- Squeeze-and-excitation networks
- Image super-resolution using very deep residual channel attention networks
- Triple attention mixed link network for single image super resolution
- Image super-resolution using attention based densenet with residual deconvolution
- Accelerating the super-resolution convolutional neural network