Lightweight multi-scale residual networks with attention for image super-resolution
Introduction
Image super-resolution (SR), aimed at inferring high-resolution (HR) images from their low-resolution (LR) versions, is one of the key research issues in the field of computer vision. With the rapid development of deep learning, convolutional neural network (CNN)-based methods for SR exhibit better performance and potential than the earlier classic methods.
In recent years, CNN-based approaches for SR have received increasing attention, owing to the powerful approximation abilities of feedforward neural networks [1], [2]. Since the first successful introduction of a CNN framework into SR tasks by Dong et al. [3], a series of CNN-based models have emerged in an attempt to increase the depth of the network; examples include the residual network (ResNet) [4], the very deep convolutional network for SR (VDSR) [5], and the deeply recursive convolutional network (DRCN) [6]. Ledig et al. [7] proposed a deeper network called SRResNet based on ResNet, which achieved satisfactory results via a generative adversarial network (GAN). Inspired by SRResNet, Lim et al. [8] designed two further-improved networks: the enhanced deep ResNet for SR (EDSR) and the multi-scale deep SR network (MDSR). To emphasize the importance of previous low-level features for the overall reconstruction, shortcut connection-based approaches were proposed, such as the densely connected convolutional network (DenseNet) [9], SRDenseNet [10], the persistent memory network (MemNet) [11], and the residual dense network (RDN) [12]; all of these exploit features at different levels. Since then, numerous CNN-based networks, such as the wide activation SR network (WDSR) [13] and the adaptive weighted SR network (AWSRN) [14], have continued to emerge, yielding improved performance.
Apart from deepening networks, a few models are also dedicated to broadening the network width to increase the receptive field. For example, the cascaded multi-scale cross network (CMSC) [15] and the multi-scale residual network (MSRN) [16] expand the network width by combining convolution kernels at different scales to extract rich multi-scale information. Combining complementary multi-scale information effectively improves the cross-layer information flow and the reconstruction performance. However, this type of structure has a clear drawback: although it works well, the numerous parameters required greatly increase the model's complexity.
Recently, the attention mechanism has exhibited notable performance in various computer vision problems [17], [18], [19], [20], [21]. Inspired by squeeze-and-excitation networks [22], Zhang et al. [23] proposed a very deep residual channel attention network (RCAN) to obtain a very deep trainable network and to adaptively rescale channel-wise features. Cheng et al. [24] designed a triple attention mixed link network (TAN), combining three aspects of attention (kernel, spatial, and channel) to significantly enhance the feature representation. More recently, Li et al. [25] presented an attention-based DenseNet with residual deconvolution (ADRD), which introduced a novel spatial attention (SA) module. Although these attention mechanisms differ slightly in structure, their core concept is consistent. Therefore, identifying the type of attention structure suitable for SR tasks, and determining how to combine channel attention with SA to increase a model's effectiveness, are of tremendous significance.
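The channel attention mechanisms cited above share the squeeze-and-excitation core [22]: global pooling squeezes each feature map to a scalar descriptor, and a small bottleneck then learns per-channel rescaling weights. A minimal PyTorch sketch of this idea (module and parameter names are ours, not from any cited implementation):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze spatial information into a
    per-channel descriptor, then excite with learned channel weights."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                    # excitation weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.pool(x))            # one weight per channel
        return x * w                         # channel-wise rescaling

x = torch.randn(1, 64, 32, 32)
y = ChannelAttention(64)(x)
print(y.shape)  # torch.Size([1, 64, 32, 32])
```

The `reduction` ratio (16 here, following SENet's default) trades attention capacity against parameter count; the input shape is preserved, so the module can be dropped into any residual block.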
To address these issues, this paper proposes a new lightweight attention-based multi-scale residual network (AMSRN). As shown in Fig. 7, our model primarily comprises two basic blocks: a residual atrous spatial pyramid pooling (ASPP) block and a spatial and channel-wise attention residual (SCAR) block. The residual ASPP block is adapted from ASPP to generate multi-scale features: three branches expand the receptive field by adjusting the dilation rate of the convolution kernel. Thus, the residual ASPP block not only extracts multi-scale information but also reduces the network parameters to a certain extent. The SCAR block integrates two attention mechanisms, spatial attention (SA) and channel-wise attention (CA), into the original two-layer residual block; together they effectively adjust the spatial and channel-wise information. In particular, group convolution is adopted in the SCAR block to further reduce the parameters. Additionally, a novel multi-scale feature attention (MSFA) module is designed to provide pixel-wise attention for the preceding feature maps; it follows the multiple-branch structure of the ASPP and uses valuable deeper-level information to guide the features of the earlier layers. We also construct a dual-path upscale module, whose two paths are responsible for upsampling the low- and high-frequency information, respectively. Finally, four SCAR blocks are attached to the end of the network to increase its depth and further improve the reconstruction performance.
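To illustrate the idea behind the residual ASPP block, the following is a minimal PyTorch sketch of parallel dilated 3×3 branches fused by a 1×1 convolution and wrapped in a residual connection; the dilation rates (1, 2, 4), channel width, and fusion layer are our assumptions, not the paper's exact configuration. The parameter saving comes from dilation: a 3×3 kernel dilated by 4 covers a 9×9 receptive field with 9C² weights, versus 81C² for a dense 9×9 kernel.

```python
import torch
import torch.nn as nn

class ResidualASPP(nn.Module):
    """Sketch of a residual ASPP-style block: three parallel 3x3
    convolutions with different dilation rates cover different receptive
    fields at the parameter cost of plain 3x3 kernels; a 1x1 convolution
    fuses the branches before the residual connection."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branches = nn.ModuleList([
            # padding = dilation keeps the spatial size unchanged
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 4)  # effective receptive fields: 3x3, 5x5, 9x9
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return x + self.fuse(multi)  # residual connection

block = ResidualASPP(64)
x = torch.randn(1, 64, 24, 24)
print(block(x).shape)  # torch.Size([1, 64, 24, 24])
```

Because every branch preserves the spatial resolution, the block can be stacked freely with other residual blocks in the trunk of the network.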
In summary, this study establishes a novel lightweight attention-based multi-scale residual network that yields satisfactory SR performance with far fewer parameters. The contributions of this paper can be summarized as follows.
- A lightweight block, the spatial and channel-wise attention residual (SCAR) block, is proposed, which effectively integrates spatial and channel attention. Group convolution is utilized to relieve the heavy parameter burden. Consequently, the proposed SCAR block can extract valuable features in a lightweight manner.
- A multi-scale feature attention (MSFA) module is presented to assign pixel-wise attention weights to the previous features. Unlike traditional self-attention mechanisms, it continues to convolve with dilated convolutions so that deeper feature information guides the relatively lower-level features.
- An innovative dual-path upscale module is proposed that upsamples the high- and low-frequency information separately, so that the feature information is retained as fully as possible.
- A lightweight model, AMSRN, is proposed, which alternately stacks SCAR blocks and residual ASPP blocks to recover high-quality images without requiring numerous parameters.
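As a rough illustration of the first contribution, the sketch below combines a grouped two-layer residual body with SE-style channel attention and a simple pooled-statistics spatial attention map. The group count, attention ordering, and spatial-attention design are our assumptions rather than the paper's specification; the point is the parameter saving, since with g groups each 3×3 layer needs 9C²/g weights instead of 9C².

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Single-map spatial attention from channel-pooled statistics
    (average and max over channels, then a 7x7 convolution)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class SCARBlock(nn.Module):
    """Sketch of a SCAR-style block: a grouped two-layer residual body
    followed by channel-wise, then spatial, attention."""
    def __init__(self, channels: int = 64, groups: int = 4, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, groups=groups),
        )
        self.ca = nn.Sequential(            # SE-style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.sa = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.body(x)
        f = f * self.ca(f)  # channel-wise rescaling
        f = self.sa(f)      # spatial rescaling
        return x + f        # residual connection

x = torch.randn(1, 64, 16, 16)
print(SCARBlock()(x).shape)  # torch.Size([1, 64, 16, 16])
```

With channels=64 and groups=4, each 3×3 layer drops from roughly 36.9K to 9.2K weights, which is the kind of saving the lightweight design targets.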
The remainder of this paper is organized as follows. Section 2 reviews the existing methods and the recent developments in SR tasks. In Section 3, we describe in detail the proposed network components and the entire framework of the model. Comparative experiments with other approaches and model analysis are presented in Section 4, and Section 5 concludes the paper with observations and discussions.
Related work and background
We will briefly review the recent developments in SR tasks that motivate our study from four different aspects.
Method
In this section, we first introduce the various components comprising the proposed network, and then describe the entire network architecture.
Experimental results
In this section, we first introduce the datasets and implementation details, and then describe an ablation study investigating the role of each network component. Finally, evaluations in terms of quantitative metrics, visual quality, efficiency, and running time are presented.
Conclusion
In summary, a lightweight and efficient SR model, the attention-based multi-scale residual network (AMSRN), is proposed in this paper. We use a residual ASPP block to fully extract multi-scale information; it adopts dilated convolutions at different dilation rates to expand the receptive field while reducing the parameters. Simultaneously, a spatial and channel-wise attention residual (SCAR) block is constructed based on a traditional two-layer residual block. It is worth mentioning that group convolution is adopted in the SCAR block to further reduce the parameters.
CRediT authorship contribution statement
Huan Liu: Writing - original draft, Software, Data curation. Feilong Cao: Supervision, Conceptualization, Methodology, Writing - review & editing. Chenglin Wen: Investigation. Qinghua Zhang: Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
This work was supported by the National Natural Science Foundation of China under grant 61933013.
References (55)
- Multilayer feedforward networks are universal approximators, Neural Netw. (1989)
- Learning reinforced attentional representation for end-to-end visual tracking, Inform. Sci. (2020)
- New architecture of deep recursive convolution networks for super-resolution, Knowl.-Based Syst. (2019)
- Approximation by superpositions of a sigmoidal function, Math. Control Signals Systems (1989)
- Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
- Deep residual learning for image recognition
- Accurate image super-resolution using very deep convolutional networks
- Deeply-recursive convolutional network for image super-resolution
- Photo-realistic single image super-resolution using a generative adversarial network
- Enhanced deep residual networks for single image super-resolution
- Densely connected convolutional networks
- Image super-resolution using dense skip connections
- MemNet: a persistent memory network for image restoration
- Residual dense network for image super-resolution
- Wide activation for efficient and accurate image super-resolution
- Lightweight image super-resolution with adaptive weighted learning network
- Single image super-resolution via cascaded multi-scale cross network
- Multi-scale residual network for image super-resolution
- SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning
- RAM: residual attention module for single image super-resolution
- Wider channel attention network for remote sensing image super-resolution
- Siamese attentional keypoint network for high performance visual tracking, Knowl.-Based Syst.
- Squeeze-and-excitation networks
- Image super-resolution using very deep residual channel attention networks
- Triple attention mixed link network for single image super resolution
- Image super-resolution using attention based densenet with residual deconvolution
- Accelerating the super-resolution convolutional neural network