Multi-scale attention network for image super-resolution☆
Introduction
For decades, image super-resolution (SR) has been studied to reconstruct a high-resolution (HR) image from a low-resolution (LR) input. The task is inherently an ill-posed inverse problem, since multiple HR images can be generated from an identical LR image. Many SR methods have been proposed to address this problem, including interpolation-based [1], reconstruction-based [2], and learning-based [3], [4] methods.
Deep learning has recently played a significant role in image SR and has demonstrated excellent reconstruction performance thanks to its robust nonlinear mapping. Dong et al. [5] first constructed a three-layer convolutional neural network (CNN) for SR, achieving better performance than conventional SR algorithms. Kim et al. [6] presented a very deep convolutional network for SR (VDSR) by increasing the network depth to 20 layers, obtaining a significant improvement. Since then, progressively more CNN-based approaches have been developed to further improve SR results. Increasing the network depth can indeed enhance reconstruction accuracy; however, it inevitably introduces more parameters and heavy computational costs. For example, the enhanced deep SR network (EDSR) [7] and residual dense network (RDN) [8] were very deep networks entailing 43M and 22M parameters, respectively, and the residual channel attention network (RCAN) [9] has more than 400 convolutional layers with about 15.59M parameters. All these approaches exhibited strong SR performance, but their excessive parameters hindered practical application.
To reduce model parameters, some network architectures turned to recursive mechanisms and lightweight designs. The deeply recursive convolutional network (DRCN) [10] and deep recursive residual network (DRRN) [11] applied a set of recursive blocks to decrease the number of network parameters. Although the recursive strategy demonstrated favorable results with fewer parameters, it still incurred a large memory footprint. More recently, several lightweight methods have been proposed to reach a better trade-off between performance and model size. The cascading residual network (CARN) [12] adopted a cascading residual architecture with group convolution to reduce parameters, but sacrificed accuracy. Chu et al. [13] used neural architecture search to seek lightweight networks; nevertheless, the resulting performance is limited by the constraints of the search space. Despite these encouraging results, there is still much room to improve the balance between reconstruction accuracy and model capacity.
Moreover, broadening the network width to increase the model's receptive field is also an effective way to boost performance. Li et al. [14] built a multi-scale residual network (MSRN) to obtain image features at different scales. Hu et al. [15] proposed a deep cascaded multi-scale cross network (CMSC) with multi-scale cross modules that fuse multi-scale information. Both MSRN and CMSC adopted convolution kernels of different sizes to extract multi-scale information. However, the use of large convolution kernels dramatically increases model size.
Additionally, the attention mechanism has been widely used in various computer vision tasks [16], [17]. It allows networks to focus on more valuable features, thereby enhancing their representational capability. Motivated by [16], Zhang et al. [9] designed the channel attention mechanism in RCAN to rescale channel-wise features. Attention-based DenseNet with residual deconvolution (ADRD) [18] and residual feature aggregation network (RFANet) [19] learned spatial context by designing a spatial attention module. In particular, channel attention and spatial attention were jointly combined in the residual attention SR network (SRRAM) [20] and the multi-path adaptive modulation network (MAMNet) [21], exploring both inter- and intra-channel relationships. Nevertheless, since the attention mechanism models interdependencies across all channels, directly applying it to image SR introduces unnecessary parameters and calculations. How to build a compact network that strikes a balance between performance and capacity remains to be explored.
To alleviate the above issues, we propose a lightweight and efficient multi-scale attention network (MSAN), whose overall architecture is depicted in Fig. 2. The MSAN consists of a shallow feature part, a sequence of multi-scale attention blocks (MSAB), and a reconstruction part. The MSABs are adaptively cascaded to infer informative features in a coarse-to-fine manner. In detail, each MSAB is primarily composed of a multi-scale cross block (MSCB) and a multi-path wide-activated attention block (MWAB). The MSCB utilizes three parallel convolutions with different dilation rates, connected hierarchically; this connection effectively extracts multi-scale and multi-level features while expanding the receptive field. In particular, dilated convolutions replace large-kernel convolutions, reducing model parameters. To yield more expressive feature representations, the MWAB unevenly splits the features produced by MSCB into three portions, each responsible for a distinct functionality. These functionalities establish mutual communication among channels to encourage diversified feature output, and each operates on only a subset of the channels rather than all of them, further decreasing the number of parameters and calculations. Compared with state-of-the-art SR networks, our proposed MSAN demonstrates higher performance as well as lower computation time, as shown in Fig. 1.
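As a rough illustration of the parameter savings that motivate replacing large kernels with dilated convolutions (a sketch with assumed kernel sizes and channel width, not MSAN's exact configuration): a 3×3 kernel with dilation rate d covers an effective extent of 3 + 2(d − 1), matching the receptive field of a much larger dense kernel at a fraction of the weights.

```python
# Effective receptive field and weight count of a dilated 3x3 convolution,
# compared with the dense kernel covering the same spatial extent.
# Illustrative only; the kernel sizes and channel width are assumptions.

def effective_kernel(k: int, d: int) -> int:
    """Spatial extent covered by a k x k kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)

def weights(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

c = 64  # assumed channel width
for d in (1, 2, 3):
    ke = effective_kernel(3, d)
    print(f"dilation {d}: covers {ke}x{ke}, "
          f"{weights(3, c, c):,} weights vs {weights(ke, c, c):,} for a dense {ke}x{ke}")
```

For example, a 3×3 convolution with dilation 2 sees a 5×5 neighborhood with 36,864 weights instead of the 102,400 a dense 5×5 kernel would require at 64 channels.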
In summary, our main contributions are threefold:
- We propose a lightweight yet efficient MSAN that stacks a set of MSABs for more accurate image SR. Thanks to the combination of MSCB and MWAB within each MSAB, we obtain better results with far fewer parameters and Mult-Adds.
- We propose an MSCB based on hierarchical connections among three parallel convolutions, which can fully learn multi-scale and multi-level features. The three parallel convolutions with different dilation rates effectively enlarge the receptive field while reducing parameter overhead.
- We design an MWAB to achieve internal communication among channel features, further improving performance and speeding up training. The channel features are divided unevenly into three portions, each of which goes through a distinct pathway, i.e., original spatial (OS), spatial attention (SA), and channel attention (CA).
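The uneven three-way split described above can be sketched as follows. The split ratios, gate forms, and shapes here are illustrative assumptions, not the paper's exact design: one group of channels passes through unchanged (OS), one is gated by a per-pixel spatial map (SA), and one is gated per channel by its global average, squeeze-and-excitation style (CA), before the groups are re-concatenated.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mwab_sketch(x, splits=(32, 16, 16)):
    """Illustrative multi-path block on a (C, H, W) feature map.

    Channels are split unevenly into three groups:
      OS: passed through unchanged,
      SA: gated by a per-pixel map (mean over the group's channels),
      CA: gated per channel by its global average (squeeze-style).
    The ratios and gate forms are assumptions for illustration.
    """
    c1, c2, c3 = splits
    os_part = x[:c1]                                       # original spatial path
    sa_in = x[c1:c1 + c2]
    sa_gate = sigmoid(sa_in.mean(axis=0, keepdims=True))   # (1, H, W) spatial gate
    sa_part = sa_in * sa_gate
    ca_in = x[c1 + c2:c1 + c2 + c3]
    ca_gate = sigmoid(ca_in.mean(axis=(1, 2), keepdims=True))  # (c3, 1, 1) channel gate
    ca_part = ca_in * ca_gate
    return np.concatenate([os_part, sa_part, ca_part], axis=0)

x = np.random.randn(64, 8, 8)
y = mwab_sketch(x)
print(y.shape)  # (64, 8, 8): shape preserved; only the SA and CA groups are modulated
```

Because the two attention gates touch only 32 of the 64 channels in this sketch, the gating cost scales with the group sizes rather than the full channel count, which is the parameter-saving idea the contribution describes.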
The rest of this study is organized as follows: Section 2 reviews the CNN-based and attention-based methods in SR tasks. Section 3 describes our proposed MSAN in detail. Section 4 provides model analysis and comparative experiments with other methods. We conclude our work in Section 5.
Single image super-resolution
CNN approaches based on deep learning have become a key technology for addressing image SR tasks. As pioneers, Dong et al. [5] introduced a three-layer CNN, named SRCNN, to learn the non-linear mapping between LR and HR images. Inspired by this strategy, Kim et al. [6] stacked 20 convolutional layers to increase the network depth and achieved a significant performance improvement. Later on, a series of CNN-based works was mostly devoted to deepening the network to boost…
Overall network architecture
The overall architecture of our proposed MSAN is demonstrated in Fig. 2. Similar to most SR network architectures [6], [20], MSAN can be partitioned into three parts: (1) a shallow feature extraction part, (2) a chained stack of MSABs, and (3) a reconstruction part. We denote I_LR as the LR input image and I_HR as the corresponding HR output image, where H and W are the height and the width of the input image, respectively, s is a scale factor, and C denotes the number of…
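The snippet above cuts off before the reconstruction part. A common choice for the upsampling step in lightweight SR networks is the sub-pixel (pixel-shuffle) rearrangement of Shi et al., sketched here in NumPy as a plausible final stage rather than MSAN's documented design: a feature map with C·s² channels is rearranged into C channels at s× the spatial resolution.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r).

    This is the sub-pixel upsampling step commonly used in SR
    reconstruction heads; shown as a plausible final stage, not
    necessarily MSAN's exact one.
    """
    crr, h, w = x.shape
    c = crr // (r * r)
    assert c * r * r == crr, "channel count must be divisible by r*r"
    x = x.reshape(c, r, r, h, w)      # split the scale factors out of channels
    x = x.transpose(0, 3, 1, 4, 2)    # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

feat = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)  # 4 channels, 2x2
up = pixel_shuffle(feat, 2)
print(up.shape)  # (1, 4, 4)
```

Each output 2×2 cell interleaves one value from each of the four input channels, so the upsampling is learned entirely by the preceding convolution's channels rather than by interpolation.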
Datasets and metrics
Following the recent methods [9], [20], we train our proposed MSAN using 800 high-quality images from the DIV2K [32] dataset. For testing, we evaluate on five standard benchmark datasets: Set5 [33], Set14 [34], B100 [35], Urban100 [36], and Manga109 [37]. To demonstrate the superiority of our MSAN, we employ three common degradation models: Bicubic (BI), Blur-downscale (BD), and Downscale-noise (DN). The SR results are evaluated by the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
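PSNR, the first of the metrics above, has a standard definition that can be computed as follows (assuming images scaled to [0, 1]; SR papers typically compute it on the luminance channel of a cropped image, details omitted here):

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((4, 4))
noisy = ref + 0.1  # constant error of 0.1 -> MSE = 0.01
print(psnr(ref, noisy))  # 10 * log10(1 / 0.01) = 20.0 dB
```

Because PSNR is a monotone function of MSE, a 1 dB gain corresponds to roughly a 21% reduction in mean squared error, which is why small dB differences between SR methods are considered meaningful.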
Conclusion
In this study, we propose an efficient and lightweight MSAN for single image SR. We implement a sequence of MSABs as the backbone of MSAN to progressively refine diversified information. In MSAB, the efficient combination of MSCB and MWAB greatly benefits SR performance in addition to accelerating the training process. Specifically, the MSCB hierarchically connects parallel dilated convolutions to capture features at different scales and levels as well as to enlarge the receptive field.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 51979085, 61903124).
References (45)
- et al., Densely connected network with improved pyramidal bottleneck residual units for super-resolution, J. Vis. Commun. Image Represent. (2021)
- et al., MAMNet: Multi-path adaptive modulation network for image super-resolution, Neurocomputing (2020)
- et al., Lightweight multi-scale residual networks with attention for image super-resolution, Knowl.-Based Syst. (2020)
- et al., An edge-guided image interpolation algorithm via directional filtering and data fusion, IEEE Trans. Image Process. (2006)
- et al., Single image super-resolution with non-local means and steering kernel regression, IEEE Trans. Image Process. (2012)
- W. Shi, J. Caballero, F. Huszar, J. Totz, A.P. Aitken, R. Bishop, D. Rueckert, Z. Wang, Real-time single image and…
- et al., Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
- J. Kim, J.K. Lee, K.M. Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of…
- B. Lim, S. Son, H. Kim, S. Nah, K.M. Lee, Enhanced deep residual networks for single image super-resolution, in:…
- Y. Zhang, Y. Tian, Y. Kong, B. Zhong, Y. Fu, Residual dense network for image super-resolution, in: Proceedings of the…
- Deeply-recursive convolutional network for image super-resolution
- Image super-resolution via deep recursive residual network
- Fast, accurate and lightweight super-resolution with neural architecture search
- Single image super-resolution via cascaded multi-scale cross network
- Image super-resolution using attention based DenseNet with residual deconvolution
- RAM: Residual attention module for single image super-resolution
☆ This paper has been recommended for acceptance by Zicheng Liu.