A very lightweight and efficient image super-resolution network
Introduction
Single image super-resolution (SISR) (Freeman, Pasztor, & Carmichael, 2000) is a classical task in the field of computer vision, aiming at recovering the high-resolution (HR) image from the corresponding low-resolution (LR) counterpart. It has a wide range of applications, such as video surveillance (Zou & Yuen, 2012), medical diagnosis (Shi et al., 2013) and remote sensing imaging (Thornton, Atkinson, & Holland, 2006).
Convolutional neural networks (CNNs) (Dong, Loy, He, & Tang, 2014) have significantly improved the performance of SISR and dominate current research on SISR. However, CNN-based SISR methods depend heavily on the size of the network, i.e., its depth (number of layers) and width (number of channels). Larger SISR networks are more expressive and usually perform better. For example, EDSR (Lim, Son, Kim, Nah, & Mu Lee, 2017) has 65 convolutional layers and 43 M parameters, and RCAN (Zhang, Li et al., 2018) has more than 400 convolutional layers and 16 M parameters. While methods such as EDSR and RCAN perform well, they incur high computational and memory costs and are difficult to deploy on devices with limited resources (e.g., mobile phones). Lightweight SISR networks (generally considered to have fewer than 1 M parameters) are therefore highly desirable. Current SISR methods, especially lightweight ones, face a common challenge: the reconstructed super-resolution (SR) images often suffer from blurring and distortion due to the loss of high-frequency information such as texture (Ahn, Kang, & Sohn, 2018).
The network architectures used by current SISR methods mainly consist of residual connections (Lim et al., 2017), dense connections (Zhang, Tian, Kong, Zhong, & Fu, 2018) and channel attention mechanisms (Zhang, Li et al., 2018), and mostly remove the batch normalization and pooling layers (Huang et al., 2021, Lim et al., 2017) that are popular in classification networks (He, Zhang, Ren, & Sun, 2016) in order to reduce feature information loss and improve feature utilization and expressiveness. In the common dense connection scheme (Zhang, Tian et al., 2018), shown in Fig. 1(a), the extracted hierarchical features are concatenated together. Qiu, Wang, Tao, and Cheng (2019) show that the shallow features of SISR networks contain more low-frequency information while the deep features contain more high-frequency information. Low-frequency information consists of simpler structures and textures, so simpler functions suffice to recover it; high-frequency information consists of complex structures and textures, which require more complex recovery functions. Residual and dense connections are not the best way to transfer information from the shallow layers to the deep layers, because the deep layers are prone to overfit the low-frequency information, resulting in distortion of the restored image. Recently, Luo et al. (2020) proposed a novel backward sequential concatenation scheme, shown in Fig. 1(b): a 1 × 1 convolutional layer halves the number of feature channels output by each FEB, and the dimensionality-reduced features are gradually concatenated from back to front, which better restores the high-frequency information in the SR image.
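As a shape-level illustration of this backward scheme, the following sketch tracks only channel counts (the four-FEB, 64-channel configuration is an assumption for illustration; the exact wiring in Luo et al. (2020) may differ):

```python
# Shape-level sketch of backward sequential concatenation (Fig. 1(b)).
# Assumption: each FEB outputs `c` channels; a 1x1 convolution halves each
# output, and the halved features are concatenated cumulatively from the
# last FEB back to the first.
def backward_concat(channels):
    """channels: FEB output channel counts, front to back.
    Returns the channel counts of the fused features, back to front."""
    reduced = [c // 2 for c in channels]   # 1x1 conv halves each output
    fused, total = [], 0
    for c in reversed(reduced):            # concatenate from back to front
        total += c
        fused.append(total)
    return fused

print(backward_concat([64, 64, 64, 64]))  # [32, 64, 96, 128]
```

Note how the fused feature grows as the concatenation walks toward the front, so the earliest (most low-frequency) features are merged last.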
In this paper, we propose a very lightweight and efficient image super-resolution network (VLESR). Our core contribution, mainly inspired by the work of Luo et al. (2020), is to group and fuse high-/low-frequency features in pairs to better exploit the feature information. The features with the largest difference between low- and high-frequency content form the first group, the features with the next largest difference form the second group, and so on. Then, starting from the feature group with the smallest frequency difference, the features of each group are gradually fused, up to the feature group with the largest frequency difference. In addition, we propose a multi-way attention block (MWAB) to mine multiple different clues from the feature information. To make our model sufficiently lightweight and effective, we propose a lightweight residual concatenation block (LRCB), a lightweight convolutional block (LConv), and a progressive interactive group convolution (PIGC). The contributions of our work can be summarized as follows:
- Propose a very lightweight and efficient image super-resolution network (VLESR), which achieves a better balance between complexity and performance and outperforms other state-of-the-art methods (please refer to Fig. 2).
- Propose a frequency grouping fusion block (FGFB), which can better fuse high-frequency and low-frequency feature information.
- Propose a multi-way attention block (MWAB), which can exploit multiple different clues of the feature information.
- Propose a lightweight residual concatenation block (LRCB), which combines the advantages of the residual connection and the channel concatenation.
- Propose a lightweight convolutional block (LConv) for image super-resolution, which significantly reduces the number of parameters.
- Propose a progressive interactive group convolution (PIGC), which is more effective than the conventional group convolution.
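The pairwise grouping and fusion order described above can be sketched as follows (a minimal illustration with four hypothetical features f1–f4 ordered shallow to deep; this is one plausible reading of the scheme, not the authors' implementation):

```python
def frequency_groups(features):
    """Pair shallow (low-frequency) with deep (high-frequency) features:
    group 1 pairs the shallowest with the deepest (largest frequency
    difference), group 2 the next pair inward, and so on."""
    n = len(features)
    return [(features[i], features[n - 1 - i]) for i in range(n // 2)]

def fusion_order(features):
    """Fusion proceeds from the group with the smallest frequency
    difference (innermost pair) to the group with the largest."""
    return list(reversed(frequency_groups(features)))

feats = ["f1", "f2", "f3", "f4"]          # shallow -> deep features
print(frequency_groups(feats))  # [('f1', 'f4'), ('f2', 'f3')]
print(fusion_order(feats))      # [('f2', 'f3'), ('f1', 'f4')]
```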
The rest of the paper is organized as follows. In Section 2, we review related works. Section 3 describes our method. Section 4 illustrates detailed experimental results and comparisons against other state-of-the-art methods. Section 5 concludes the paper. For ease of reading, the main abbreviations in this paper are summarized in Table 1.
Lightweight SISR methods based on CNNs
Dong et al. (2014) propose the first CNN-based SISR method, called SRCNN. SRCNN, which contains only three convolutional layers, learns the non-linear mapping between LR and HR images end-to-end and outperforms the traditional SISR methods. Kim, Kwon Lee, and Mu Lee (2016a) propose VDSR, which uses a skip connection to learn residual information and increases the number of convolutional layers to 20, further improving SISR performance. In general, deeper (more convolutional layers)
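For scale, the parameter count of SRCNN's widely reported 9-1-5 configuration (64 and 32 filters, single-channel input, biases excluded) can be computed directly:

```python
def conv_params(k, c_in, c_out, bias=False):
    """Parameters of a k x k convolution, optionally counting biases."""
    return k * k * c_in * c_out + (c_out if bias else 0)

srcnn = (conv_params(9, 1, 64)     # patch extraction: 9x9, 64 filters
         + conv_params(1, 64, 32)  # non-linear mapping: 1x1, 32 filters
         + conv_params(5, 32, 1))  # reconstruction: 5x5 output layer
print(srcnn)  # 8032
```

At roughly 8 K weights, SRCNN sits far below the 1 M-parameter lightweight budget mentioned in the introduction, while EDSR and RCAN exceed it by more than an order of magnitude.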
Network architecture
Our VLESR network architecture, shown in Fig. 3(a), mainly consists of a 3 × 3 convolutional layer, a deep feature extraction block (DFEB), a frequency grouping fusion block (FGFB), and an Upsampler. DFEB contains four residual attention blocks (RABs), and the Upsampler uses subpixel convolution (Shi et al., 2016). Assuming the input LR image is denoted as I_LR, I_LR firstly passes through the 3 × 3 convolutional layer: F_0 = H_3×3(I_LR), where H_3×3(·) is the 3 × 3 convolutional function and F_0 is its output.
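The subpixel convolution used by the Upsampler rearranges channels into spatial resolution; a minimal NumPy sketch of this pixel-shuffle step from Shi et al. (2016) (channel counts and scale here are illustrative, not VLESR's configuration):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel rearrangement: reshape a (C*r^2, H, W) tensor
    into (C, H*r, W*r), as in Shi et al. (2016)."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# 8 channels at 3x3 become 2 channels at 6x6 for scale r = 2
x = np.arange(8 * 3 * 3, dtype=float).reshape(8, 3, 3)
y = pixel_shuffle(x, 2)
print(y.shape)  # (2, 6, 6)
```

Each output pixel at (h·r + i, w·r + j) is drawn from input channel c·r² + i·r + j at (h, w), so no learned parameters are involved in the rearrangement itself.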
Datasets
For a fair comparison with other state-of-the-art methods, the common DIV2K dataset (Timofte, Agustsson, Van Gool, Yang, & Zhang, 2017) was used as the training and validation dataset. DIV2K consists of 800 training images (001–800) and 100 validation images (801–900). The ten images (801–810) were used for validation, labeled as DIV2K-10. The original HR training images were bicubically downsampled to obtain the paired LR training images. Similar to other methods, we also randomly performed
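The paired-crop step implied by this training setup can be sketched as follows (a hypothetical helper, not the authors' code: for scale factor s, an LR patch at (x, y) must correspond to the HR patch at (s·x, s·y) so the pair stays pixel-aligned):

```python
import random

def paired_random_crop(lr_size, hr_size, patch, scale, rng=random):
    """Sample an aligned LR/HR training patch pair.
    Boxes are (left, top, right, bottom); the HR box is the LR box
    multiplied by `scale` so the two crops stay pixel-aligned."""
    lr_h, lr_w = lr_size
    assert hr_size == (lr_h * scale, lr_w * scale), "HR/LR size mismatch"
    x = rng.randrange(lr_w - patch + 1)
    y = rng.randrange(lr_h - patch + 1)
    lr_box = (x, y, x + patch, y + patch)
    hr_box = tuple(scale * v for v in lr_box)
    return lr_box, hr_box

# e.g. a 16x16 LR patch and its aligned 32x32 HR patch at scale x2
lr_box, hr_box = paired_random_crop((48, 48), (96, 96), 16, 2)
```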
Conclusions
In this paper, we propose a very lightweight and efficient image super-resolution network, called VLESR. The lightweight residual concatenation block (LRCB) can better propagate and fuse local features; the multi-way attention block (MWAB) can better combine the features of different cues and improve the feature expressiveness; the high-/low-frequency feature grouping fusion block (FGFB) can better fuse the high-/low-frequency feature information, and improve the quality of SR image
CRediT authorship contribution statement
Dandan Gao: Methodology, Software, Writing – original draft. Dengwen Zhou: Supervision, Project administration, Methodology, Writing – reviewing and editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (50)

- et al. (2021). Dual-path attention network for single image super-resolution. Expert Systems with Applications.
- (2018). Deep learning using rectified linear units (ReLU).
- Ahn, N., Kang, B., & Sohn, K.-A. (2018). Fast, accurate, and lightweight super-resolution with cascading residual...
- et al. (2022). Densely residual Laplacian super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding.
- et al. Learning a deep convolutional network for image super-resolution.
- et al. Accelerating the super-resolution convolutional neural network.
- et al. (2000). Learning low-level vision. International Journal of Computer Vision.
- Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., & Fang, Z., et al. (2019). Dual attention network for scene segmentation...
- He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE...
- Single image super-resolution from transformed self-exemplars.
- A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Adam: A method for stochastic optimization.
- LAPAR: Linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond. Advances in Neural Information Processing Systems.
- Residual feature distillation network for lightweight image super-resolution. European Conference on Computer Vision.
The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.