A very lightweight and efficient image super-resolution network

https://doi.org/10.1016/j.eswa.2022.118898

Highlights

  • A very lightweight and efficient image super-resolution network has been proposed.

  • A high-/low-frequency feature grouping fusion block has been proposed.

  • A multi-way attention block has been proposed.

  • A lightweight residual concatenation block has been proposed.

  • A progressive interactive group convolution has been proposed.

Abstract

Deep convolutional neural networks have significantly improved the performance of single image super-resolution (SISR). Generally, larger networks (i.e., deeper and wider) perform better. However, larger networks require higher computing and storage costs, which limits their application on resource-constrained devices. Lightweight SISR networks with fewer parameters and smaller computational workloads are therefore highly desirable; the key challenge is to achieve a better balance between model complexity and performance. In this paper, we propose a very lightweight and efficient SISR network. Our main contributions are: (1) a frequency grouping fusion block (FGFB), which better fuses high-/low-frequency feature information; (2) a multi-way attention block (MWAB), which exploits multiple different cues in the feature information; (3) a lightweight residual concatenation block (LRCB), which combines the advantages of the residual connection and the channel concatenation; (4) a lightweight convolutional block (LConv) for image super-resolution, which significantly reduces the number of parameters; (5) a progressive interactive group convolution (PIGC), which is more effective than the conventional group convolution. Extensive experimental results demonstrate that our method is significantly superior to other state-of-the-art methods, with a better balance between model complexity and performance.

Introduction

Single image super-resolution (SISR) (Freeman, Pasztor, & Carmichael, 2000) is a classical task in the field of computer vision, aiming at recovering the high-resolution (HR) image from the corresponding low-resolution (LR) counterpart. It has a wide range of applications, such as video surveillance (Zou & Yuen, 2012), medical diagnosis (Shi et al., 2013) and remote sensing imaging (Thornton, Atkinson, & Holland, 2006).

Convolutional neural networks (CNNs) (Dong, Loy, He, & Tang, 2014) have significantly improved the performance of SISR and dominate current SISR research. However, the performance of CNN-based SISR methods depends heavily on network size, i.e., the depth (number of layers) and width (number of channels) of the network. Larger SISR networks are more expressive and usually perform better. For example, EDSR (Lim, Son, Kim, Nah, & Mu Lee, 2017) has 65 convolutional layers and 43 M parameters, and RCAN (Zhang, Li et al., 2018) has more than 400 convolutional layers and 16 M parameters. While methods such as EDSR and RCAN perform well, they require high computational and memory costs and are difficult to deploy on devices with limited resources (e.g., mobile phones). Lightweight SISR networks (generally considered to have fewer than 1 M parameters) are highly desirable. Current SISR methods, especially lightweight ones, face a common challenge: the reconstructed super-resolution (SR) images often suffer from blurring and distortion due to the loss of high-frequency information such as texture (Ahn, Kang, & Sohn, 2018).

The network architectures used by current SISR methods mainly consist of residual connections (Lim et al., 2017), dense connections (Zhang, Tian, Kong, Zhong & Fu, 2018) and channel attention mechanisms (Zhang, Li et al., 2018), and mostly remove the batch normalization and pooling layers (Huang et al., 2021, Lim et al., 2017) popular in classification networks (He, Zhang, Ren, & Sun, 2016) to reduce feature information loss and improve feature utilization and expressiveness. A common dense connection scheme (Zhang, Tian et al., 2018) is shown in Fig. 1(a): the extracted hierarchical features are concatenated together. Qiu, Wang, Tao, and Cheng (2019) show that the shallow features of SISR networks contain more low-frequency information and the deep features contain more high-frequency information. Low-frequency information consists of simpler structures and textures, which can be recovered by simpler functions; high-frequency information consists of more complex structures and textures, which require more complex recovery functions. The residual and dense connections are not the best way to transfer information from shallow layers to deep layers, because the deep layers are prone to overfit the low-frequency information, resulting in distortion of the restored image. Recently, Luo et al. (2020) propose a novel backward sequential concatenation scheme, as shown in Fig. 1(b). A 1 × 1 convolutional layer is used to halve the number of feature channels output by each FEB, and the dimensionality-reduced features are gradually concatenated from back to front, which can better restore the high-frequency information in the SR image.
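The backward sequential concatenation described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the 1 × 1 convolutions are simulated with random (untrained) channel-mixing matrices, and the function names, channel count, and number of blocks are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    """Simulate a 1x1 convolution: a channel-mixing matrix applied per pixel.
    Weights are random here; in a real network they are learned."""
    c, h, w = x.shape
    weight = rng.standard_normal((out_ch, c)) / np.sqrt(c)
    return np.einsum('oc,chw->ohw', weight, x)

def backward_sequential_fuse(features):
    """Backward sequential concatenation (after Luo et al., 2020):
    each feature is reduced to half the channels by a 1x1 conv, then
    concatenated with the next-shallower feature, from back to front."""
    c = features[0].shape[0]
    fused = conv1x1(features[-1], c // 2)        # deepest feature, halved
    for f in reversed(features[:-1]):            # walk toward the shallow end
        fused = conv1x1(np.concatenate([f, fused], axis=0), c // 2)
    return fused

feats = [rng.standard_normal((16, 8, 8)) for _ in range(4)]  # four FEB outputs
out = backward_sequential_fuse(feats)
print(out.shape)   # (8, 8, 8): half the original channel count
```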

In this paper, we propose a very lightweight and efficient image super-resolution network (VLESR). Our core contribution, inspired mainly by the work of Luo et al. (2020), is to group and fuse high-/low-frequency features in pairs to better exploit the feature information. The feature pair with the largest difference between low and high frequencies forms the first group, the pair with the next largest difference forms the second group, and so on. Then, starting from the group with the smallest frequency difference, the features of each group are fused step by step until the group with the largest frequency difference is reached. In addition, we propose a multi-way attention block (MWAB) to mine multiple different cues in the feature information. To make our model sufficiently lightweight and effective, we also propose a lightweight residual concatenation block (LRCB), a lightweight convolutional block (LConv), and a progressive interactive group convolution (PIGC). The contributions of our work can be summarized as follows:

  • Propose a very lightweight and efficient image super-resolution network (VLESR), which has a better balance of complexity and performance and outperforms the other state-of-the-art methods (please refer to Fig. 2).

  • Propose a frequency grouping fusion block (FGFB), which can better fuse high-frequency and low-frequency feature information.

  • Propose a multi-way attention block (MWAB), which can exploit multiple different cues in the feature information.

  • Propose a lightweight residual concatenation block (LRCB), which can combine the advantages of the residual connection and the channel concatenation.

  • Propose a lightweight convolutional block (LConv) for image super-resolution, which can significantly reduce the number of parameters.

  • Propose a progressive interactive group convolution (PIGC), which is more effective than the conventional group convolution.
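The pairwise frequency-grouping scheme described in the introduction can be illustrated with the small NumPy sketch below. The function name is hypothetical, and the `fuse` operator here is a simple average standing in for FGFB's learned fusion layers; only the pairing and fusion order follow the description above.

```python
import numpy as np

def frequency_group_fuse(features, fuse):
    """Group features in pairs by frequency difference and fuse progressively.
    features: list ordered shallow (low-frequency) -> deep (high-frequency).
    Pair (first, last) has the largest frequency gap, (second, second-last)
    the next largest, and so on. Fusion starts from the smallest-gap pair
    and accumulates outward to the largest-gap pair."""
    n = len(features)
    pairs = [(features[i], features[n - 1 - i]) for i in range(n // 2)]
    # pairs[0] has the largest gap ... pairs[-1] the smallest gap
    low, high = pairs[-1]
    fused = fuse(low, high)
    for low, high in reversed(pairs[:-1]):   # move toward the largest-gap pair
        fused = fuse(fuse(low, fused), high)
    return fused

# Toy demo: four "features" with constant values, fused by averaging.
feats = [np.full((2, 2), v) for v in (1.0, 2.0, 3.0, 4.0)]
fused = frequency_group_fuse(feats, lambda a, b: (a + b) / 2)
print(fused[0, 0])   # 2.875
```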

The rest of the paper is organized as follows. In Section 2, we review related works. Section 3 describes our method. Section 4 illustrates detailed experimental results and comparisons against other state-of-the-art methods. Section 5 concludes the paper. For ease of reading, the main abbreviations in this paper are summarized in Table 1.

Section snippets

Lightweight SISR methods based on CNNs

Dong et al. (2014) propose the first CNN-based SISR method, called SRCNN. SRCNN, which contains only three convolutional layers, learns the non-linear mapping between LR and HR images end-to-end and outperforms traditional SISR methods. Kim, Kwon Lee, and Mu Lee (2016a) propose VDSR, which uses skip connections to learn residual information and increases the number of convolutional layers to 20, further improving SISR performance. In general, deeper (more convolutional layers)

Network architecture

Our VLESR network architecture, shown in Fig. 3(a), mainly consists of a 3 × 3 convolutional layer, a deep feature extraction block (DFEB), a frequency grouping fusion block (FGFB), and an Upsampler. DFEB contains four residual attention blocks (RABs), and the Upsampler uses sub-pixel convolution (Shi et al., 2016). Assuming the input LR image is denoted as ILR, ILR first passes through the 3 × 3 convolutional layer: F0 = f3×3(ILR), where f3×3(·) is the 3 × 3 convolution function, and F0 is its
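The Upsampler's sub-pixel convolution (Shi et al., 2016) rearranges channels into spatial positions. A minimal NumPy version of the pixel-shuffle step (omitting the learned convolutions that precede it) is sketched below:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel convolution rearrangement (Shi et al., 2016):
    reshape (C*r^2, H, W) -> (C, H*r, W*r) by moving channel groups
    into r x r spatial neighborhoods."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16, dtype=float).reshape(4, 2, 2)   # (C*r^2, H, W) with r=2
y = pixel_shuffle(x, 2)
print(y.shape)          # (1, 4, 4)
print(y[0, 0])          # [0. 4. 1. 5.]
```

Each output pixel (i, j) is taken from channel (i mod r) * r + (j mod r) at LR position (i // r, j // r), which is why a convolution producing C·r² channels suffices to upscale by r.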

Datasets

For a fair comparison with other state-of-the-art methods, the common DIV2K dataset (Timofte, Agustsson, Van Gool, Yang, & Zhang, 2017) was used as the training and validation dataset. DIV2K consists of 800 training images (001–800) and 100 validation images (801–900). The ten images (801–810) were used for validation, labeled as DIV2K-10. The original HR training images were bicubically downsampled to obtain the paired LR training images. Similar to other methods, we also randomly performed
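The paired-patch sampling used in such training pipelines can be sketched as follows. This is an assumed, simplified setup: the LR patch size is illustrative (not the paper's exact setting), and strided subsampling stands in for bicubic downsampling purely to keep the demo self-contained; the point is the coordinate alignment between the two crops.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_paired_patch(hr, lr, scale, lr_patch=48):
    """Crop a random LR patch and the aligned HR patch.
    Coordinates are drawn in LR space and multiplied by the scale so that
    both crops cover the same image region (standard SISR training setup)."""
    h, w = lr.shape[:2]
    y = rng.integers(0, h - lr_patch + 1)
    x = rng.integers(0, w - lr_patch + 1)
    lr_crop = lr[y:y + lr_patch, x:x + lr_patch]
    hr_crop = hr[y * scale:(y + lr_patch) * scale,
                 x * scale:(x + lr_patch) * scale]
    return hr_crop, lr_crop

hr = rng.random((200, 200, 3))
lr = hr[::2, ::2]          # stand-in for bicubic x2 downsampling
hc, lc = random_paired_patch(hr, lr, scale=2)
print(hc.shape, lc.shape)  # (96, 96, 3) (48, 48, 3)
```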

Conclusions

In this paper, we propose a very lightweight and efficient image super-resolution network, called VLESR. The lightweight residual concatenation block (LRCB) can better propagate and fuse local features; the multi-way attention block (MWAB) can better combine the features of different cues and improve the feature expressiveness; the high-/low-frequency feature grouping fusion block (FGFB) can better fuse the high-/low-frequency feature information, and improve the quality of SR image

CRediT authorship contribution statement

Dandan Gao: Methodology, Software, Writing – original draft. Dengwen Zhou: Supervision, Project administration, Methodology, Writing – reviewing and editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (50)

  • Huang, Z., et al. Dual-path attention network for single image super-resolution. Expert Systems with Applications (2021).
  • Agarap, A. F. Deep learning using rectified linear units (ReLU) (2018).
  • Ahn, N., Kang, B., & Sohn, K.-A. (2018). Fast, accurate, and lightweight super-resolution with cascading residual...
  • Anwar, S., et al. Densely residual Laplacian super-resolution. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).
  • Bevilacqua, M., et al. Low-complexity single-image super-resolution based on nonnegative neighbor embedding.
  • Dong, C., et al. Learning a deep convolutional network for image super-resolution.
  • Dong, C., et al. Accelerating the super-resolution convolutional neural network.
  • Freeman, W. T., et al. Learning low-level vision. International Journal of Computer Vision (2000).
  • Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., & Fang, Z., et al. (2019). Dual attention network for scene segmentation....
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE...
  • Hu, J., Shen, L., & Sun, G. (2020). Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer...
  • Huang, J.-B., et al. Single image super-resolution from transformed self-exemplars.
  • Hui, Z., Gao, X., Yang, Y., & Wang, X. (2019). Lightweight image super-resolution with information multi-distillation...
  • Hui, Z., Wang, X., & Gao, X. (2018). Fast and accurate single image super-resolution via information distillation...
  • Itti, L., et al. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (1998).
  • Kim, J., Kwon Lee, J., & Mu Lee, K. (2016a). Accurate image super-resolution using very deep convolutional networks. In...
  • Kim, J., Kwon Lee, J., & Mu Lee, K. (2016b). Deeply-recursive convolutional network for image super-resolution. In...
  • Kingma, D. P., et al. Adam: A method for stochastic optimization (2014).
  • Lai, W.-S., Huang, J.-B., Ahuja, N., & Yang, M.-H. (2017). Deep Laplacian pyramid networks for fast and accurate...
  • Li, J., Fang, F., Mei, K., & Zhang, G. (2018). Multi-scale residual network for image super-resolution. In Proceedings...
  • Li, Z., Yang, J., Liu, Z., Yang, X., Jeon, G., & Wu, W. (2019). Feedback network for image super-resolution. In...
  • Li, W., et al. LAPAR: Linearly-assembled pixel-adaptive regression network for single image super-resolution and beyond. Advances in Neural Information Processing Systems (2020).
  • Lim, B., Son, S., Kim, H., Nah, S., & Mu Lee, K. (2017). Enhanced deep residual networks for single image...
  • Liu, J.-J., Hou, Q., Cheng, M.-M., Wang, C., & Feng, J. (2020). Improving convolutional networks with self-calibrated...
  • Liu, J., et al. Residual feature distillation network for lightweight image super-resolution. European Conference on Computer Vision (2020).

The code (and data) in this article has been certified as Reproducible by Code Ocean (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.
