ESKN: Enhanced selective kernel network for single image super-resolution
Introduction
Single image super-resolution (SISR) aims at reconstructing a high-resolution (HR) image with abundant details and textures from its low-resolution (LR) version. It provides an effective technique to increase the spatial resolution of optical sensors and thus has attracted considerable attention from both the academic and industrial communities. Recently, many machine learning-based SISR algorithms have been developed. However, SISR remains a challenging ill-posed problem, as one specific LR input can correspond to many possible HR versions and the mapping space is too vast to explore.
In the past few years, convolutional neural networks (CNNs) have achieved great performance in SISR, and a recent research direction toward better super-resolution is to design more sophisticated modules or pipelines. Specifically, extracting multi-scale context via parallel convolutional streams has been actively studied [1], [2], [3], and these methods have achieved very competitive SR performance. However, most of these approaches employ a simple concatenation layer and a convolutional layer to linearly aggregate features coming from multiple branches [4]. Such linear aggregation may leave the network with insufficient adaptation capacity. The Selective Kernel Module (SKM) was proposed to integrate multi-scale features by calculating input-specific importance scores via a channel attention mechanism and a softmax operation [4]. However, the channel-wise weights are computed from the extracted features after the global average pooling (GAP) operation, so the normal channel attention mechanism does not satisfactorily boost insignificant but important features. Moreover, softmax imposes a competition relationship between multi-scale features, which might be sub-optimal for SISR tasks.
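The competition imposed by softmax can be seen in a small numeric sketch. The branch scores below are hypothetical values chosen for illustration, not taken from the paper; the point is only that softmax weights across branches must sum to one, whereas sigmoid gates each branch independently.

```python
import math

def softmax(scores):
    """Normalize scores across branches so that the weights sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    """Gate each branch independently; weights need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical importance scores of one channel in three branches.
# Two branches are equally useful, one is not.
branch_scores = [2.0, 2.0, -1.0]

soft_w = softmax(branch_scores)  # the two useful branches must share the budget
sig_w = sigmoid(branch_scores)   # both useful branches can keep a high weight
```

With softmax, neither of the two useful branches can receive a weight above 0.5, while sigmoid allows both to stay close to 1.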
Deeper networks have proved effective in SISR. But as the depth increases, the network usually suffers from training difficulty and limited performance gain [5]. The underlying cause of this phenomenon is a lack of long-term memory [6], e.g., high-level features from later layers do not contain low-level information from earlier layers in the pipeline. Many methods have been developed to overcome this issue by forwarding low-level features to subsequent layers via skip connections [1], [5], [6], [7]. To avoid neglecting useful low-level features, a simple and natural strategy is to pass the features extracted in every layer to the end of the network, as proposed in [1], [7]. However, concatenating all these features at the end of the pipeline results in a huge amount of redundant information, and the computational cost of the subsequent convolution increases significantly as the number of modules grows. In addition, most skip connection-based methods integrate low-level and high-level features by directly adding or concatenating them, neglecting the difference between local information (low-level) and semantic information (high-level). In fact, a pixel in the high-level features corresponds to a region of pixels in the low-level features.
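As a rough illustration of the cost of concatenating every module's output, the sketch below counts the parameters of the fusion convolution that follows the concatenation. The channel width of 64 and the 1×1 kernel are assumptions made for the example, not values stated in the text.

```python
def fusion_conv_params(num_modules, channels, kernel_size=1):
    """Weights plus biases of a convolution that maps the concatenation of
    `num_modules` feature maps (each with `channels` channels) back to
    `channels` output channels."""
    in_channels = num_modules * channels
    return in_channels * channels * kernel_size * kernel_size + channels

# Hypothetical width of 64 channels per module.
params_4 = fusion_conv_params(4, 64)    # 4 modules concatenated
params_16 = fusion_conv_params(16, 64)  # 16 modules concatenated
```

The count grows in proportion to the number of modules whose outputs are concatenated, and a larger fusion kernel multiplies it further.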
To tackle these critical issues, we first design an enhanced feature extraction module (i.e., ESKM) based on the selective kernel module (SKM) [4]. Since the channel-wise weights are calculated from the extracted features after applying the global average pooling (GAP) operation, the weights for insignificant but important features might be very small. Some of the learned filters extract important local structures (e.g., textures or details). Whether their corresponding features have strong or weak signals, these filters and their channels are important. The key of the proposed ESKM is to integrate a filter-oriented weight re-calibration process that calculates extra weights for different filters, so that it can better extract insignificant but important features (e.g., textures or details), which are critical for high-accuracy SISR tasks. To further improve the performance of ESKM, we replace the softmax function with sigmoid, and remove the dimension reduction/expansion to preserve important information for the subsequent channel attention-based feature re-calibration. This enhanced selective kernel module (ESKM) allows our proposed SISR model to generate highly discriminative features for high-quality SISR by emphasizing the important features. Second, we propose a novel connection scheme, named the symmetric connection scheme (SCS), which adds low-level features to the corresponding high-level features in the symmetric position. Since the spatial information encoded in the low-level features is very different from the semantic information encoded in the high-level features, simple addition may cause an unstable training process. As a remedy, the low-level features with rich spatial information are first adjusted by a spatial attention module before being added to the extracted high-level features, which emphasizes their consistency for effective fusion of hierarchical features.
Compared with existing connection schemes, SCS can make better use of the spatial information encoded in low-level features, improve the gradient flow, and support more consistent fusion of hierarchical features. Thus, it can reduce the training difficulty of the network and enhance super-resolution results. By stacking a sequence of ESKMs via SCS, we propose an enhanced selective kernel network (ESKN) to better extract and aggregate multi-scale features and ease the training difficulty. Fig. 1 shows the architecture of the proposed ESKN.
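The idea behind SCS can be sketched in a few lines. The sketch assumes that "symmetric position" means module i is paired with module N-1-i, and it uses a fixed per-pixel sigmoid gate as a stand-in for the learned spatial attention module. Both are illustrative assumptions; the function names and the `scale` and `bias` parameters are hypothetical.

```python
import math

def symmetric_pairs(num_modules):
    """Pair module i in the first half with module num_modules - 1 - i."""
    return [(i, num_modules - 1 - i) for i in range(num_modules // 2)]

def spatial_gate(low, scale=1.0, bias=0.0):
    """Per-pixel gate in (0, 1). A stand-in for a learned spatial attention
    map; `scale` and `bias` are hypothetical parameters."""
    return [[1.0 / (1.0 + math.exp(-(scale * v + bias))) for v in row]
            for row in low]

def fuse(low, high, scale=1.0, bias=0.0):
    """Re-weight low-level features per pixel, then add them to the
    high-level features at the symmetric position."""
    gate = spatial_gate(low, scale, bias)
    return [[h + g * l for h, g, l in zip(h_row, g_row, l_row)]
            for h_row, g_row, l_row in zip(high, gate, low)]

pairs = symmetric_pairs(6)        # skip connections for a 6-module network
low = [[0.0, 2.0], [-2.0, 1.0]]   # single-channel low-level feature map
high = [[1.0, 1.0], [1.0, 1.0]]   # single-channel high-level feature map
fused = fuse(low, high)
```

A plain skip connection would compute `h + l`; here each low-level pixel is scaled by its own gate before the addition.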
This work has the following three main contributions.
- We design an enhanced selective kernel module (ESKM) based on SKM. The most significant improvement is to integrate a re-calibration process that adjusts the weights for different filters and thus better characterizes insignificant but important features (e.g., textures or details). Moreover, we replace the softmax operation with the sigmoid operation for more flexible weight learning, and remove the dimension reduction/expansion component to build a direct correspondence between channels and their weights. Such an ESKM can more adaptively fuse multi-scale features extracted from multiple branches with different kernel sizes.
- To better utilize the hierarchical features extracted by ESKMs, we design a symmetric connection scheme (SCS) that passes the low-level features via skip connections and fuses them with the high-level features in the symmetric position. Unlike existing skip connection-based designs, which directly add or concatenate low-level and high-level features, we add an extra spatial attention module to adjust the low-level features for more consistent and stable fusion with the corresponding high-level features. This connection scheme makes better use of low-level features and improves the gradient flow, and hence enhances the network performance.
- By integrating ESKMs using SCS, we compose a compact but powerful network (ESKN) for high-quality SISR. The proposed ESKN model shows superior performance over the state-of-the-art SISR methods [7], [8] on multiple benchmark datasets, achieving more accurate image restoration results with fewer parameters.
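The first contribution can be made concrete with a minimal sketch of how such a module might fuse its branches. The text above does not specify how the filter-oriented weights are combined with the channel attention weights, so the multiplicative combination below is an assumption, as are the names `fuse_branches`, `fc_weights` and `filter_weights`. The sketch only reflects the three stated ingredients: sigmoid instead of softmax, a single fully connected layer without channel reduction, and an extra self-learned weight per filter.

```python
import math

def gap(feat):
    """Global average pooling: [C][H][W] features -> list of C channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse_branches(branches, fc_weights, filter_weights):
    """branches: list of [C][H][W] features, one per kernel size.
    fc_weights[b]: C x C matrix (one FC layer, no channel reduction).
    filter_weights[b]: C input-independent scalars (assumed form)."""
    num_ch = len(branches[0])
    height, width = len(branches[0][0]), len(branches[0][0][0])
    # Sum the branches element-wise and pool to one descriptor per channel.
    summed = [[[sum(b[c][i][j] for b in branches) for j in range(width)]
               for i in range(height)] for c in range(num_ch)]
    desc = gap(summed)
    # Input-dependent sigmoid attention, re-calibrated by the filter weights
    # (multiplication is an assumption made for this sketch).
    weights = [[sigmoid(sum(w * d for w, d in zip(fc_weights[b][c], desc)))
                * filter_weights[b][c] for c in range(num_ch)]
               for b in range(len(branches))]
    return [[[sum(weights[b][c] * branches[b][c][i][j]
                  for b in range(len(branches))) for j in range(width)]
             for i in range(height)] for c in range(num_ch)]

# Two branches with C=2, H=1, W=2. Zero FC weights give an attention of 0.5.
b1 = [[[1.0, 3.0]], [[0.0, 0.0]]]
b2 = [[[1.0, 1.0]], [[2.0, 4.0]]]
zero_fc = [[0.0, 0.0], [0.0, 0.0]]
out = fuse_branches([b1, b2], [zero_fc, zero_fc], [[2.0, 2.0], [2.0, 2.0]])
```

In this toy setting the attention weight of every channel is 0.5 and each filter weight is 2.0, so the output is the plain sum of the two branches. A channel with a weak pooled signal can still be emphasized, because its filter weight does not depend on the input.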
The remainder of this paper is organized as follows. We firstly review related deep learning-based SISR methods in Section 2. Then Section 3 elaborates details of key components in our ESKN model and the implementation settings. We evaluate our model and conduct qualitative and quantitative comparisons with state-of-the-art methods in Section 4, and conclude the paper in Section 5.
Related work
Over the past decades, developing effective SISR techniques to reconstruct an HR image from its corresponding single LR version has attracted extensive attention from both the academic and the industrial communities. Recently, CNN-based approaches have achieved state-of-the-art performance in SISR. Therefore, we mainly focus on reviewing CNN-based methods.
Approach
Figure 1 shows the pipeline of our proposed ESKN model. Our ESKN is composed of three sub-networks: an initial feature extraction sub-network (IFENet) to learn feature maps from the low-resolution input, a feature mapping sub-network (FMNet) to transform low-level features into high-level ones, and a reconstruction sub-network (RNet) to reconstruct the super-resolved high-resolution image. The core of ESKN is the FMNet, designed to learn more informative features for SR. The FMNet is
Datasets and metrics
Training. Like most recent SISR methods, we use the 800 training images from the DIVerse 2K resolution image dataset (i.e., DIV2K) [40] to train our ESKN. In each training batch, 16 LR RGB patches and their corresponding HR patches are randomly cropped. They are then randomly augmented by horizontal or vertical flips and rotations. We pre-process all the images by subtracting the mean RGB value of the DIV2K dataset.
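The training-data preparation described above could be sketched as follows. The patch size, scale factor, 90-degree rotation, 0.5 probability for each transform, and the mean value in the test are all assumptions made for illustration; images are plain nested lists so that the example needs no external library.

```python
import random

def crop(img, top, left, size):
    """Cut a size x size patch out of a 2D grid of pixels."""
    return [row[left:left + size] for row in img[top:top + size]]

def hflip(img):
    return [row[::-1] for row in img]

def vflip(img):
    return img[::-1]

def rot90(img):
    """Rotate the grid by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def sample_pair(lr, hr, patch, scale, rng):
    """Crop an LR patch and the aligned HR patch, then apply the same
    random flips and rotation to both."""
    top = rng.randrange(len(lr) - patch + 1)
    left = rng.randrange(len(lr[0]) - patch + 1)
    lr_p = crop(lr, top, left, patch)
    hr_p = crop(hr, top * scale, left * scale, patch * scale)
    for op in (hflip, vflip, rot90):
        if rng.random() < 0.5:
            lr_p, hr_p = op(lr_p), op(hr_p)
    return lr_p, hr_p

def subtract_mean(img, mean_rgb):
    """Subtract a per-channel mean from every (R, G, B) pixel."""
    return [[tuple(v - m for v, m in zip(px, mean_rgb)) for px in row]
            for row in img]

# Toy 8x8 LR image and 16x16 HR image (scale 2) with (R, G, B) pixels.
lr = [[(r, c, 0) for c in range(8)] for r in range(8)]
hr = [[(r, c, 0) for c in range(16)] for r in range(16)]
rng = random.Random(0)
lr_patch, hr_patch = sample_pair(lr, hr, patch=4, scale=2, rng=rng)
```

Applying the identical crop location and transforms to both patches keeps the LR and HR pixels aligned, which the reconstruction loss depends on.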
Testing. Five commonly used public benchmark datasets
Conclusions
We present a novel enhanced selective kernel network (ESKN) for single image super-resolution. Specifically, we design an enhanced selective kernel module (ESKM) by revising the selective kernel module (SKM), which was introduced in [4] for image classification. For ESKM, we (1) introduce a new self-learned filter-oriented weight, (2) use sigmoid to avoid the unnecessary competition introduced by the softmax operation in SKM, and (3) simplify the two FC layers to one to avoid hyper-parameter tuning and
CRediT authorship contribution statement
Zewei He: Conceptualization, Methodology, Software, Writing – original draft. Guizhong Fu: Software, Data curation. Yanpeng Cao: Writing – review & editing, Formal analysis, Funding acquisition. Yanlong Cao: Supervision, Project administration. Jiangxin Yang: Supervision, Project administration. Xin Li: Conceptualization, Writing – original draft.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was supported in part by the National Key Research and Development Program of China (2020YFB1711400) and the National Natural Science Foundation of China (52075485).
References (42)
- Fast and accurate single image super-resolution via an energy-aware improved deep residual network. Signal Process. (2019)
- Multi-scale residual network for image super-resolution. ECCV (2018)
- MRFN: multi-receptive-field network for fast and accurate single image super-resolution. IEEE Trans. Multimedia (2020)
- Single image super-resolution via cascaded multi-scale cross network. arXiv preprint (2018)
- Selective kernel networks. CVPR (2019)
- MemNet: a persistent memory network for image restoration. ICCV (2017)
- Residual dense network for image super-resolution. CVPR (2018)
- Enhanced deep residual networks for single image super-resolution. CVPR Workshop (2017)
- Learning a deep convolutional network for image super-resolution. ECCV (2014)
- Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. (2016)
- Accurate image super-resolution using very deep convolutional networks. CVPR
- Deeply-recursive convolutional network for image super-resolution. CVPR
- Image super-resolution via deep recursive residual network. CVPR
- Accelerating the super-resolution convolutional neural network. ECCV
- Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. CVPR
- Deep Laplacian pyramid networks for fast and accurate super-resolution. CVPR
- Deep Laplacian pyramid networks for fast and accurate super-resolution. IEEE Trans. Pattern Anal. Mach. Intell.
- Photo-realistic single image super-resolution using a generative adversarial network. CVPR
- Image super-resolution using dense skip connections. ICCV
- Deep residual learning for image recognition. CVPR
- NTIRE 2017 challenge on single image super-resolution: methods and results. CVPR Workshop