Neurocomputing

Volume 394, 21 June 2020, Pages 136-145

Super-resolution using multi-channel merged convolutional network

https://doi.org/10.1016/j.neucom.2019.04.089

Abstract

Single-image super-resolution (SISR) has been an important topic due to the demand for high-quality virtual images in the field of visual artificial intelligence. Deep-learning-based methods have achieved great success owing to the ability of deep convolutional networks to capture complicated features. However, simply widening or deepening the network improves performance only marginally. In this paper, we propose a merged convolutional network for super-resolution, which extracts richer details to restore high-resolution images. We use dense blocks for feature extraction to concatenate deep and shallow features along the depth dimension. We also design two sub-nets with distinct convolution kernels as different branches of the network, which widens the network and improves system performance. Finally, we employ sub-pixel layers at the very end for up-sampling, avoiding feature distortion. Our method was evaluated on several standard benchmark datasets. The results demonstrate superior performance and good robustness compared with state-of-the-art methods.

Introduction

As a typical ill-posed inverse problem, single-image super-resolution (SISR) [1] aims to reconstruct a high-resolution (HR) image with abundant high-frequency details from its corresponding low-resolution (LR) image. SISR is widely used to generate high-definition virtual samples for data-driven artificial intelligence in fields such as medical image diagnosis [2], face hallucination [3], and surveillance [4].

Since Tsai et al. first proposed SISR in 1984 [5], a variety of SISR algorithms have been proposed to solve this ill-posed problem. Super-resolution methods can be divided into three categories according to the basic theory used to solve the inverse problem: interpolation-based, reconstruction-based, and learning-based methods. Interpolation-based methods [6], [7], such as bicubic and bilinear interpolation, rest on smoothness assumptions and are simple and fast. Reconstruction-based methods [8], [9] use LR images with regularization to reconstruct HR images. However, due to inadequate high-frequency information, these methods produce obvious blurring and aliasing artifacts along salient edges and perform poorly in non-smooth regions.

In recent years, learning-based SR methods such as random forests [10] and sparse representation [11], [12] have become an important topic. These methods exploit datasets composed of LR and HR image patch pairs to infer the mapping between LR and HR feature spaces, or directly between the image exemplar pairs. Schulter et al. [10] proposed a method that directly maps LR patches to HR patches using random forests. They showed the close relation of previous SISR work to locally linear regression and demonstrated how random forests fit into this framework.

Yang et al. [11] presented a sparsity-based SR reconstruction model, which hypothesizes that LR patches can be sparsely and linearly represented by a training set. They used first- and second-order derivatives to extract features in order to obtain frequency information for training an over-complete dictionary. Building on this, Zeyde et al. [13] introduced principal component analysis and bootstrapping into the sparse representation framework to reduce the computational workload and increase robustness. Huang et al. [14] introduced a difference-curvature-based SPP method to partition the whole feature space into several groups for further processing and proposed a mixed matching strategy to enhance super-resolution performance.

Inspired by the great success of deep convolutional neural networks (CNNs) in the field of artificial intelligence, deep-learning-based super-resolution algorithms have become the mainstream. Departing fundamentally from external example-based approaches, Dong et al. [15], [16] proposed a CNN model for SISR called the Super-Resolution Convolutional Neural Network (SRCNN). They designed a shallow 3-layer CNN to learn the mapping between LR and HR images directly, achieved promising performance, and demonstrated that deep learning is useful for the classical computer vision problem of super-resolution.

Dong et al. later proposed a fast version of their structure that learns the mapping from LR to HR directly via a deconvolution layer, which replaces the interpolation-based upscaling operator. This method, called the Fast Super-Resolution Convolutional Neural Network (FSRCNN), shows that letting the network learn the upscaling filters directly can further increase accuracy and speed [17]. Shi et al. [18] designed an efficient sub-pixel convolution layer used only at the very end of the network, which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. Their Efficient Sub-Pixel Convolutional Neural Network (ESPCN) eliminates the need to perform most of the SR operation in the far larger HR space.
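The sub-pixel operation of ESPCN is a pure rearrangement: a stack of C·r² LR feature maps becomes a C-channel image that is r times larger in each spatial dimension, with no interpolation involved. A minimal NumPy sketch of this rearrangement (the function name and toy shapes are illustrative, not from the paper):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature stack into a (C, H*r, W*r) image.

    Each group of r*r channels contributes one r x r patch of HR pixels,
    so no interpolation is performed and no feature values are altered.
    """
    c = x.shape[0] // (r * r)
    h, w = x.shape[1], x.shape[2]
    x = x.reshape(c, r, r, h, w)      # split channels into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

lr_feats = np.arange(4, dtype=float).reshape(4, 1, 1)  # C=1, r=2 toy input
hr = pixel_shuffle(lr_feats, 2)
# hr[0] is [[0., 1.], [2., 3.]]: the four channels become one 2x2 HR patch
```

Because the rearrangement happens only once, at the output, all preceding convolutions operate on the small LR grid, which is the source of ESPCN's efficiency.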

Many other CNN-based methods have also emerged. Inspired by the use of VGG-net [19] for ImageNet classification, Kim et al. [20] designed a very deep convolutional network called VDSR. They used a 20-layer convolutional network with residual learning and showed that skip-connections and recursive convolution alleviate the training burden of deep convolutional networks. Hu et al. [21] proposed a cascaded multi-scale cross network in which a sequence of subnetworks is cascaded to infer high-resolution features in a coarse-to-fine manner. Hui et al. [22] proposed a compact convolutional network to reduce the computational complexity and memory consumption of super-resolution. SRResNet [23] uses 16 residual blocks optimized for the mean squared error (MSE), demonstrating that identity mappings [24] perform well in super-resolution.

Lim et al. [25] presented EDSR, a deeper and wider model derived from SRResNet by removing the batch normalization (BN) layers [26]. They reported that the model without BN layers saves approximately 40% of memory usage during training without performance degradation. Furthermore, deeper networks with short skip-connections achieved better reconstruction accuracy. However, deepening networks aggravates the vanishing-gradient problem, while simply widening networks induces many more redundant calculations for diminishing performance gains. Moreover, the reconstruction of sharp details remains unsatisfactory.

Therefore, increasing the scale of a convolutional network by simply deepening or widening it cannot improve reconstruction performance substantially. To solve this problem, we propose a novel multi-channel Merged Network for Super-Resolution (MNSR) architecture based on dense blocks [27], which addresses the aforementioned problems by concatenating shallow and deep feature maps via identity mappings [28]. We established a structure with two sub-nets that use different convolution kernels to extract features. The channel with small kernels extracts features with smaller receptive fields and more frequent feature reuse; in contrast, the channel with larger kernels extracts features with larger receptive fields.
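The dense-block feature reuse described above can be sketched in a few lines; `toy_layer` below is a hypothetical stand-in (a random channel projection with ReLU) for the real convolution layers, so only the concatenation bookkeeping reflects the actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_layer(x, growth):
    """Hypothetical stand-in for a conv+ReLU layer: random channel mixing."""
    w = rng.standard_normal((growth, x.shape[0]))
    return np.maximum(np.tensordot(w, x, axes=1), 0.0)

def dense_block(x, growth, n_layers):
    """Each layer sees the concatenation of ALL earlier feature maps."""
    feats = [x]
    for _ in range(n_layers):
        inp = np.concatenate(feats, axis=0)   # shallow + deep features together
        feats.append(toy_layer(inp, growth))
    return np.concatenate(feats, axis=0)      # identity-mapped concatenation

out = dense_block(np.ones((8, 4, 4)), growth=4, n_layers=3)
# channels grow as 8 + 3*4 = 20, and every earlier map stays reachable
```

The key property is that shallow features are never overwritten: they travel to the block output unchanged alongside the deep features, which is what enables the frequent feature reuse mentioned above.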

An LR image is embedded into these two sub-nets by a prior feature extractor, and the outputs of the sub-nets are then merged into a new set of feature maps by a fusion module that uses bottleneck layers. The result is aggregated with the prior feature map by a long skip-connection to collect lower-level pixel features. The up-sampling layer in this model carries out the sub-pixel [18] operation, followed by a mapping module to generate an HR image. Furthermore, we propose a flexible multi-loss mechanism to optimize the model and improve the robustness of the reconstruction system. The proposed model was trained on LR and HR patch pairs and evaluated on different datasets. The results show state-of-the-art performance in terms of sharp detail reconstruction and good robustness.
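A multi-loss mechanism of this kind typically combines several error terms into one training objective. The paper's exact terms and weights are not given in this excerpt, so the sketch below is purely illustrative: it blends L1 and L2 pixel errors with an assumed weight `alpha`:

```python
import numpy as np

def weighted_multi_loss(sr, hr, alpha=0.8):
    """Illustrative multi-term loss: weighted sum of L1 and L2 pixel errors.

    The actual loss terms and weights of the paper are not specified here;
    both the choice of terms and alpha=0.8 are assumptions for this sketch.
    """
    diff = sr - hr
    l1 = np.mean(np.abs(diff))   # robust to outliers, favors sharp edges
    l2 = np.mean(diff ** 2)      # MSE term, directly tied to PSNR
    return alpha * l1 + (1.0 - alpha) * l2

loss = weighted_multi_loss(np.array([0.0, 2.0]), np.zeros(2))
# l1 = 1.0, l2 = 2.0, so loss = 0.8*1.0 + 0.2*2.0 = 1.2
```

Exposing the weight as a parameter is what makes such a mechanism "flexible": the balance between the terms can be tuned, or even scheduled during training, without changing the model.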

Briefly, the contributions of this paper are the following:

  • We propose a novel MNSR model that combines detailed texture features with large-receptive-field shape predictions to collect a more adequate feature atlas and improve SR performance.

  • We deepen the sub-net by using dense blocks with identity mappings to concatenate shallow and deep feature maps and simultaneously promote feature reuse.

  • We propose a novel fusion method with a bottleneck layer to reduce parameters while keeping high reconstruction accuracy.
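A bottleneck layer of the kind named in the third contribution amounts to a 1×1 convolution that mixes the concatenated sub-net channels down to a smaller count. Since a 1×1 convolution is per-pixel channel mixing, it reduces to a single matrix product; the names and sizes below are illustrative:

```python
import numpy as np

def bottleneck_fuse(feat_a, feat_b, out_ch, rng):
    """Fuse two sub-net outputs with a 1x1 convolution (channel bottleneck)."""
    merged = np.concatenate([feat_a, feat_b], axis=0)   # stack along channels
    w = rng.standard_normal((out_ch, merged.shape[0]))  # 1x1 conv weights
    return np.tensordot(w, merged, axes=1)              # per-pixel channel mixing

rng = np.random.default_rng(0)
a = np.ones((64, 8, 8))   # toy output of the small-kernel sub-net
b = np.ones((64, 8, 8))   # toy output of the large-kernel sub-net
fused = bottleneck_fuse(a, b, out_ch=64, rng=rng)
# 128 -> 64 channels: every later layer now needs half as many weights
```

The parameter saving comes from the channel reduction: any subsequent k×k convolution over 64 channels needs half the weights it would need over the 128 concatenated channels.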

Section snippets

Related work

Methods based on deep learning [29], [30] have long been an important research topic and have led to great success in many fields. Feature extraction using convolution layers before mapping has been widely used in SR [31], [32], [33] and has outperformed other methods. SRCNN [15], [16], a shallow neural network model with three convolution layers, first achieved great success in SR using deep learning. There have subsequently been a variety of methods based on CNN [31], [32],

Proposed method

As shown in Fig. 1, the proposed model conceptually comprises a prior feature extractor; two channels of sub-nets, which use 16 dense blocks with large or small size kernels in the convolution layers; a fusion module; a long skip-connection; an up-sampling layer; and a mapping module for reconstruction. We also propose a multi-loss mechanism for the model optimization.
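The data flow just described can be summarized in a toy NumPy skeleton. Here `proj` is a hypothetical stand-in (random 1×1 channel mixing with ReLU) for the real convolution and dense-block modules, so only the wiring mirrors the architecture: prior extractor, two branches, bottleneck fusion, long skip-connection, and sub-pixel up-sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def proj(x, out_ch):
    """Stand-in for a conv/dense-block module: random channel mixing + ReLU."""
    w = rng.standard_normal((out_ch, x.shape[0]))
    return np.maximum(np.tensordot(w, x, axes=1), 0.0)

def shuffle(x, r):
    """Sub-pixel rearrangement: (C*r*r, H, W) -> (C, H*r, W*r)."""
    c, h, w = x.shape[0] // (r * r), x.shape[1], x.shape[2]
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

def mnsr_toy(lr_img, r=2, ch=16):
    prior = proj(lr_img, ch)                  # prior feature extractor
    branch_a = proj(prior, ch)                # sub-net with small kernels (toy)
    branch_b = proj(prior, ch)                # sub-net with large kernels (toy)
    fused = proj(np.concatenate([branch_a, branch_b], axis=0), ch)  # bottleneck fusion
    feat = fused + prior                      # long skip-connection
    return shuffle(proj(feat, 3 * r * r), r)  # sub-pixel up-sampling to RGB

sr = mnsr_toy(np.ones((3, 8, 8)), r=2)        # (3, 16, 16) output for a (3, 8, 8) input
```

Note that every module before `shuffle` operates at LR spatial resolution; only the final rearrangement produces the HR grid, matching the placement of the up-sampling layer at the very end of the network.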

Datasets

We used the DIV2K dataset [39], [40] to train and test the model. DIV2K is a recently proposed high-quality image dataset whose images have 2K pixels horizontally or vertically. It contains 800 training images, 100 validation images, and 100 test images. Because the ground-truth HR images of the test set have not been released, we used the validation set for testing. To compare our network with other methods, we also used four standard benchmarks during evaluation: Set5 [41], Set14 [42],

Conclusion

We have proposed a super-resolution method using a multi-channel merged network based on dense blocks, which extracts complementary features via convolutional sub-nets with different kernel sizes. We used a fusion module to assemble the feature maps, as well as a sub-pixel layer, which neither omits nor distorts information, to reconstruct the SR image. The quantitative and qualitative experimental results obtained on benchmark datasets demonstrate the superior performance of our method.

Declaration of interests

Tianjin Research Program of Application Foundation and Advanced Technology (17ZXRGGX00180).

Jinghui Chu received the B.Eng. degree in radio technology, and M.Eng. and Ph.D. degrees in signal and information processing all from Tianjin University, Tianjin, China, in 1991, 1997, and 2006, respectively. She is currently an associate professor in the School of Electronic Information Engineering, Tianjin University. Her teaching and research interests include digital video technology and pattern recognition.

References (50)

  • L. Wang et al., Fast image upsampling via the displacement field, IEEE Trans. Image Process. (2014)

  • S.A.A. Karim et al., Positivity preserving interpolation using rational bicubic spline, Appl. Math. (2015)

  • S. Mallat et al., Super-resolution with sparse mixing estimators, IEEE Trans. Image Process. (2010)

  • S. Dai et al., Softcuts: a soft edge smoothness prior for color image super-resolution, IEEE Trans. Image Process. (2009)

  • S. Schulter et al., Fast and accurate image upscaling with super-resolution forests, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 7–12 (2015)

  • J. Yang et al., Image super-resolution as sparse representation of raw image patches, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), June 24–26 (2008)

  • X. Li et al., A super-resolution algorithm based on adaptive sparse representation, Proceedings of the International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Zhangjiajie, China, August 15–17 (2015)

  • R. Zeyde et al., On single image scale-up using sparse-representations, Proceedings of the International Conference on Curves and Surfaces, Avignon, France, June 24–30 (2010)

  • Y. Huang et al., Single image super-resolution via multiple mixture prior models, IEEE Trans. Image Process. (2018)

  • C. Dong et al., Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2016)

  • C. Dong et al., Learning a deep convolutional network for image super-resolution, Proceedings of the European Conference on Computer Vision (ECCV), September 6–12 (2014)

  • C. Dong et al., Accelerating the super-resolution convolutional neural network, Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, October 11–14 (2016)

  • W. Shi et al., Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30 (2016)

  • K. Simonyan et al., Very deep convolutional networks for large-scale image recognition, Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, May 7–9 (2015)

  • J. Kim et al., Accurate image super-resolution using very deep convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 27–30 (2016)


    Xiaochuan Li received his B.Eng. degree in Electronic Engineering from Tianjin University, Tianjin, China in 2016. He is now a master candidate in the School of Electronic and Information Engineering in Tianjin University, Tianjin, China. His research involves deep learning, pattern recognition, and digital image processing.

    Jiaqi Zhang received the B.Eng. degree in Electronic Engineering from Hohai University, Nanjing, China in 2016. She is currently a master candidate in the School of Electronic and Information Engineering in Tianjin University, Tianjin, China. Her research involves deep learning, computer vision and digital image processing.

    Wei Lu received his B.Eng. degree in Electronic Engineering, and Ph.D. degree in signal and information processing from Tianjin University, Tianjin, China, in 1998 and 2003, respectively. He is currently an associate professor in the School of Electronic Information Engineering, Tianjin University. His teaching and research interests include digital filter design, digital multimedia technology, embedded system design, Web application design, and pattern recognition. He is now a senior member of the Chinese Institute of Electronics.
