Neurocomputing

Volume 275, 31 January 2018, Pages 267-277

A two-channel convolutional neural network for image super-resolution

https://doi.org/10.1016/j.neucom.2017.08.041

Abstract

A two-channel convolutional neural network (comprising one shallow and one deep channel) is proposed for single image super-resolution (SISR). Most existing super-resolution methods based on convolutional neural networks (CNNs) are shallow and easily lose detailed information. These methods also need preprocessing such as bicubic interpolation to enlarge LR images to the size of HR images, which may introduce new noise. Meanwhile, most of them use only one fixed filter during reconstruction. The proposed algorithm, named shallow and deep convolutional networks for image super-resolution (SDSR), solves the above problems. First, the proposed method uses two channels: a shallow channel and a deep channel. The shallow channel mainly restores the general outline of the image, whereas the deep channel extracts the detailed texture information. Second, the proposed method directly learns an end-to-end mapping between low-resolution (LR) and high-resolution (HR) images, so no hand-designed preprocessing is needed. Upsampling by deconvolution is embedded in both channels, which leads to more efficient and effective training and reduces the computational complexity of the overall SR operation. Finally, in the last stage of reconstruction, the deep channel adopts a multi-scale manner, which extracts both short- and long-scale texture information simultaneously. Our model is evaluated on different public datasets including images and videos. Experimental results demonstrate that the proposed method outperforms existing methods in accuracy and visual impression.

Introduction

Owing to the ill-posed nature of this underdetermined problem, single image super-resolution (SISR) aims at restoring a high-resolution (HR) image with abundant high-frequency details from a low-resolution (LR) observation. Deep learning is a new field of machine learning whose motivation is to build neural networks that imitate the mechanisms of the human brain to interpret data such as images, sound, and text. The convolutional neural network (CNN) is a supervised deep learning model. Recently, CNNs have shown excellent performance in various computer vision tasks, such as image classification, object detection, semantic segmentation, and action recognition [1], [2], [3], [4], [5], [6]. CNNs have also shown excellent performance in single image super-resolution (SISR) [7], [8], [9], [10], [11]. SR is inherently ill-posed because of insufficient knowledge, and the ill-posed nature is particularly pronounced for high upscaling factors, for which texture detail in the reconstructed SR images is typically absent. Nevertheless, high-resolution images are required in many areas; for example, deep learning technologies can be used to improve the perception and positioning functions of unmanned systems.

Traditional SISR methods are based on interpolation, such as bicubic interpolation and Lanczos resampling [12]. Algorithms based on reconstruction constraints have also been widely studied, including iterative back projection (IBP) [13], maximum a posteriori probability (MAP) [14], and projections onto convex sets (POCS) [15], which can incorporate more prior knowledge and be applied to a variety of motion models. However, these spatial-domain SR methods are time-consuming and require a large number of iterative calculations.

Lately, learning-based methods have been extensively used to model a mapping from LR to HR patches. The neighbor embedding and locally linear embedding (NE+LLE) [16] method interpolates the patch subspace. The sparse coding (SC) [17] method uses a learned compact dictionary based on sparse signal representation. The sparse coding based network (SCN) [18] achieves notable improvement over the generic SC model, and the cascade of SCNs (CSCN) [19] further benefits from end-to-end training of a deep network with a specially designed multi-scale cost function. However, most of these methods rely on hand-designed features to characterize LR images and restore images slowly; they therefore have high computational complexity and cannot achieve end-to-end direct amplification. Recently, a trend of combining neural networks with traditional algorithms has been on the rise. The extreme learning machine autoencoder (ELM-AE) [20] is incorporated into a new deep neural network, and autoencoders embedded into networks [21], [22] are also used in human pose recovery. [23] presents an AdaBoost-based learning method to learn a non-linear feed-forward artificial neural network with a single hidden layer and a single output neuron.

CNN-based methods, as biologically inspired learning models, have provided new inspiration and direction for the SR problem. The Super-Resolution Convolutional Neural Network (SRCNN) [7] proposed by Dong et al. drew considerable attention due to its simple network structure and excellent restoration quality. The authors later accelerated the algorithm by reducing the network parameters, proposing the Fast Super-Resolution Convolutional Neural Network (FSRCNN) [9]. However, some drawbacks remain. First, as a pre-processing step, the original LR image is upsampled to the desired size by bicubic interpolation to form the input of the network. Second, reconstruction of the detailed information is still unsatisfactory. A two-channel method is therefore proposed to solve these problems. The shallow channel mainly restores the general outline of the image, whereas the deep channel extracts the detailed texture information. An end-to-end model is established by embedding the upsampling, carried out by deconvolution, into the two channels, which avoids introducing new errors. To recover more detailed information, a multi-scale manner is adopted in the deep channel to restore texture information at different scales (both long and short). Finally, the outputs of the deep and shallow channels are combined to obtain the final HR image. The proposed model is evaluated on different databases and shows good robustness on both image and highway-video reconstruction, laying a foundation for unmanned systems.
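To make the embedded deconvolution upsampling concrete, the following is a minimal sketch written for illustration in a PyTorch style (it is not the authors' released code); the channel width and kernel sizes are assumptions.

import torch
import torch.nn as nn

class DeconvUpsampler(nn.Module):
    # Toy module: features are extracted at LR size and a transposed
    # convolution (deconvolution) upscales inside the network, so no bicubic
    # pre-interpolation of the input is needed. Widths/kernels are illustrative.
    def __init__(self, scale=2, channels=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),  # operates on the raw LR image
            nn.ReLU(inplace=True),
        )
        # learned upsampling: stride equals the scale factor
        self.deconv = nn.ConvTranspose2d(channels, 1, kernel_size=4,
                                         stride=scale, padding=1)

    def forward(self, lr):
        return self.deconv(self.features(lr))

net = DeconvUpsampler(scale=2)
print(net(torch.randn(1, 1, 48, 48)).shape)  # a 48x48 LR patch becomes 96x96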

Section snippets

Related work

There have been numerous publications over the last five years applying deep learning to the SR area. Compared with traditional SR methods, which depend on handcrafted features, deep learning can further improve performance. CNN-based methods, as biologically inspired learning models, have provided new inspiration and direction for the SR problem.

Like most existing deep-learning-based SR methods [24], [25], SRCNN is a shallow network with only three convolution layers. Each layer has
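For comparison with the proposed two-channel design, a minimal sketch of a three-layer SRCNN-style network is given below; the 9-1-5 kernel sizes and 64/32 filter counts follow the commonly reported SRCNN configuration and should be read as assumptions, since the snippet above is truncated.

import torch.nn as nn

# SRCNN-style three-layer network: patch extraction, non-linear mapping,
# and reconstruction. It expects a bicubic-upsampled LR image as input,
# which is exactly the preprocessing step SDSR avoids.
srcnn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=9, padding=4),   # patch extraction and representation
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 32, kernel_size=1),             # non-linear mapping
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 1, kernel_size=5, padding=2),   # reconstruction
)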

Proposed method

To reconstruct images accurately and efficiently, a two-channel method is proposed, consisting of a very deep channel and a shallow channel. The deep channel consists of 19 layers, and the shallow channel includes 3 layers. Fig. 3 shows the architecture of the proposed network. The deep channel conceptually consists of four steps: feature extraction (3 layers), mapping (5 layers), upsampling (3 layers) and multi-scale reconstruction (8 layers). The deep layers with deep hierarchy constantly
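A minimal sketch, assuming PyTorch, of how the described two-channel layout could be organized; the filter sizes, channel widths, and the exact form of the multi-scale branches are not fully specified in this snippet, so the values below are illustrative assumptions rather than the paper's settings.

import torch
import torch.nn as nn

def conv_block(n, cin, cout, k=3):
    # n stacked conv+ReLU layers; widths and kernels are illustrative
    layers, c = [], cin
    for _ in range(n):
        layers += [nn.Conv2d(c, cout, k, padding=k // 2), nn.ReLU(inplace=True)]
        c = cout
    return nn.Sequential(*layers)

class TwoChannelSR(nn.Module):
    def __init__(self, scale=2, width=64):
        super().__init__()
        # shallow channel (3 layers): restores the coarse outline, with learned upsampling
        self.shallow = nn.Sequential(
            nn.Conv2d(1, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, width, 4, stride=scale, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, 3, padding=1),
        )
        # deep channel: feature extraction (3 layers) and mapping (5 layers)
        self.extract = conv_block(3, 1, width)
        self.map = conv_block(5, width, width)
        # upsampling (3 layers, deconvolution in the middle)
        self.up = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, width, 4, stride=scale, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
        )
        # multi-scale reconstruction: parallel short- and long-kernel branches
        self.branch_short = conv_block(4, width, width, k=3)
        self.branch_long = conv_block(4, width, width, k=5)
        self.fuse = nn.Conv2d(2 * width, 1, 3, padding=1)

    def forward(self, lr):
        deep = self.up(self.map(self.extract(lr)))
        deep = self.fuse(torch.cat([self.branch_short(deep), self.branch_long(deep)], dim=1))
        return deep + self.shallow(lr)  # combine the deep and shallow channels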

Datasets for training and testing

Training dataset. The 91-image dataset proposed in [17] is widely used as the training set in learning-based SR methods [7], [8], [9], [34]. As larger datasets generally yield better results, we also use the General-100 dataset [9], which contains 100 bmp-format images with no compression and is therefore very suitable for SR training. The original data thus comprise 191 images in total. To make the dataset more efficient, we augment the original images in two steps (a sketch of the scaling step is given after this paragraph). 1) Scaling: each image is scaled by the
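The scaling step might look like the following minimal sketch; the scale factors and file layout are hypothetical, chosen only to illustrate the augmentation, since the exact values are not visible in this snippet.

from pathlib import Path
from PIL import Image

SCALES = [0.9, 0.8, 0.7, 0.6]  # hypothetical downscaling factors

def augment_by_scaling(src_dir, dst_dir):
    # save a rescaled copy of every training image for each factor
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.bmp"):
        img = Image.open(path)
        for s in SCALES:
            size = (round(img.width * s), round(img.height * s))
            img.resize(size, Image.BICUBIC).save(dst / f"{path.stem}_x{s}.bmp")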

Conclusion

In this paper, we have presented a super-resolution method using shallow and deep networks, which directly extracts features from the original LR images and learns to upscale the resolution in the latent feature space. With multi-scale reconstruction, it can restore more details in both images and videos, laying a good foundation for self-driving technology. Experimental results, both visual and objective, support that our SDSR method outperforms state-of-the-art SR

Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFC): the key international (regional) cooperation research project "The Study of 3D Image/Video Coding, Content Processing, and Key Techniques of Quality Evaluation" under grant 61520106002, and the general project "Stereoscopic Visual Comfort Study Based on Motion and Depth Perception" under grant 61471262. We gratefully acknowledge the support of NSFC.

References (41)

  • S. Ji et al.

    3D convolutional neural networks for human action recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2013)
  • C. Dong et al.

    Learning a deep convolutional network for image super-resolution

    European Conference on Computer Vision

    (2014)
  • C. Dong et al.

    Image super-resolution using deep convolutional networks

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2016)
  • C. Dong et al.

    Accelerating the super-resolution convolutional neural network

    European Conference on Computer Vision

    (2016)
  • W. Shi et al.

    Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • C.E. Duchon

    Lanczos filtering in one and two dimensions

    J. Appl. Meteorol.

    (1979)
  • R.R. Schultz et al.

    Extraction of high-resolution frames from video sequences

    IEEE Trans. Image Process.

    (1996)
  • H.H. Bauschke et al.

    On projection algorithms for solving convex feasibility problems

    SIAM Rev.

    (1996)
  • H. Chang et al.

    Super-resolution through neighbor embedding

    Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004)

    (2004)
  • J. Yang et al.

    Image super-resolution via sparse representation

    IEEE Trans. Image Process.

    (2010)

Sumei Li received her Ph.D. degree from Nankai University, Tianjin, China, in 2004. She joined Tianjin University, China, in 2006, where she is currently an associate professor in the School of Electrical Automation and Information Engineering. Her research interests are in the areas of (3D) digital image processing, visual quality evaluation, pattern recognition, and neural networks.

Ru Fan received the B.S. degree in communication engineering from Hebei University, Baoding, China, in 2015, and she is currently working towards the master's degree at the School of Electrical and Information Engineering, Tianjin University, Tianjin, China. Her research interests include image processing, stereo matching, and deep learning.

Guoqing Lei graduated from Changchun University of Science and Technology in 2016 and is now studying at Tianjin University for a master's degree. Her research interests are in the areas of image reconstruction and neural networks.

Guanghui Yue received the B.S. degree in communication engineering from Tianjin University, Tianjin, China, in 2014, and he is currently working towards the Ph.D. degree at the School of Electrical and Information Engineering, Tianjin University, Tianjin, China. His research interests include bioelectrical signal processing, image quality assessment, and 3-D image visual discomfort prediction.

Chunping Hou received the M.Eng. and Ph.D. degrees, both in electronic engineering, from Tianjin University, Tianjin, China, in 1986 and 1998, respectively. Since 1986, she has been on the faculty of the School of Electronic and Information Engineering, Tianjin University, where she is currently a Full Professor and the Director of the Broadband Wireless Communications and 3D Imaging Institute. Her current research interests include 3D image processing, 3D display, wireless communication, and the design and applications of communication systems.
