A two-channel convolutional neural network for image super-resolution
Introduction
Single image super-resolution (SISR) aims to restore a high-resolution (HR) image with abundant high-frequency details from a low-resolution (LR) observation; because the problem is underdetermined, it is inherently ill-posed. Deep learning is a young field within machine learning whose motivation is to build neural networks that imitate the mechanisms of the human brain in order to interpret data such as images, sound, and text. The convolutional neural network (CNN) is a supervised deep learning model. Recently, CNNs have shown excellent performance in various computer vision tasks, such as image classification, object detection, semantic segmentation, and action recognition [1], [2], [3], [4], [5], [6], and they have also shown excellent performance on SISR [7], [8], [9], [10], [11]. The ill-posedness is particularly pronounced at high upscaling factors, for which texture detail is typically absent from the reconstructed SR images. Yet high-resolution images are required in many areas; for example, deep learning technologies can improve the perception and positioning capabilities of unmanned systems.
Traditional SISR methods are based on interpolation, such as bicubic interpolation and Lanczos resampling [12]. Algorithms based on reconstruction constraints were then widely studied, including iterative back projection (IBP) [13], maximum a posteriori (MAP) estimation [14], and projections onto convex sets (POCS) [15]; these can incorporate more prior knowledge and handle various motion models. However, such spatial-domain SR methods are time-consuming and require large amounts of iterative computation.
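The interpolation baseline mentioned above takes only a few lines to reproduce. A minimal sketch of bicubic upsampling with Pillow follows; the synthetic grayscale image and the 3x scale factor are illustrative assumptions, not values from the paper:

```python
import numpy as np
from PIL import Image

# Synthetic grayscale LR image standing in for a real observation.
lr = Image.fromarray(
    (np.random.default_rng(0).random((32, 32)) * 255).astype(np.uint8)
)

# Bicubic interpolation to 3x the LR size -- the classic SISR baseline.
hr = lr.resize((lr.width * 3, lr.height * 3), resample=Image.BICUBIC)
```

Such interpolation is fast but cannot recover high-frequency detail, which is what motivates the learning-based methods discussed next.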
Later, learning-based methods were extensively used to model the mapping from LR to HR patches. The neighbor embedding with locally linear embedding (NE+LLE) method [16] interpolates the patch subspace. The sparse coding (SC) method [17] uses a compact dictionary learned from sparse signal representations. The sparse coding based network (SCN) [18] achieves a notable improvement over the generic SC model, and the cascade of SCNs (CSCN) [19] further benefits from end-to-end training of a deep network with a specially designed multi-scale cost function. However, most of these methods rely on hand-designed features to characterize LR images, restore images slowly, carry high computational complexity, and cannot perform end-to-end direct upscaling. Recently, combining neural networks with traditional algorithms has become a growing trend. The extreme learning machine autoencoder (ELM-AE) [20] is incorporated into a new deep neural network; autoencoders embedded into networks [21], [22] are used for human pose recovery; and [23] presents an AdaBoost-based method for learning a non-linear feed-forward artificial neural network with a single hidden layer and a single output neuron.
CNN-based methods, as biologically inspired learning models, have provided new inspiration and direction for the SR problem. The Super-Resolution Convolutional Neural Network (SRCNN) [7] proposed by Dong et al. drew considerable attention for its simple network structure and excellent restoration quality. The authors soon accelerated the algorithm by reducing the network parameters, proposing the Fast Super-Resolution Convolutional Neural Network (FSRCNN) [9]. However, drawbacks remain. First, as a pre-processing step, the original LR image is upsampled to the desired size by bicubic interpolation to form the network input. Second, reconstruction of detailed information is still unsatisfactory. A two-channel method is therefore proposed to address these problems. The shallow channel mainly restores the general outline of the image, while the deep channel extracts detailed texture information. An end-to-end model is established by embedding the upsampling, carried out by deconvolution, into both channels, which avoids introducing new errors. To recover more detail, the deep channel adopts a multi-scale manner to restore texture information at different scales (both long and short). Finally, the deep and shallow channels are combined to obtain the final HR image. The proposed model is evaluated on several databases and shows good robustness on both image and highway-video reconstruction, laying a foundation for unmanned systems.
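The deconvolution-based upsampling described above can be sketched as a stride-2 transposed convolution: zero-insertion followed by an ordinary convolution, so the upsampling filter is learnable rather than fixed as in bicubic interpolation. The 4x4 averaging kernel below is a stand-in for a learned filter:

```python
import numpy as np
from scipy.signal import correlate2d

def deconv_x2(x, k):
    """Stride-2 transposed convolution approximated by zero-insertion
    followed by a 'same' convolution with kernel k."""
    up = np.zeros((2 * x.shape[0], 2 * x.shape[1]))
    up[::2, ::2] = x          # insert zeros between the LR samples
    return correlate2d(up, k, mode="same")

lr = np.random.default_rng(0).random((8, 8))
sr = deconv_x2(lr, np.full((4, 4), 0.25))   # crude learned-filter stand-in
```

Because the kernel is trained with the rest of the network, the upsampling step adapts to the data instead of imposing a hand-designed interpolation model.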
Section snippets
Related work
There have been numerous publications over the last five years applying deep learning to SR. Compared with traditional SR methods, which depend on handcrafted features, deep learning can further improve performance. CNN-based methods, as biologically inspired learning models, have provided new inspiration and direction for the SR problem.
As in most existing deep-learning-based SR methods [24], [25], SRCNN is a shallow network with only three convolution layers. And each layer has
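SRCNN's three layers (the published 9-1-5 kernel setting, with 64 and 32 feature maps) can be sketched as follows; for brevity each layer is collapsed here to a single feature map with random weights, so this shows only the structure, not the trained model:

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)

def layer(x, ksize, relu=True):
    # One SRCNN-style convolution, collapsed to a single random-weight filter.
    k = rng.normal(0.0, 0.1, (ksize, ksize))
    y = correlate2d(x, k, mode="same")
    return np.maximum(y, 0.0) if relu else y

x = rng.random((33, 33))       # bicubic-upsampled LR patch (pre-processed input)
f = layer(x, 9)                # patch extraction and representation (9x9)
f = layer(f, 1)                # non-linear mapping (1x1)
sr = layer(f, 5, relu=False)   # reconstruction (5x5)
```

Note that the input must already be interpolated to the target size, which is exactly the pre-processing drawback the two-channel method avoids.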
Proposed method
To obtain accurate and efficient reconstruction, a two-channel method is proposed, comprising a very deep channel and a shallow channel. The deep channel consists of 19 layers and the shallow channel of 3 layers. Fig. 3 shows the architecture of the proposed network. The deep channel conceptually consists of four steps: feature extraction (3 layers), mapping (5 layers), upsampling (3 layers), and multi-scale reconstruction (8 layers). The deep layers with deep hierarchy constantly
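Under the layer counts above, the two channels can be sketched in numpy. Single feature maps, random weights, and a 2x upsampling factor are simplifying assumptions; the real network uses many learned filters per layer and is trained end-to-end:

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)

def conv(x, n):
    # 'Same' convolution with a random n x n filter (stand-in for learned weights).
    return correlate2d(x, rng.normal(0.0, 0.1, (n, n)), mode="same")

def upsample_x2(x):
    # Deconvolution: zero-insertion then convolution (stride-2 transposed conv).
    up = np.zeros((2 * x.shape[0], 2 * x.shape[1]))
    up[::2, ::2] = x
    return conv(up, 5)

def shallow_channel(lr):
    # 3 layers recovering the coarse outline directly from the LR input.
    return conv(np.maximum(conv(upsample_x2(lr), 3), 0.0), 3)

def deep_channel(lr):
    x = lr
    for _ in range(3):                  # feature extraction
        x = np.maximum(conv(x, 3), 0.0)
    for _ in range(5):                  # mapping
        x = np.maximum(conv(x, 3), 0.0)
    x = upsample_x2(x)                  # upsampling
    return conv(x, 3) + conv(x, 5)      # multi-scale: short- and long-scale branches

lr = rng.random((8, 8))
sr = shallow_channel(lr) + deep_channel(lr)   # fuse the two channels
```

Both channels operate on the raw LR input and embed the upsampling internally, so no interpolation pre-processing is needed and the fusion is a simple element-wise sum at the HR resolution.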
Datasets for training and testing
Training dataset. The 91-image dataset proposed in [17] is widely used as the training set in learning-based SR methods [7], [8], [9], [34]. Since big data generally yields better results, we also use the General-100 dataset [9], which contains 100 uncompressed bmp-format images well suited to SR training. The original data therefore totals 191 images. To make the dataset more effective, we augment the original images in two steps. 1) Scaling: each image is scaled by the
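A data-augmentation pipeline of this two-step kind can be sketched as follows. The specific scale factors and rotation angles here are assumptions (the choices common in FSRCNN-style training), since the snippet above is truncated before stating them:

```python
import numpy as np
from PIL import Image

def augment(img):
    """Generate scaled and rotated variants of one training image."""
    out = []
    for s in (1.0, 0.9, 0.8, 0.7, 0.6):              # 1) scaling (factors assumed)
        im = img.resize((max(1, round(img.width * s)),
                         max(1, round(img.height * s))), Image.BICUBIC)
        for r in (0, 90, 180, 270):                  # 2) rotation (angles assumed)
            out.append(im.rotate(r, expand=True))
    return out

img = Image.fromarray(
    (np.random.default_rng(1).random((20, 20)) * 255).astype(np.uint8)
)
variants = augment(img)   # 5 scales x 4 rotations = 20 variants per image
```

Applied to all 191 images, this kind of augmentation multiplies the effective training-set size without collecting new data.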
Conclusion
In this paper, we have presented a super-resolution method using shallow and deep networks, which directly extracts features from the original LR images and learns to upscale the resolution in the latent feature space. With multi-scale reconstruction, it can restore more detail in both images and videos, laying a good foundation for self-driving technology. Experimental results, both visual and objective, support that our SDSR method outperforms state-of-the-art SR
Acknowledgments
This paper is supported by the National Natural Science Foundation of China (NSFC): the key international (regional) cooperation research project "The study of 3D image/video coding, content processing, and key techniques of quality evaluation" under grant 61520106002, and the general project "Stereoscopic visual comfort study based on motion and depth perception" under grant 61471262. We gratefully acknowledge the support of NSFC.
References (41)
- et al., G-ms2f: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition, Neurocomputing (2017)
- et al., A survey of deep neural network architectures and their applications, Neurocomputing (2017)
- et al., Incorporating image priors with deep convolutional neural networks for image super-resolution, Neurocomputing (2016)
- et al., Motion analysis for image enhancement: resolution, occlusion, and transparency, J. Vis. Commun. Image Represent. (1993)
- et al., Multimodal deep autoencoder for human pose recovery, IEEE Trans. Image Process. (2015)
- et al., RAISR: rapid and accurate image super resolution, IEEE Trans. Comput. Imaging (2017)
- et al., Compression artifacts reduction by a deep convolutional network, Proceedings of the IEEE International Conference on Computer Vision (2015)
- et al., ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. (2012)
- et al., Rich feature hierarchies for accurate object detection and semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
- et al., Simultaneous detection and segmentation, European Conference on Computer Vision (2014)
- 3D convolutional neural networks for human action recognition, IEEE Trans. Pattern Anal. Mach. Intell.
- Learning a deep convolutional network for image super-resolution, European Conference on Computer Vision
- Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell.
- Accelerating the super-resolution convolutional neural network, European Conference on Computer Vision
- Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Lanczos filtering in one and two dimensions, J. Appl. Meteorol.
- Extraction of high-resolution frames from video sequences, IEEE Trans. Image Process.
- On projection algorithms for solving convex feasibility problems, SIAM Rev.
- Super-resolution through neighbor embedding, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004)
- Image super-resolution via sparse representation, IEEE Trans. Image Process.
Cited by (29)
- SRDiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing, 2022.
- A practical generative adversarial network architecture for restoring damaged character photographs. Neurocomputing, 2021.
  Citation excerpt: "One usually uses image processing software, such as Photoshop (PS) and Metuxiu, to repair old and damaged photographs, at the cost of a long time and a lot of human resources. In recent years, artificial intelligence (AI) and deep learning (DL) have been widely applied in the field of image processing [1], such as object detection [2], image segmentation [3], super-resolution [4], image denoising [5], and image deblurring [6]. Compared to the traditional image processing method, the image processing method based on DL has achieved excellent performance [7]."
- Joint blur kernel estimation and CNN for blind image restoration. Neurocomputing, 2020.
  Citation excerpt: "Conventional image restoration techniques include the Wiener and Kalman filtering technique, the regularization technique, the variational regularization technique, the joint statistical regularization technique, and the fusion regularization technique, under the assumption that the blur kernel is known [3-6]. Recently, it was reported that convolutional neural networks can be well applied to non-blind image restoration and super-resolution [7-16]. For example, Schuler et al. presented a two-step procedure based on a neural network [7]."
- Super-resolution using multi-channel merged convolutional network. Neurocomputing, 2020.
  Citation excerpt: "As shown in Fig. 4, the fusion module comprises a concatenation layer and three bottleneck layers. The pixel-level summation has many uncertain factors, so bad results could arise from feature assembly using the element-wise operation, which adds the output of both sub-nets for combination [32]. In contrast, the concatenation operation can integrate all the features without omission."
Sumei Li received her Ph.D. degree from Nankai University, Tianjin, China, in 2004. She joined Tianjin University, China, in 2006, where she is currently an associate professor in the School of Electrical Automation and Information Engineering. Her research interests are in the areas of (3D) digital image processing, visual quality evaluation, pattern recognition, and neural networks.
Ru Fan received the B.S. degree in communication engineering from Hebei University, Baoding, China, in 2015, and she is currently working towards the master's degree at the School of Electrical and Information Engineering, Tianjin University, Tianjin, China. Her research interests include image processing, stereo matching, and deep learning.
Guoqing Lei graduated from Changchun University of Science and Technology in 2016 and is now studying at Tianjin University for a master's degree. Her research interests are in the areas of image reconstruction and neural networks.
Guanghui Yue received the B.S. degree in communication engineering from Tianjin University, Tianjin, China, in 2014, and he is currently working towards the Ph.D. degree at the School of Electrical and Information Engineering, Tianjin University, Tianjin, China. His research interests include bioelectrical signal processing, image quality assessment, and 3-D image visual discomfort prediction.
Chunping Hou received the M.Eng. and Ph.D. degrees, both in electronic engineering, from Tianjin University, Tianjin, China, in 1986 and 1998, respectively. Since 1986, she has been the faculty of the School of Electronic and Information Engineering, Tianjin University, where she is currently a Full Professor and the Director of the Broadband Wireless Communications and 3D Imaging Institute. Her current research interests include 3D image processing, 3D display, wireless communication, and the design and applications of communication systems.