Single image resolution enhancement by efficient dilated densely connected residual network☆
Introduction
Creating a high-resolution (HR) image or video from its corresponding low-resolution (LR) input is referred to as super-resolution (SR) [1]. Recently, SR has been used for image resolution enhancement in several applications such as face recognition [2], remote sensing imaging [3], and video surveillance [4]. In particular, single image super-resolution (SISR) approaches have achieved impressive results by learning a mapping from LR to HR images using an upsampling function in convolutional neural networks (CNNs).
Current SISR methods adopt one of the following approaches. The first upscales the LR image with an interpolation method, for example bicubic, as a first step and then applies a learned model to deblur it [5], [6], [7]. The second applies upscaling after the learning process, generally using a sub-pixel convolution layer [8] or a rearranged convolution layer to improve the HR result [9], [10]. The first approach has a high computational cost because it operates on upscaled images; with the second approach, it is challenging to achieve high-quality results for large upscaling factors.
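As a concrete illustration of the second approach, a sub-pixel (pixel-shuffle) layer rearranges channels into spatial positions so that all convolutions run in LR space. The sketch below uses illustrative layer sizes, not the architecture of any of the cited models:

```python
import torch
import torch.nn as nn

# Post-upsampling SR path: learn features in LR space, then
# rearrange r^2 channels into an r-times-larger spatial grid
# with a sub-pixel (pixel-shuffle) layer.
scale = 3
net = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, scale * scale, 3, padding=1),
    nn.PixelShuffle(scale),  # (r^2, H, W) -> (1, rH, rW)
)

lr = torch.randn(1, 1, 16, 16)
hr = net(lr)
print(hr.shape)  # torch.Size([1, 1, 48, 48])
```

Because every convolution operates on the 16 x 16 input rather than a 48 x 48 interpolated image, the computational cost per layer is roughly r² times lower than in the pre-upsampling approach.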
Among these, learning-based methods have recently attracted the most attention from the community. By learning a mapping function from corresponding pairs of LR–HR image patches, they offer superior performance in image SR. In addition, deep neural networks (DNNs) have been used effectively for image SR and have yielded large improvements in accuracy [11], [12], [13].
Currently, there are several studies using CNNs and DNNs for SR. However, work on accelerating deep networks for SR is very limited. Shi et al. [14] accelerated the SR process by using a sub-pixel convolution layer to hold the information of HR images while reducing the size of the feature maps. Dong et al. [15] also proposed a model that uses a deconvolution layer instead of bicubic interpolation; further, they compacted the original SRCNN structure to speed up the process. Still, much domain knowledge and many distinct experiments are required to design a faster architecture.
CNN models typically compress the image in order to process it. In this work, we show that this is not required. We propose a progressive dilated model to accelerate the dense network (DenseNet) for SR without losing performance. In [16], networks of several depths were compared for SR performance, indicating that wider and deeper networks perform better because of their high nonlinearity and wide receptive field. Additionally, it has been shown that the size of the receptive field has a more significant effect on SR performance than varying the depth of the network; a large receptive field provides more contextual information, which improves the reconstruction result. Hence, we propose a progressive dilated residual DenseNet for SR (DRDN). In this model, the resolution of the network's output is increased by replacing a subset of internal down-sampling layers with dilated convolutions [17], [18]. Compared to standard convolution, dilated convolution provides an exponential increase of the receptive field with the same filter size.
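The exponential growth of the receptive field can be verified with a short calculation. A dilated convolution with kernel size k and dilation d has an effective kernel of (k - 1)·d + 1, and stacking stride-1 convolutions grows the receptive field additively; this is a generic sketch, not tied to the exact layer counts of the proposed network:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions.

    Each layer with dilation d enlarges the receptive field by
    (kernel_size - 1) * d, since its effective kernel spans
    (kernel_size - 1) * d + 1 input positions.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Standard 3x3 convolutions: the receptive field grows linearly.
print(receptive_field(3, [1, 1, 1]))  # 7

# Doubling dilations with the same 3x3 filters: exponential growth.
print(receptive_field(3, [1, 2, 4]))  # 15
```

With the same three layers and the same number of parameters, the dilated stack more than doubles the receptive field, which is the property the proposed model exploits.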
We also use a different mechanism for improving dense network performance [19], [20]. Generally, most models increase the network depth to extract deep features. Instead, we are motivated to build a model with efficient nonlinear activations. Current architectures mostly use per-pixel activation units, for instance sigmoids [21] and rectified linear units (ReLUs) [22]. Therefore, we propose to replace those individual units with optimized units (OUnits) that have learnable parameters. An OUnit computes a weight map whose values are used as a gate on its input. Hence, it needs far fewer layers to match the performance of a CNN with ReLU units. As the overall results show, the proposed model improves the tradeoff between performance and efficiency.
The proposed OUnit has a set of parameters with learning ability. The parameters spent in OUnits come at the expense of some of the convolutional layers in the network. At present, most CNN models invest no parameters in the activations. In this paper, we illustrate that a small number of parameters should be invested in spatial activations to achieve optimal performance.
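The gating mechanism can be sketched in PyTorch as follows. The branch structure here (a depthwise convolution with batch normalization, squashed to [0, 1] by a sigmoid) is an assumption based on the description above and on similar gated activations in the literature, not the paper's exact OUnit:

```python
import torch
import torch.nn as nn

class OUnit(nn.Module):
    """Sketch of a learnable gated activation: a small branch
    computes a per-pixel weight map in [0, 1] that gates the
    input feature map. The branch layout is an assumption."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.branch = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # Depthwise convolution keeps the extra parameter
            # cost per activation small.
            nn.Conv2d(channels, channels, kernel_size,
                      padding=kernel_size // 2, groups=channels),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        gate = torch.sigmoid(self.branch(x))  # weight map in [0, 1]
        return x * gate                       # gate the input

x = torch.randn(1, 16, 8, 8)
y = OUnit(16)(x)
print(y.shape)  # torch.Size([1, 16, 8, 8])
```

Unlike a ReLU, which applies the same fixed per-pixel rule everywhere, the gate is learned jointly with the rest of the network, at the cost of the few parameters in the depthwise branch.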
As mentioned in [22], unbounded increases in the dilation factor may fail to capture local features due to the sparseness of the kernel, which is harmful to tiny objects. Accordingly, we suggest gently growing the dilation rate along the convolutions within each block to reduce sparsity in the dilated kernels, thereby capturing more context while keeping the resolution of the analyzed area. This approach allows us to span broader sections of the input image without using large dilation rates that can degrade reconstruction performance.
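The sparsity issue can be checked with a toy 1-D coverage computation (the dilation schedules below are illustrative): gently growing rates reach every offset inside the receptive field, while a constant large rate leaves gaps, the so-called gridding artifact.

```python
def coverage(dilations, kernel_size=3):
    """1-D input offsets reachable by an output position after
    stacking dilated convolutions; holes indicate gridding."""
    reach = {0}
    half = kernel_size // 2
    for d in dilations:
        reach = {p + t * d for p in reach
                 for t in range(-half, half + 1)}
    return reach

def is_dense(offsets):
    """True if the reachable offsets form a contiguous range."""
    return sorted(offsets) == list(range(min(offsets), max(offsets) + 1))

# Gently growing rates cover every offset in the receptive field:
print(is_dense(coverage([1, 2, 3])))  # True

# A constant large rate samples the input sparsely (gridding):
print(is_dense(coverage([4, 4, 4])))  # False
```

This is why the dilation rate is grown gradually within each block rather than jumping directly to a large value.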
As a result, shallower networks with fewer parameters can achieve a receptive field of the same size as very deep networks. By using dilated convolution in DenseNet together with a progressive increase of image size, we show that the proposed method can outperform state-of-the-art approaches with less than half the parameters and computational cost.
The main contributions of this paper are as follows:
- We introduce a progressive dilated convolution network for SR.
- We show that the receptive field is one of the major factors in the SR task: networks with the same receptive field size but different depths produce similar results.
- By using progressive dilated convolution, the proposed model yields better performance with lower computational cost and faster speed.
The rest of the paper is organized as follows. In Section 2, we present some SR-related works. Section 3 describes the details of the progressive dilated convolution network. Section 4 shows the experimental results and the performance of the proposed model. Finally, Section 5 concludes the paper.
Related works
In this section, we present a brief description of existing models for SR and background concepts that are helpful for understanding the proposed model. CNNs have been successfully applied in a wide range of computer vision areas, such as classification, recognition, detection, and SR. Therefore, we place special emphasis on the most recent prominent deep learning works in image super-resolution.
Dong et al. [15], [23] proposed the deep convolutional neural network for the image SR by
Proposed methods
Currently, CNNs are among the best types of neural networks, learning a hierarchy of complex features through sequential convolution, (max or average) pooling, and non-linear activation functions [42]. The first CNNs were designed for image recognition and classification; today, however, CNNs are used in image SR, semantic segmentation, and other computer vision tasks. One method follows a sliding-window approach where regions defined by the window are processed individually. This system has
Experiments
In this section, the details of datasets, implementations and experimental results are presented.
Conclusions
We have introduced the dilated residual dense neural network to accelerate deep networks for image super-resolution. First, we showed that the receptive field is a key factor in image SR: networks with a similar receptive field but different depths produce similar HR results. Moreover, we proposed the dilated convolution network in place of the standard convolution operation; dilated convolution performs better at covering a large receptive field. We also presented
Acknowledgment
This work was supported in part by the National Science Foundation of China under Grants 61572315 and 6151101179, and in part by the 973 Plan of China under Grant 2015CB856004.
References (52)
- et al., Diverse adversarial network for image super-resolution, Signal Process., Image Commun. (2019)
- et al., Image super resolution by dilated dense progressive network, Image Vis. Comput. (2019)
- et al., SGCRSR: Sequential gradient constrained regression for single image super-resolution, Signal Process., Image Commun. (2018)
- et al., Sparse representation and adaptive mixed samples regression for single image super-resolution, Signal Process., Image Commun. (2018)
- et al., Detail enhancement of image super-resolution based on detail synthesis, Signal Process., Image Commun. (2017)
- et al., Single image superresolution via locally regularized anchored neighborhood regression and nonlocal means, IEEE Trans. Multimedia (2017)
- et al., Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. (2016)
- et al., A progressively enhanced network for video satellite imagery superresolution, IEEE Signal Process. Lett. (2018)
- W. Shi, J. Caballero, F. Huszar, J. Totz, A.P. Aitken, R. Bishop, D. Rueckert, Z. Wang, Real-time single image and...
- R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, L. Zhang, B. Lim, S. Son, H. Kim, S. Nah, K.M. Lee, Ntire 2017...
- Dense xUnit networks, CoRR
- Neural Networks: Tricks of the Trade
☆ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.image.2019.08.008.