Single image resolution enhancement by efficient dilated densely connected residual network

https://doi.org/10.1016/j.image.2019.08.008

Highlights

  • We introduce a progressive dilated convolution network for image super-resolution.

  • We show that the size of the receptive field is a major factor in super-resolution performance.

  • Dense connectivity enables efficient feature extraction, and residual connections enable feature reuse.

Abstract

Convolutional neural networks (CNNs) have been widely applied to single image super-resolution (SR). Recent works have shown the superior performance of deep networks on SR tasks. However, simply increasing a model's depth extracts more features at the cost of more parameters and higher computation. In this paper, we leverage the ground-truth high-resolution (HR) image as a useful guide for learning and present an effective model based on progressive dilated dense connections and a novel activation function, both well suited to image SR problems. Unlike common per-pixel activation functions such as sigmoids and ReLUs, the proposed activation unit is a nonlinear learnable function with short connections. These strategies help the network obtain deep and complex features, so it needs far fewer layers to reach similar SR performance: dilation grows the receptive field exponentially while the filter size stays the same. Dense connectivity facilitates feature extraction, and residual connections facilitate feature reuse; both are required to improve the performance of the network. Experimental results show that the proposed model runs about twice as fast as current deep network approaches while achieving higher SR performance than state-of-the-art methods.

Introduction

Creating a high-resolution (HR) image or video from its corresponding low-resolution (LR) input is referred to as super-resolution (SR) [1]. Recently, SR has been used for image resolution enhancement in several applications such as face recognition [2], remote sensing imaging [3], and video surveillance [4]. In particular, single image super-resolution (SISR) approaches have achieved impressive results by learning a mapping from LR to HR images using an upsampling function in convolutional neural networks (CNNs).

Current SISR methods adopt one of the following approaches. The first approach upscales the LR image with an interpolation method, for example bicubic, as a first step and then uses a learning model to deblur it [5], [6], [7]. The second approach upscales after the learning process, generally using a sub-pixel convolution layer [8] or a rearranged convolution layer to improve the HR result [9], [10]. The first approach has a high computational cost because it operates on upscaled images; the second approach struggles to achieve high-quality results at large upscaling factors.
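
The second approach can be made concrete with a short sketch. The following is a hedged illustration in PyTorch (the channel count and scale factor are our own assumptions, not any cited paper's exact configuration) of a sub-pixel upscaling tail, where features computed at LR resolution are rearranged into an HR image:

```python
import torch
import torch.nn as nn

# Illustrative post-upsampling tail: a convolution expands the channel
# count to 3 * scale^2, then PixelShuffle rearranges those channels
# into spatial positions, producing the HR image in a single step.
scale = 4
tail = nn.Sequential(
    nn.Conv2d(64, 3 * scale ** 2, kernel_size=3, padding=1),
    nn.PixelShuffle(scale),  # (N, 3*s^2, H, W) -> (N, 3, s*H, s*W)
)

lr_features = torch.randn(1, 64, 32, 32)
print(tail(lr_features).shape)  # torch.Size([1, 3, 128, 128])
```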

Among these, learning-based methods have recently attracted the most attention from the community. By learning a mapping function from corresponding pairs of LR–HR image patches, they offer superior SR performance. In addition, deep convolutional neural networks (DNNs) have been used effectively for image SR and have yielded large improvements in accuracy [11], [12], [13].

Several studies currently use CNNs and DNNs for SR, but work on accelerating deep networks for SR is very limited. Shi et al. [14] accelerated the SR process by using a sub-pixel convolution layer to retain the information of HR images while reducing the size of the feature maps. Dong et al. [15] proposed a model that uses a deconvolution layer instead of bicubic interpolation; they further compacted the original SRCNN structure to speed up processing. Still, designing a faster architecture requires much domain knowledge and many separate experiments.

CNN models typically shrink the image in order to process it. In this work, we show that this is not required, and we propose a progressive dilation scheme to accelerate the dense network (DenseNet) for SR without losing performance. The authors of [16] compared networks of several depths for SR and indicated that wider and deeper networks perform better because of their high nonlinearity and wide receptive field. It has also been shown that the size of the receptive field has a more significant effect on SR performance than varying the depth of the network: a large receptive field provides more contextual information, which improves the reconstruction result. Hence, we propose a progressive dilated residual DenseNet for SR (DRDN). In this model, the resolution of the network's output is increased by replacing a subset of internal down-sampling layers with dilation [17], [18]. Compared to standard convolution, dilated convolution provides an exponential increase of the receptive field with the same filter size.
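
To make the receptive-field argument concrete, here is a minimal sketch in PyTorch (the channel width and dilation schedule are illustrative assumptions, not the exact DRDN configuration). Three 3×3 convolutions with dilations 1, 2, and 4 preserve the spatial resolution while the receptive field grows exponentially:

```python
import torch
import torch.nn as nn

# Three 3x3 convolutions with dilation rates 1, 2, 4. Padding equal to
# the dilation rate keeps the spatial size unchanged.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, dilation=1, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, dilation=4, padding=4),
)

# Receptive field of a stride-1 stack: RF = 1 + sum_i d_i * (k - 1).
rf = 1 + sum(d * (3 - 1) for d in (1, 2, 4))
print(rf)  # 15, versus 7 for three standard (dilation-1) 3x3 layers

x = torch.randn(1, 64, 48, 48)
print(block(x).shape)  # torch.Size([1, 64, 48, 48]) -- resolution kept
```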

We also use a different mechanism for improving dense network performance [19], [20]. Most models increase network depth to extract deep features; instead, we are motivated to build a model with efficient nonlinear activations. Current architectures mostly use per-pixel activation units, for instance sigmoids [21] and rectified linear units (ReLUs) [22]. We therefore propose to replace those individual units with optimized units (OUnits), which are learnable. An OUnit computes a weight map whose values act as a gate on its input. Hence, the network needs far fewer layers to match the performance of a CNN with ReLU units. As the overall results show, the proposed model improves the tradeoff between performance and efficiency.
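
The exact OUnit formulation appears later in the paper; the following is only a rough sketch of the gating idea, modeled on the xUnit activation cited as inspiration, so the actual OUnit may differ. A small learnable branch produces a per-pixel weight map in (0, 1] that multiplies the input:

```python
import torch
import torch.nn as nn

class GatedActivation(nn.Module):
    """Sketch of a learnable, spatially gated activation in the spirit
    of xUnit; not the paper's exact OUnit. A depthwise-convolutional
    branch computes a weight map that gates the input feature map."""

    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        self.gate = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            # Depthwise convolution: one spatial filter per channel.
            nn.Conv2d(channels, channels, kernel_size,
                      padding=kernel_size // 2, groups=channels),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gaussian gate exp(-z^2) lies in (0, 1], so each pixel of x is
        # scaled by a learned, spatially varying weight.
        return x * torch.exp(-self.gate(x) ** 2)
```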

The proposed OUnit carries a set of learnable parameters, and this parameter budget comes at the expense of some of the convolutional layers in the network. At present, most CNN models invest no parameters at all in their activations. In this paper, we illustrate that a small share of the parameters should be invested in spatial activations to obtain optimal performance.

As mentioned in [22], unbounded increases in the dilation factor may fail to capture local features because of the sparseness of the kernel, which is harmful to tiny objects. Accordingly, we suggest gently growing the dilation rate along the convolutions within each block to reduce sparsity in the dilated kernels, capturing more context while keeping the resolution of the analyzed area. This approach lets us span broader sections of the input images without resorting to large dilation rates that hurt performance. A sketch of such a block follows.
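
As an illustration of this schedule (the layer count, growth rate, and exact dilation values below are our assumptions, not the paper's reported configuration), a densely connected block can increase the dilation rate gently, layer by layer:

```python
import torch
import torch.nn as nn

class ProgressiveDilatedDenseBlock(nn.Module):
    """Sketch of a dense block whose 3x3 convolutions use gently
    increasing dilation rates (1, 2, 3), avoiding large jumps that
    leave gaps in the sampled grid."""

    def __init__(self, in_channels: int, growth: int = 32,
                 dilations=(1, 2, 3)):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for d in dilations:
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            ))
            channels += growth  # dense connectivity widens the input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            # Each layer sees the concatenation of all earlier outputs.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```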

As a result, shallower networks with fewer parameters can achieve a receptive field of the same size as very deep networks. By using dilated convolutions in DenseNet and progressively increasing the image size, we show that the proposed method outperforms state-of-the-art approaches with less than half the parameters and computation cost.
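
For a concrete comparison (this arithmetic is ours, using the standard receptive-field formula rather than numbers from the paper): stacking $n$ stride-1 $3\times 3$ convolutions with dilation rates $d_1,\dots,d_n$ gives

$$\mathrm{RF} = 1 + 2\sum_{i=1}^{n} d_i,$$

so a schedule $d_i = 2^{i-1}$ yields $\mathrm{RF} = 2^{n+1}-1$. Reaching $\mathrm{RF}=31$ therefore needs only $n=4$ dilated layers, whereas standard convolutions ($d_i = 1$, $\mathrm{RF} = 2n+1$) need $n=15$ layers.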

The main contributions of this paper are as follows:

  • We introduce a progressive dilated convolution network for SR.

  • We prove that the receptive field is one of the major factors in the SR task: networks with the same receptive field size but different depths produce similar results.

  • By using progressive dilated convolutions, the proposed model yields better performance at lower computational cost and higher speed.

The rest of the paper is organized as follows. Section 2 presents related SR work. Section 3 describes the details of the progressive dilated convolution network. Section 4 reports the experimental results and the performance of the proposed model. Finally, Section 5 concludes the paper.


Related works

In this section, we briefly describe existing SR models and the background concepts helpful for understanding the proposed model. CNNs have been applied successfully across a wide range of computer vision areas such as classification, recognition, detection, and SR; we therefore place special emphasis on the most prominent recent deep-learning work on image super-resolution.

Dong et al. [15], [23] proposed a deep convolutional neural network for image SR by

Proposed methods

Currently, CNNs are one of the best-performing types of neural networks; they learn a hierarchy of complex features through sequential convolution, (max or average) pooling, and nonlinear activation functions [42]. The first CNNs were designed for image recognition and classification, but CNNs are now used for image SR, semantic segmentation, and many other computer vision tasks. One method follows a sliding-window approach in which the regions defined by the window are processed individually. This system has

Experiments

In this section, the details of datasets, implementations and experimental results are presented.

Conclusions

We have introduced a dilated residual dense neural network to accelerate deep networks for image super-resolution. First, we showed that the receptive field is a key factor in image SR: networks with a similar receptive field but different depths produce similar HR results. Moreover, we proposed dilated convolutions in place of the standard convolution operation, since dilated convolution gathers a large receptive field more effectively. We also presented

Acknowledgment

This work was supported in part by the National Science Foundation of China under Grants 61572315 and 6151101179, and in part by the 973 Plan of China under Grant 2015CB856004.

References (52)

  • R. Timofte, R. Rothe, L. Van Gool, Seven ways to improve example-based single image super resolution, in: CVPR,...
  • H. Zhang, V.M. Patel, Density-aware single image deraining using a multi-stream dense network, in: CVPR,...
  • C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi,...
  • N. Akhtar, F. Shafait, A. Mian, Bayesian sparse representation for hyperspectral image super resolution, in: CVPR 2015,...
  • C. Dong, C.C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, in: ECCV,...
  • Y. Zhang, Y. Tian, Y. Kong, B. Zhong, Y. Fu, Residual Dense Network for Image Super-Resolution, in: CVPR...
  • Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, Y. Fu, Image super-resolution using very deep residual channel attention...
  • T. Tong, G. Li, X. Liu, Q. Gao, Image super-resolution using dense skip connections, in: ICCV,...
  • W. Shi, J. Caballero, C. Ledig, X. Zhuang, W. Bai, K. Bhatia, A.M.S.M. de Marvao, T. Dawes, D. O'Regan, D. Rueckert,...
  • C. Dong, C.C. Loy, K. He, X. Tang, Learning a deep convolutional network for image super-resolution, in: ECCV,...
  • B. Lim, S. Son, H. Kim, S. Nah, K.M. Lee, Enhanced deep residual networks for single image super-resolution, in: CVPRW,...
  • Z. Huang, L. Wang, G. Meng, C. Pan, Image super-resolution via deep dilated convolutional networks, in: ICIP...
  • Fisher Yu, Vladlen Koltun, Multi-Scale Context Aggregation by Dilated Convolutions, Vol. abs/1511.07122, ICLR...
  • I. Kligvasser, T.R. Shaham, T. Michaeli, xUnit: Learning a Spatial Activation Function for Efficient Image Restoration,...
  • I. Kligvasser, et al., Dense xUnit networks, CoRR (2018)
  • G.B. Orr, et al., Neural Networks: Tricks of the Trade (2003)