Abstract
In recent years, convolutional neural networks (CNNs) have been applied to single-image super-resolution (SR), following their success in the field of computer vision. However, existing CNN-based methods always leave some high-frequency components unrecovered when reconstructing a high-resolution image from a low-resolution one. In this paper, we propose a CNN-based image super-resolution method that uses a two-level residual learning network to learn the residual components, i.e., the high-frequency components. We use the Super-Resolution Convolutional Neural Network (SRCNN) as the network structure at each level, so that the proposed method can produce high-resolution images containing high-frequency components that existing methods cannot recover. In addition, we analyze the proposed method by considering three kinds of residual learning networks, which differ in structure and in their superimposed layers. In the experiments, we investigate the performance of the proposed method with the various residual learning networks and the effect of image super-resolution on the image captioning task.
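The idea of stacking residual levels can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the first level is trained on what a base SR network misses and the second level on what the first level still misses, and that every predictor receives the same upscaled low-resolution input. The names `residual_targets`, `two_level_sr`, `base`, `level1`, and `level2` are hypothetical, and plain lists stand in for images; in the paper each predictor would be an SRCNN-style network.

```python
from typing import Callable, List, Tuple

Signal = List[float]                     # stand-in for an image (assumption)
Predictor = Callable[[Signal], Signal]   # stand-in for a trained network


def add(a: Signal, b: Signal) -> Signal:
    """Element-wise sum of two equally sized signals."""
    return [x + y for x, y in zip(a, b)]


def sub(a: Signal, b: Signal) -> Signal:
    """Element-wise difference of two equally sized signals."""
    return [x - y for x, y in zip(a, b)]


def residual_targets(hr: Signal, base_out: Signal,
                     level1_out: Signal) -> Tuple[Signal, Signal]:
    """Training targets for the two residual levels (assumed formulation)."""
    target1 = sub(hr, base_out)          # detail the base network missed
    target2 = sub(target1, level1_out)   # detail level 1 still missed
    return target1, target2


def two_level_sr(lr_up: Signal, base: Predictor,
                 level1: Predictor, level2: Predictor) -> Signal:
    """Sum the base estimate and both predicted residuals."""
    base_out = base(lr_up)
    return add(add(base_out, level1(lr_up)), level2(lr_up))
```

With these assumptions, the output equals the high-resolution target exactly when the second level predicts its target perfectly; in practice each level only approximates its residual, so the second level narrows the remaining gap rather than closing it.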
Acknowledgments
This work is supported by the Natural Science Foundation for Distinguished Young Scholars of Shandong Province (JQ201718), the Key Research and Development Foundation of Shandong Province (2016GGX101009), the Natural Science Foundation of China (U1736122), and the Shandong Provincial Key Research and Development Plan (2017CXGC1504). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research. The contact author is Jiande Sun (jiandesun@hotmail.com).
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gao, M., Han, XH., Li, J. et al. Image super-resolution based on two-level residual learning CNN. Multimed Tools Appl 79, 4831–4846 (2020). https://doi.org/10.1007/s11042-018-6751-5
DOI: https://doi.org/10.1007/s11042-018-6751-5