Abstract
A good semantic segmentation method for visual scene understanding should consider both accuracy and efficiency. However, the existing networks tend to concentrate only on segmentation results but not on simplifying the network. As a result, a heavy network will be made and it is difficult to deploy such heavy network on some hardware with limited memory. To address this problem, we in this paper develop a novel architecture by involving the recursive block to reduce parameters and improve prediction, as recursive block can improve performance without introducing new parameters for additional convolutions. In detail, for the purpose of mitigating the difficulty of training recursive block, we have adopted a residual unit to give the data more choices to flow through and utilize concatenation layer to combine the output maps of the recursive convolution layers with same resolution but different field-of-views. As a result, richer semantic information can be included in the feature maps, which is good to achieve satisfying pixel-wise prediction. Meriting from the above strategy, we also extend it to enhance Mask-RCNN for instance segmentation. Extensive simulations based on different benchmark datasets, such as DeepFashion, Cityscapes and PASCAL VOC 2012, show that our method can improve segmentation results as well as reduce the parameters.









Similar content being viewed by others
Explore related subjects
Discover the latest articles and news from researchers in related subjects, suggested using machine learning.References
Bai M, Urtasun R (2017) Deep watershed transform for instance segmentation. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), IEEE, pp 2858–2866
Cao X, Chen Y, Zhao Q, Meng D, Wang Y, Wang D, Xu Z (2015) Low-rank matrix factorization under general mixture noise distributions. In: Proceedings of the IEEE international conference on computer vision, pp 1493–1501
Chaurasia A, Culurciello E (2017) Linknet: exploiting encoder representations for efficient semantic segmentation. In: Visual communications and image processing (VCIP), 2017 IEEE, IEEE, pp 1–4
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2018) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell 40(4):834–848
Cordts M, Omran M, Ramos S, Rehfeld T, Enzweiler M, Benenson R, Franke U, Roth S, Schiele B (2016) The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, pp 1269–1277
Erhan D, Szegedy C, Toshev A, Anguelov D (2014) Scalable object detection using deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2147–2154
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2012) The PASCAL visual object classes challenge 2012 (VOC2012) results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Han S, Mao H, Dally WJ (2015a) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:151000149
Han S, Pool J, Tran J, Dally W (2015b) Learning both weights and connections for efficient neural network. In: Advances in neural information processing systems, pp 1135–1143
He K, Zhang X, Ren S, Sun J (2016a) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He K, Zhang X, Ren S, Sun J (2016b) Identity mappings in deep residual networks. In: European conference on computer vision, Springer, pp 630–645
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: 2017 IEEE international conference on computer vision (ICCV), IEEE, pp 2980–2988
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:170404861
Huang FJ, Boureau YL, LeCun Y, et al (2007) Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: IEEE conference on computer vision and pattern recognition, 2007, CVPR’07, IEEE, pp 1–8
Kim J, Kwon Lee J, Mu Lee K (2016) Deeply-recursive convolutional network for image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1637–1645
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Kundu A, Vineet V, Koltun V (2016) Feature space optimization for semantic video segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3168–3175
Lafferty J, McCallum A, Pereira FC (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. Academic Press, New York
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), IEEE, pp 105–114
Liu Z, Luo P, Qiu S, Wang X, Tang X (2016) Deepfashion: powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1096–1104
McIntosh L, Maheswaranathan N, Sussillo D, Shlens J (2017) Recurrent segmentation for variable computational budgets. arXiv preprint arXiv:171110151
Ngiam J, Khosla A, Kim M, Nam J, Lee H, Ng AY (2011) Multimodal deep learning. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 689–696
Paszke A, Chaurasia A, Kim S, Culurciello E (2016) Enet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:160602147
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention, Springer, pp 234–241
Salvador A, Bellver M, Campos V, Baradad M, Marques F, Torres J, Giro-i Nieto X (2017) Recurrent neural networks for semantic instance segmentation. arXiv preprint arXiv:171200617
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:14091556
Srinivas S, Babu RV (2015) Data-free parameter pruning for deep neural networks. arXiv preprint arXiv:150706149
Szegedy C, Toshev A, Erhan D (2013) Deep neural networks for object detection. In: Advances in neural information processing systems, pp 2553–2561
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826
Tai Y, Yang J, Liu X (2017) Image super-resolution via deep recursive residual network. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), IEEE, pp 2790–2798
Toderici G, O’Malley SM, Hwang SJ, Vincent D, Minnen D, Baluja S, Covell M, Sukthankar R (2015) Variable rate image compression with recurrent neural networks. arXiv preprint arXiv:151106085
Xingjian S, Chen Z, Wang H, Yeung DY, Wong WK, Woo Wc (2015) Convolutional lstm network: a machine learning approach for precipitation nowcasting. In: Advances in neural information processing systems, pp 802–810
Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848–6856
Zhao H, Shi J, Qi X, Wang X, Jia J (2017) Pyramid scene parsing network. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2881–2890
Zheng S, Jayasumana S, Romera-Paredes B, Vineet V, Su Z, Du D, Huang C, Torr PH (2015) Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE international conference on computer vision, pp 1529–1537
Zhu S, Urtasun R, Fidler S, Lin D, Change Loy C (2017) Be your own prada: fashion synthesis with structural coherence. In: Proceedings of the IEEE international conference on computer vision, pp 1680–1688
Acknowledgements
The work is supported by the National Natural Science Foundation of China (Nos. 61971121, 61601112), the Fundamental Research Funds for the Central Universities and DHU Distinguished Young Professor Program.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhang, Y., Li, X., Lin, M. et al. Deep-recursive residual network for image semantic segmentation. Neural Comput & Applic 32, 12935–12947 (2020). https://doi.org/10.1007/s00521-020-04738-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-020-04738-5