Abstract
Great success of scene parsing (also known as, semantic segmentation) has been achieved with the pipeline of fully convolutional networks (FCNs). Nevertheless, there are a lot of segmentation failures caused by large similarities between local appearances. To alleviate the problem, most of existing methods attempt to improve the global view of FCNs by introducing different contextual modules. Though the reconstructed high resolution output of these methods is of rich semantics, it cannot faithfully recover the fine image details owing to lack of desired precise low-level information. To overcome the problem, we propose to improve the spatial decoding process through embedding possibly lost low-level information in a principled way. To this end, we make the following three contributions. First, we propose a semantics conformity module to make low-level features variations agnostic. Second, we introduce semantics into the conformed low level features through guidance from semantically aware features. Finally, we institute the availability of various possible contextual features at feature fusion to enrich context information. The proposed approach demonstrates competitive performance on challenging PASCAL VOC 2012, Cityscapes, and ADE20K benchmarks in comparison to the state-of-the-art methods.
Similar content being viewed by others
References
Chen S T, Jian Z Q, Huang Y H, et al. Autonomous driving: cognitive construction and situation understanding. Sci China Inf Sci, 2019, 62: 081101
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3431–3440
Lin G, Shen C, van Den Hengel A, et al. Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3194–3203
Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 2650–2658
Ronneberger O, Fischer P, Brox T. U-net: convolutional networks for biomedical image segmentation. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, 2015. 234–241
Shah S, Ghosh P, Davis L-S, et al. Stacked U-Nets: a no-frills approach to natural image segmentation. 2018. ArXiv: 1804.10343
Zhou Q, Wang Y, Liu J, et al. An open-source project for real-time image semantic segmentation. Sci China Inf Sci, 2019, 62: 227101
Huang T T, Xu Y C, Bai S, et al. Feature context learning for human parsing. Sci China Inf Sci, 2019, 62: 220101
Chen L-C, Papandreou G, Kokkinos I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs. 2014. ArXiv: 1412.7062
Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 834–848
Schwing A-G, Urtasun R. Fully connected deep structured networks. 2015. ArXiv: 1503.02351
Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1529–1537
Sun H Q, Pang Y W. GlanceNets—efficient convolutional neural networks with adaptive hard example mining. Sci China Inf Sci, 2018, 61: 109101
Liu W, Rabinovich A, Berg A-C. Parsenet: looking wider to see better. 2015. ArXiv: 1506.04579
Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2881–2890
Fu J, Liu J, Tian H, et al. Dual attention network for scene segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 3146–3154
Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. 2015. ArXiv: 1511.07122
Chen L-C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation. 2017. ArXiv: 1706.05587
Yang M, Yu K, Zhang C, et al. Denseaspp for semantic segmentation in street scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 3684–3692
Chen Y, Rohrbach M, Yan Z, et al. Graph-based global reasoning networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, 2019. 433–442
He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 770–778
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. ArXiv: 1409.1556
Chen J, Lian Z H, Wang Y Z, et al. Irregular scene text detection via attention guided border labeling. Sci China Inf Sci, 2019, 62: 220103
Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1520–1528
Jgou S, Drozdzal M, Vazquez D, et al. The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 11–19
Ghiasi G, Fowlkes C-C. Laplacian pyramid reconstruction and refinement for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 519–534
Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 483–499
Liu N, Han J, Yang M-H. PiCANet: learning pixel-wise contextual attention for saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 3089–3098
Shrivastava A, Sukthankar R, Malik J, et al. Beyond skip connections: top-down modulation for object detection. 2016. ArXiv: 1612.06851
Lin T-Y, Dollr P, Girshick R, et al. Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 2117–2125
Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7132–7141
Zhang H, Dana K, Shi J, et al. Context encoding for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018. 7151–7160
Zhu Z, Xu M, Bai S, et al. Asymmetric non-local neural networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, Seoul, 2019. 593–602
Zhou Q, Zheng B, Zhu W, et al. Multi-scale context for scene labeling via flexible segmentation graph. Pattern Recogn, 2016, 59: 312–324
Zhou Q, Yang W, Gao G, et al. Multi-scale deep context convolutional neural networks for semantic segmentation. World Wide Web, 2019, 22: 555–570
Dai J, Qi H, Xiong Y, et al. Deformable convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, Venice, 2017. 764–773
Huang G, Liu Z, van Der Maaten L, et al. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 4700–4708
Everingham M, Eslami S M A, van Gool L, et al. The pascal visual object classes challenge: a retrospective. Int J Comput Vis, 2015, 111: 98–136
Hariharan B, Arbelez P, Bourdev L, et al. Semantic contours from inverse detectors. In: Proceedings of the IEEE International Conference on Computer Vision, Barcelona, 2011. 991–998
Cordts M, Omran M, Ramos S, et al. The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016. 3213–3223
Zhou B, Zhao H, Puig X, et al. Scene parsing through ADE20K dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 633–641
Chen L-C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 801–818
Liu Z, Li X, Luo P, et al. Semantic image segmentation via deep parsing network. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1377–1385
Wu Z, Shen C, van den Hengel A. Wider or deeper: revisiting the resnet model for visual recognition. Pattern Recogn, 2019, 90: 119–133
Lin G, Milan A, Shen C, et al. Refinenet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 1925–1934
Peng C, Zhang X, Yu G, et al. Large Kernel matters improve semantic segmentation by global convolutional network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017. 4353–4361
Wang P, Chen P, Yuan Y, et al. Understanding convolution for semantic segmentation. In: Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, 2018. 1451–1460
Ke T-W, Hwang J-J, Liu Z, et al. Adaptive affinity fields for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Munich, 2018. 587–602
Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell, 2017, 39: 2481–2495
Zhou B, Zhao H, Puig X, et al. Semantic understanding of scenes through the ADE20K dataset. Int J Comput Vis, 2019, 127: 302–321
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant No. 61632018) and Science and Technology Innovation 2030: the Key Project of Next Generation of Artificial Intelligence (Grant No. 2018AAA01028).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ma, S., Pang, Y., Pan, J. et al. Preserving details in semantics-aware context for scene parsing. Sci. China Inf. Sci. 63, 120106 (2020). https://doi.org/10.1007/s11432-019-2738-y
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11432-019-2738-y