Preserving details in semantics-aware context for scene parsing

Ma, Shuai; Pang, Yanwei; Pan, Jing; Shao, Ling

doi:10.1007/s11432-019-2738-y


Research Paper
Special Focus on Deep Learning for Computer Vision
Published: 15 January 2020

Volume 63, article number 120106, (2020)

Science China Information Sciences

Shuai Ma¹,
Yanwei Pang¹,
Jing Pan² &
…
Ling Shao¹,³


Abstract

Scene parsing (also known as semantic segmentation) has achieved great success with the pipeline of fully convolutional networks (FCNs). Nevertheless, many segmentation failures are caused by strong similarities between local appearances. To alleviate this problem, most existing methods attempt to improve the global view of FCNs by introducing different contextual modules. Although the reconstructed high-resolution output of these methods is semantically rich, it cannot faithfully recover fine image details because it lacks the precise low-level information required. To overcome this problem, we propose to improve the spatial decoding process by embedding possibly lost low-level information in a principled way. To this end, we make the following three contributions. First, we propose a semantics conformity module to make low-level features variation-agnostic. Second, we introduce semantics into the conformed low-level features through guidance from semantically aware features. Finally, we make various possible contextual features available at feature fusion to enrich context information. The proposed approach demonstrates competitive performance on the challenging PASCAL VOC 2012, Cityscapes, and ADE20K benchmarks in comparison with state-of-the-art methods.
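The abstract names the three decoding steps but gives no layer-level detail, so the following is only a rough sketch of the general idea, not the authors' modules. It assumes, purely for illustration, that "conformity" can be stood in for by per-channel standardisation, that "guidance" can be stood in for by a sigmoid gate computed from upsampled high-level features, and that fusion is channel concatenation. All function names (conform, guided_fusion, upsample_nearest) are hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def conform(low):
    """Assumed stand-in for a conformity step: standardise each channel so that
    low-level features are less sensitive to appearance variations."""
    mean = low.mean(axis=(1, 2), keepdims=True)
    std = low.std(axis=(1, 2), keepdims=True)
    return (low - mean) / (std + 1e-5)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def guided_fusion(low, high, context):
    """Fuse low-level detail with high-level semantics and context.

    low:     (C, H, W) high-resolution, low-level features
    high:    (C, H/f, W/f) low-resolution, semantically aware features
    context: (C, H/f, W/f) contextual features (e.g. from a context module)
    """
    factor = low.shape[1] // high.shape[1]
    high_up = upsample_nearest(high, factor)
    ctx_up = upsample_nearest(context, factor)
    # Assumed guidance: semantic features gate the conformed low-level features.
    gate = sigmoid(high_up)
    guided = conform(low) * gate
    # Assumed fusion: concatenate along the channel axis.
    return np.concatenate([guided, high_up, ctx_up], axis=0)


rng = np.random.default_rng(0)
low = rng.normal(size=(4, 8, 8))
high = rng.normal(size=(4, 4, 4))
context = rng.normal(size=(4, 4, 4))
fused = guided_fusion(low, high, context)
print(fused.shape)
```

In a trained network each of these stand-ins would be a learned layer; the sketch only shows how the three kinds of features could be brought to a common resolution and combined.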



Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61632018) and by Science and Technology Innovation 2030: the Key Project of Next Generation of Artificial Intelligence (Grant No. 2018AAA01028).

Author information

Authors and Affiliations

Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
Shuai Ma, Yanwei Pang & Ling Shao
School of Electronic Engineering, Tianjin University of Technology and Education, Tianjin, 300222, China
Jing Pan
Inception Institute of Artificial Intelligence, Abu Dhabi, 999041, UAE
Ling Shao


Corresponding author

Correspondence to Yanwei Pang.


About this article

Cite this article

Ma, S., Pang, Y., Pan, J. et al. Preserving details in semantics-aware context for scene parsing. Sci. China Inf. Sci. 63, 120106 (2020). https://doi.org/10.1007/s11432-019-2738-y


Received: 13 November 2019
Revised: 06 December 2019
Accepted: 24 December 2019
Published: 15 January 2020
DOI: https://doi.org/10.1007/s11432-019-2738-y
