Objectness Region Enhancement Networks for Scene Parsing

Ou, Xin-Yu; Li, Ping; Ling, He-Fei; Liu, Si; Wang, Tian-Jiang; Li, Dan

doi:10.1007/s11390-017-1751-x

Regular Paper
Published: 14 July 2017

Volume 32, pages 683–700, (2017)
Journal of Computer Science and Technology

Xin-Yu Ou¹,²,³, Ping Li¹, He-Fei Ling¹, Si Liu², Tian-Jiang Wang¹ & Dan Li¹

Abstract

Semantic segmentation has recently witnessed rapid progress, but existing methods focus only on identifying objects or instances. In this work, we address the task of semantic understanding of scenes with deep learning. Unlike many existing methods, we do not propose a whole new framework; instead, we put forward two techniques that improve existing algorithms. The first is objectness enhancement. It uses a detection module to produce object region proposals with category probabilities, and these regions are used to weight the parsing feature map directly. The second concerns the “extra background” category, which is often added to the category space to improve parsing results in semantic and instance segmentation tasks. In scene parsing, an extra background category still benefits the model during training, but at inference some pixels may be assigned to this nonexistent category. We propose a black-hole filling technique to avoid this incorrect classification. To verify the two techniques, we integrate them into a parsing framework, which we call the Objectness Enhancement Network (OENet). OENet improves on the original model on the SceneParse150 scene parsing dataset, reaching 38.4 mIoU (mean intersection-over-union) and 77.9% accuracy on the validation set without combining multiple models. Its effectiveness is also verified on the Cityscapes dataset.
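
The two techniques named in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the per-class score layout, the hypothetical box format, the multiplicative weight (1 + probability), and the reading of black-hole filling as an argmax restricted to the real categories. It is not the authors' implementation.

```python
from typing import List, Tuple

# Assumed layout: scores[class][row][col], non-negative (e.g. softmax outputs).
# Class 0 is the extra "background" channel that exists only for training.
ScoreMap = List[List[List[float]]]
# Hypothetical detection format: (row0, col0, row1, col1, class_id, probability),
# covering rows row0..row1-1 and columns col0..col1-1.
Box = Tuple[int, int, int, int, int, float]


def enhance_with_objectness(scores: ScoreMap, boxes: List[Box]) -> ScoreMap:
    """Scale the detected class's scores inside each box by (1 + probability)."""
    out = [[row[:] for row in channel] for channel in scores]  # copy, no mutation
    for r0, c0, r1, c1, cls, prob in boxes:
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[cls][r][c] *= 1.0 + prob
    return out


def predict_without_background(scores: ScoreMap) -> List[List[int]]:
    """Per-pixel argmax over classes 1..C only, so label 0 is never emitted."""
    height, width = len(scores[0]), len(scores[0][0])
    labels = []
    for r in range(height):
        row = []
        for c in range(width):
            row.append(max(range(1, len(scores)), key=lambda k: scores[k][r][c]))
        labels.append(row)
    return labels


# Toy 2x2 image with one background channel and two real classes.
scores = [
    [[0.5, 0.5], [0.5, 0.1]],  # class 0: extra background
    [[0.3, 0.4], [0.2, 0.5]],  # class 1
    [[0.2, 0.1], [0.3, 0.4]],  # class 2
]
boxes = [(1, 0, 2, 2, 2, 0.9)]  # class 2 detected over the bottom row
labels = predict_without_background(enhance_with_objectness(scores, boxes))
print(labels)
```

In this toy input, three pixels have the background channel as their highest score, so a plain argmax would leave "holes" labelled 0; restricting the argmax to real classes removes them, and the detection box shifts the bottom-right pixel from class 1 to class 2.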

Author information

Authors and Affiliations

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Xin-Yu Ou, Ping Li, He-Fei Ling, Tian-Jiang Wang & Dan Li
Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100091, China
Xin-Yu Ou & Si Liu
Cadres Online Learning Institute of Yunnan Province, Yunnan Open University, Kunming, 650223, China
Xin-Yu Ou

Corresponding author

Correspondence to Ping Li.

Electronic supplementary material

ESM 1 (PDF 1122 kb)

About this article

Cite this article

Ou, XY., Li, P., Ling, HF. et al. Objectness Region Enhancement Networks for Scene Parsing. J. Comput. Sci. Technol. 32, 683–700 (2017). https://doi.org/10.1007/s11390-017-1751-x

Received: 20 December 2016
Revised: 12 June 2017
Published: 14 July 2017
Issue Date: July 2017
DOI: https://doi.org/10.1007/s11390-017-1751-x
