
Objectness Region Enhancement Networks for Scene Parsing

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Semantic segmentation has recently seen rapid progress, but existing methods focus only on identifying objects or instances. In this work, we address the task of semantic scene understanding with deep learning. Unlike many existing methods, our approach does not propose an entirely new framework; instead, it contributes techniques that improve existing algorithms. The first technique, objectness enhancement, uses a detection module to produce object region proposals with category probabilities, and these regions are used to weight the parsing feature map directly. An “extra background” category is often added to the category space as a separate class to improve parsing results in semantic and instance segmentation tasks. In scene parsing, this extra background category still benefits the model during training; however, at inference some pixels may be assigned to it even though it does not exist in the scene parsing label space. The second technique, black-hole filling, is proposed to avoid this incorrect classification. To verify the two techniques, we integrate them into a parsing framework that generates the final parsing result; we call this unified framework the Objectness Enhancement Network (OENet). Compared with previous work, OENet effectively improves on the original model's performance on the SceneParse150 scene parsing dataset, reaching 38.4 mIoU (mean intersection-over-union) and 77.9% accuracy on the validation set without ensembling multiple models. Its effectiveness is also verified on the Cityscapes dataset.
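The abstract states only that detected regions, together with their category probabilities, weight the parsing feature map directly; it does not give the weighting formula. The NumPy sketch below shows one way such a weighting could work, under assumptions that are ours and not the paper's: the parsing output is a non-negative per-class score map of shape (C, H, W) (for example, softmax probabilities), each proposal is a hypothetical (x1, y1, x2, y2, class_id, prob) tuple in pixel coordinates, and each class's scores are multiplied by (1 + alpha * prob) inside that class's boxes, where `alpha` and the function name `objectness_enhance` are hypothetical.

```python
import numpy as np


def objectness_enhance(score_map, proposals, alpha=1.0):
    """Weight a per-class parsing score map with detected object regions.

    Hypothetical sketch, not the paper's exact formulation.
    score_map: (C, H, W) array of non-negative per-class scores.
    proposals: iterable of (x1, y1, x2, y2, class_id, prob) tuples in pixel
               coordinates; x2 and y2 are exclusive.
    alpha: hypothetical scaling factor for the enhancement.
    """
    num_classes, height, width = score_map.shape
    weight = np.zeros(score_map.shape, dtype=np.float64)
    for x1, y1, x2, y2, cls, prob in proposals:
        # Clip the box to the image bounds.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(x2)), min(height, int(y2))
        # Skip empty boxes and categories outside the parsing label space.
        if x1 >= x2 or y1 >= y2 or not 0 <= cls < num_classes:
            continue
        # Where boxes of the same class overlap, keep the strongest probability.
        region = weight[cls, y1:y2, x1:x2]  # basic slicing gives a view
        np.maximum(region, prob, out=region)
    # Boost each class's scores inside its own detected regions.
    return score_map * (1.0 + alpha * weight)
```

Taking the maximum where same-class boxes overlap keeps the weight bounded by the strongest detection; summing the probabilities would be an equally plausible choice. Because the scores are multiplied, the sketch assumes non-negative inputs, since scaling negative logits would push them further down rather than up.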
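Black-hole filling is described only by its goal: no pixel should end up labeled with the extra background category at inference. One simple way to guarantee this, sketched below as an assumption rather than the paper's exact procedure, is to reassign every pixel whose top-scoring class is background to its best-scoring real category. The array layout (channel `background_index` of a (C, H, W) score array holds the extra background class) and the function name `black_hole_fill` are hypothetical.

```python
import numpy as np


def black_hole_fill(scores, background_index=0):
    """Give every pixel a real (non-background) category label.

    Hypothetical sketch: pixels whose argmax is the extra background
    category are reassigned to their best-scoring remaining category.
    scores: (C, H, W) array; channel `background_index` is the extra background.
    Returns an (H, W) label map indexed in the original C-channel space.
    """
    labels = scores.argmax(axis=0)
    holes = labels == background_index
    if holes.any():
        # Rule out the background channel, then take the argmax again.
        masked = scores.astype(np.float64, copy=True)
        masked[background_index] = -np.inf
        labels[holes] = masked.argmax(axis=0)[holes]
    return labels
```

Pixels that were not predicted as background keep their original label, so the fill only changes the "black holes" and leaves the rest of the parsing result untouched.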
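The reported results use mean intersection-over-union (mIoU) and accuracy. A common way to compute both is from a confusion matrix, with IoU_c = TP_c / (TP_c + FP_c + FN_c) for each class c and accuracy as the fraction of correctly labeled pixels. The sketch below follows that standard definition; the helper names and the `ignore_index` value of 255 for unlabeled pixels are assumptions, and the official SceneParse150 and Cityscapes evaluation scripts may treat unlabeled pixels and absent classes differently.

```python
import numpy as np


def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes.

    Assumes labels of valid pixels lie in [0, num_classes).
    """
    valid = gt != ignore_index  # drop unlabeled pixels
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def miou_and_pixel_accuracy(conf):
    tp = np.diag(conf).astype(np.float64)
    # Column sums are TP + FP, row sums are TP + FN, so this is TP + FP + FN.
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union
    miou = np.nanmean(iou)  # classes absent from both pred and gt are skipped
    accuracy = tp.sum() / conf.sum()
    return miou, accuracy
```

Accumulating one confusion matrix over the whole validation set and computing the metrics once at the end weights every pixel equally, which differs from averaging per-image scores.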




Author information

Correspondence to Ping Li.

Electronic supplementary material: ESM 1 (PDF, 1122 kB).


Cite this article

Ou, XY., Li, P., Ling, HF. et al. Objectness Region Enhancement Networks for Scene Parsing. J. Comput. Sci. Technol. 32, 683–700 (2017). https://doi.org/10.1007/s11390-017-1751-x


  • DOI: https://doi.org/10.1007/s11390-017-1751-x
