Multi-scale deep context convolutional neural networks for semantic segmentation

Zhou, Quan; Yang, Wenbing; Gao, Guangwei; Ou, Weihua; Lu, Huimin; Chen, Jie; Latecki, Longin Jan

doi:10.1007/s11280-018-0556-3

Published: 19 April 2018

World Wide Web, Volume 22, pages 555–570 (2019)

Quan Zhou^1,2, Wenbing Yang^1,2, Guangwei Gao^3, Weihua Ou^4, Huimin Lu^5, Jie Chen^6 & Longin Jan Latecki^7

Abstract

Recent years have witnessed great progress in semantic segmentation using deep convolutional neural networks (DCNNs). This paper presents a novel fully convolutional network for semantic segmentation that uses multi-scale contextual convolutional features. Since objects in natural images appear at various scales and aspect ratios, capturing rich contextual information is critical for dense pixel prediction. On the other hand, as the network goes deeper, the convolutional feature maps of traditional DCNNs gradually become coarser, which may be harmful for semantic segmentation. Motivated by these observations, we design a multi-scale deep context convolutional network (MDCCNet), which combines feature maps from different levels of the network in a holistic manner for semantic segmentation. The segmentation outputs of MDCCNet are further enhanced using densely connected conditional random fields (CRFs). The proposed network allows us to fully exploit local and global contextual information, ranging from an entire scene to every single pixel, to perform pixel-wise label estimation. The experimental results demonstrate that our method outperforms or is comparable to state-of-the-art methods on the PASCAL VOC 2012 and SIFTFlow semantic segmentation datasets.
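The abstract does not give MDCCNet's layer configuration. The following Python/NumPy sketch is therefore only a hedged illustration of the general idea of combining feature maps from different network levels for pixel-wise labelling. It is not the authors' implementation.

The sketch makes several assumptions:

- It uses nearest-neighbour upsampling. A trained network would more likely use learned deconvolution or bilinear interpolation.
- Each level is brought to the finest resolution and the levels are concatenated along the channel axis.
- A bias-free 1×1 projection maps the stacked features to class scores, followed by a per-pixel arg-max.
- The function names `upsample_nearest`, `fuse_multiscale` and `pixel_labels` are hypothetical.
- The densely connected CRF refinement step is not shown.

```python
from typing import Sequence

import numpy as np


def upsample_nearest(feat: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    # Assumption: stands in for whatever upsampling MDCCNet actually uses.
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_multiscale(features: Sequence[np.ndarray], weights: np.ndarray) -> np.ndarray:
    """Hypothetical fusion of feature maps from several network levels.

    features: (C_i, H_i, W_i) maps, finest first; every coarser map must be an
              integer, isotropic downscaling of the finest one (an assumption).
    weights:  (num_classes, sum_i C_i) matrix acting as a bias-free 1x1 convolution.
    Returns per-pixel class scores of shape (num_classes, H, W).
    """
    _, H, W = features[0].shape
    aligned = []
    for f in features:
        _, h, w = f.shape
        if H % h or W % w or H // h != W // w:
            raise ValueError("expected integer, isotropic scale factors")
        # Bring each map to the finest resolution so every pixel sees both
        # local (fine) and wider-context (coarse) features.
        aligned.append(upsample_nearest(f, H // h))
    stacked = np.concatenate(aligned, axis=0)          # (sum_i C_i, H, W)
    return np.einsum("kc,chw->khw", weights, stacked)  # (num_classes, H, W)


def pixel_labels(scores: np.ndarray) -> np.ndarray:
    """Pixel-wise label estimate: arg-max over the class axis."""
    return scores.argmax(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fine = rng.standard_normal((4, 8, 8))          # e.g. a stride-8 map
    mid = rng.standard_normal((6, 4, 4))           # e.g. a stride-16 map
    coarse = rng.standard_normal((8, 2, 2))        # e.g. a stride-32 map
    weights = rng.standard_normal((3, 4 + 6 + 8))  # 3 hypothetical classes
    scores = fuse_multiscale([fine, mid, coarse], weights)
    print(scores.shape)                # (3, 8, 8)
    print(pixel_labels(scores).shape)  # (8, 8)
```

Concatenation keeps each level's channels separate. As a result, the 1×1 projection can weight fine against coarse evidence independently for every class. Summing upsampled score maps, as in FCN-style skip connections, is a common lighter-weight alternative.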

Acknowledgements

The authors would like to thank all the anonymous reviewers for their valuable comments and suggestions. This work was partly supported by the National Science Foundation (Grant No. IIS-1302164), the National Natural Science Foundation of China (Grant No. 61401228, 61402238, 61762021, 61571240, 61501247, 61501259, 61671253, 61402122), China Postdoctoral Science Foundation (Grant No. 2015M581841), Natural Science Foundation of Jiangsu Province (Grant No. BK20150849, BK20160908), Postdoctoral Science Foundation of Jiangsu Province (Grant No. 1501019A), Open Research Fund of National Engineering Research Center of Communications and Networking (Nanjing University of Posts and Telecommunications) (Grant No. TXKY17009), Open Fund Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (Grant No. MJUKF201710), Open Fund Project of Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education (Nanjing University of Science and Technology) (Grant No. JYB201709, JYB201710), Natural Science Foundation of Guizhou Province (Grant No. [2017]1130), and the 2014 Ph.D. Recruitment Program of Guizhou Normal University.

Author information

Authors and Affiliations

National Engineering Research Center of Communications and Networking, Nanjing University of Posts and Telecommunications, Nanjing, China
Quan Zhou & Wenbing Yang
Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, 350121, China
Quan Zhou & Wenbing Yang
Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Nanjing University of Science and Technology, Nanjing, China
Guangwei Gao
School of Big Data and Computer Science, Guizhou Normal University, Guiyang, China
Weihua Ou
Department of Mechanical and Control Engineering, Kyushu Institute of Technology, Kitakyushu, Japan
Huimin Lu
Huawei Technologies Co. Ltd., ShenZhen, China
Jie Chen
Department of Computer and Information Sciences, Temple University, Philadelphia, USA
Longin Jan Latecki

Corresponding authors

Correspondence to Quan Zhou or Weihua Ou.

Additional information

This article belongs to the Topical Collection: Special Issue on Deep vs. Shallow: Learning for Emerging Web-scale Data Computing and Applications

Guest Editors: Jingkuan Song, Shuqiang Jiang, Elisa Ricci, and Zi Huang

About this article

Cite this article

Zhou, Q., Yang, W., Gao, G. et al. Multi-scale deep context convolutional neural networks for semantic segmentation. World Wide Web 22, 555–570 (2019). https://doi.org/10.1007/s11280-018-0556-3

Received: 16 August 2017
Revised: 16 March 2018
Accepted: 27 March 2018
Published: 19 April 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s11280-018-0556-3