
Dual Context Network for real-time semantic segmentation

  • ORIGINAL PAPER
  • Published in Machine Vision and Applications

Abstract

Real-time semantic segmentation is a challenging task because segmentation accuracy and inference speed must be considered at the same time. In this paper, a Dual Context Network (DCNet) is presented to address this challenge. It contains two independent sub-networks: a Region Context Network and a Pixel Context Network. The Region Context Network is the main network; it operates on a low-resolution input and uses a feature re-weighting module to achieve a sufficient receptive field. Meanwhile, the Pixel Context Network, equipped with a location attention module, captures the location dependencies of each pixel to help the main network recover spatial detail. A contextual feature fusion is introduced to combine the output features of the two sub-networks. Experiments show that DCNet achieves high-quality segmentation while keeping a high speed. Specifically, on the Cityscapes test set it reaches 76.1% mean IoU at 82 FPS on a single RTX 2080Ti GPU with a ResNet50 backbone, and 71.2% mean IoU at 142 FPS with a ResNet18 backbone.
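To make the abstract's two ideas concrete, the following is a minimal pure-Python sketch, not the authors' code: the feature re-weighting module is illustrated in squeeze-and-excitation style (pool each channel to a scalar, gate it, and rescale the channel), and the contextual feature fusion is reduced to an element-wise sum of the two branches' feature maps. All function names (`global_avg_pool`, `reweight`, `fuse`) are hypothetical, and the real network would of course use learned convolutions and attention rather than a fixed sigmoid gate.

```python
import math

def global_avg_pool(fmap):
    # fmap: list of channels, each a 2D list of floats.
    # Squeeze step: collapse each channel to its spatial mean.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reweight(fmap):
    # Excite step (sketch): turn each pooled descriptor into a gate in (0, 1),
    # then rescale the whole channel by its gate. A real module would insert
    # learned fully connected layers between pooling and the sigmoid.
    gates = [sigmoid(v) for v in global_avg_pool(fmap)]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(fmap, gates)]

def fuse(region_feats, pixel_feats):
    # Contextual feature fusion sketched as an element-wise sum of the
    # (already spatially aligned) region-branch and pixel-branch features.
    return [[[a + b for a, b in zip(r_row, p_row)]
             for r_row, p_row in zip(r_ch, p_ch)]
            for r_ch, p_ch in zip(region_feats, pixel_feats)]

# Toy usage: one 2x2 channel per branch.
region = reweight([[[1.0, 1.0], [1.0, 1.0]]])
pixel = [[[0.5, 0.5], [0.5, 0.5]]]
fused = fuse(region, pixel)
```

The point of the sketch is only the data flow: the low-resolution branch is globally re-weighted to enlarge its effective receptive field, while the second branch contributes per-pixel detail, and the two are merged at the end.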


[Figures 1–6 appear in the full article.]



Author information

Correspondence to Wenbin Xie.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yin, H., Xie, W., Zhang, J. et al. Dual Context Network for real-time semantic segmentation. Machine Vision and Applications 34, 22 (2023). https://doi.org/10.1007/s00138-023-01373-7

