Augmented FCN: rethinking context modeling for semantic segmentation

Zhang, Dong; Zhang, Liyan; Tang, Jinhui

doi:10.1007/s11432-021-3590-1

Research Paper
Published: 09 February 2023

Volume 66, article number 142105, (2023)

Science China Information Sciences

Dong Zhang¹,
Liyan Zhang² &
Jinhui Tang¹

Abstract

The effectiveness of modeling contextual information has been empirically demonstrated in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network (AugFCN) that aggregates content- and position-based object contexts for semantic segmentation. Specifically, motivated by the observation that each deep feature map is a global, class-wise representation of the input, we first propose an augmented nonlocal interaction (AugNI) that aggregates global content-based contexts through interactions among all feature maps. Compared with classical position-wise approaches, AugNI is more efficient. Moreover, to eliminate permutation equivariance while maintaining translation equivariance, a learnable relative position embedding branch is additionally installed in AugNI to capture global position-based contexts. AugFCN uses a fully convolutional network as the backbone and deploys AugNI before the segmentation head network. Experimental results on two challenging benchmarks show that AugFCN achieves a competitive 45.38% mIoU (standard mean intersection over union) on the ADE20K val set and 81.9% mIoU on the Cityscapes test set, with little computational overhead. Additionally, jointly applying AugNI with existing context modeling schemes yields consistent segmentation improvements over state-of-the-art context modeling methods. We finally achieve a top performance of 45.43% mIoU on the ADE20K val set and 83.0% mIoU on the Cityscapes test set.
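The efficiency argument above rests on letting feature maps (channels) interact with each other instead of letting spatial positions interact. The plain-Python sketch below illustrates that general idea with a generic channel-wise non-local block, in the spirit of earlier channel-attention modules. It is an illustration under stated assumptions, not the authors' AugNI: the function name `channel_nonlocal`, the dot-product affinity, the softmax normalization, and the residual weight `gamma` are our own choices, and the learnable relative position embedding branch is not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def channel_nonlocal(x: Matrix, gamma: float = 1.0) -> Matrix:
    """Generic channel-wise non-local interaction (illustrative, not AugNI).

    x holds C rows, one per feature map, each a flattened map of N = H*W values.
    Every feature map attends to every other one through a C x C affinity, so
    the cost is O(C^2 * N) rather than the O(N^2 * C) of a position-wise
    non-local block; this is cheaper whenever N is much larger than C.
    """
    c, n = len(x), len(x[0])
    # C x C affinity: dot product between every pair of flattened feature maps.
    affinity = [[sum(x[i][k] * x[j][k] for k in range(n)) for j in range(c)]
                for i in range(c)]
    # Row-normalize so each map receives a convex combination of all maps.
    weights = [softmax(row) for row in affinity]
    out = []
    for i in range(c):
        # Aggregate content context: agg = sum_j w_ij * x_j.
        agg = [sum(weights[i][j] * x[j][k] for j in range(c)) for k in range(n)]
        # Residual connection back to the input feature map.
        out.append([x[i][k] + gamma * agg[k] for k in range(n)])
    return out


if __name__ == "__main__":
    # Two feature maps (C = 2) over four flattened positions (N = 4).
    x = [[1.0, 0.0, 2.0, 1.0],
         [0.0, 1.0, 1.0, 0.0]]
    y = channel_nonlocal(x, gamma=0.5)

    # Shuffle the spatial positions and run the block again.
    perm = [2, 0, 3, 1]
    xp = [[row[p] for p in perm] for row in x]
    yp = channel_nonlocal(xp, gamma=0.5)

    # The output is shuffled in exactly the same way: permutation equivariance.
    print(all(abs(yp[i][k] - y[i][perm[k]]) < 1e-12
              for i in range(2) for k in range(4)))
```

The final check illustrates why a position branch is needed on top of such a content-only interaction: the affinity sums over all positions, so the block treats the N positions as an unordered set, and shuffling the input positions merely shuffles the output.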

Acknowledgements

This work was partially supported by National Key Research and Development Program of China (Grant No. 2018AAA0102002) and National Natural Science Foundation of China (Grant Nos. 61925204, 62172212). The authors would like to thank all the anonymous reviewers for their constructive comments and suggestions.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
Dong Zhang & Jinhui Tang
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
Liyan Zhang

Corresponding author

Correspondence to Jinhui Tang.

About this article

Cite this article

Zhang, D., Zhang, L. & Tang, J. Augmented FCN: rethinking context modeling for semantic segmentation. Sci. China Inf. Sci. 66, 142105 (2023). https://doi.org/10.1007/s11432-021-3590-1

Received: 26 December 2021
Revised: 08 June 2022
Accepted: 28 July 2022
Published: 09 February 2023
DOI: https://doi.org/10.1007/s11432-021-3590-1