Attention-based dual context aggregation for image semantic segmentation

Zhao, Dexin; Qi, Zhiyang; Yang, Ruixue; Wang, Zhaohui

doi:10.1007/s11042-021-11094-6

Published: 01 June 2021

Volume 80, pages 28201–28216, (2021)

Multimedia Tools and Applications

Dexin Zhao¹, Zhiyang Qi¹ (ORCID: orcid.org/0000-0002-6611-2342), Ruixue Yang¹ & Zhaohui Wang¹

Abstract

Recent works have extensively explored contextual relevance to enhance scene understanding. However, because of the limitations of the convolution kernel, most approaches model relationships between local regions and rarely explore long-range dependencies. In this paper, we propose the Dual Context Aggregation Module (DCM) to capture these long-range dependencies effectively. DCM consists of two attention modules that obtain dense contextual information by modeling relations between positions and between channels. The spatial attention module generates huge attention maps by constructing pairwise relationships between positions in the same row or column. The channel attention module applies the Weight Calibrate Block to generate a weight for every channel, capturing the correlation between different channels. We integrate the feature maps of the two modules by element-wise addition. Moreover, we design a two-step decoder module to improve the feature representation. Building on these components, we construct the Dual Context Aggregation Network (DCNet). Extensive experiments on the benchmarks show that our model yields a robust feature representation. Our method demonstrates competitive performance compared with state-of-the-art models, achieving MIoU scores of 81.9% on Cityscapes and 45.54% on ADE20K.
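
As a rough illustration of the aggregation described in the abstract, the sketch below attends from each position to the positions in its own row and column, reweights the channels, and adds the two results element-wise. It is not the authors' implementation. The dot-product similarity, the softmax normalization, the sigmoid-of-mean channel gate (a stand-in for the Weight Calibrate Block, whose internals the abstract does not give), and all function names are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def row_col_attention(x):
    """Spatial branch (assumed form): each position attends to every
    position in the same row or the same column.

    x is a feature map stored as x[c][i][j] (channels, height, width).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            # Row peers, then column peers (skipping (i, j), already in the row).
            peers = [(i, k) for k in range(W)]
            peers += [(k, j) for k in range(H) if k != i]
            # Assumed similarity: dot product of channel vectors.
            scores = [sum(x[c][i][j] * x[c][p][q] for c in range(C))
                      for p, q in peers]
            weights = softmax(scores)
            for c in range(C):
                out[c][i][j] = sum(w * x[c][p][q]
                                   for w, (p, q) in zip(weights, peers))
    return out


def channel_gate(x):
    """Channel branch (assumed stand-in for the Weight Calibrate Block):
    scale each channel by the sigmoid of its spatial mean."""
    out = []
    for plane in x:
        n = len(plane) * len(plane[0])
        mean = sum(sum(row) for row in plane) / n
        gate = 1.0 / (1.0 + math.exp(-mean))
        out.append([[gate * v for v in row] for row in plane])
    return out


def dual_context(x):
    """Fuse the spatial and channel branches by element-wise addition."""
    spatial = row_col_attention(x)
    channel = channel_gate(x)
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[spatial[c][i][j] + channel[c][i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]


# Toy input: 2 channels, 2 rows, 3 columns.
x = [[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
     [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]
fused = dual_context(x)
```

Each position here looks at only H + W - 1 peers instead of all H × W positions, which is what restricting attention to the same row or column buys. A real network would use learned projections and batched tensor operations in place of these nested loops.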

Author information

Authors and Affiliations

Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin, 300384, China
Dexin Zhao, Zhiyang Qi, Ruixue Yang & Zhaohui Wang

Corresponding author

Correspondence to Zhiyang Qi.

Ethics declarations

Conflict of interest

There are no conflicts of interest in this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, D., Qi, Z., Yang, R. et al. Attention-based dual context aggregation for image semantic segmentation. Multimed Tools Appl 80, 28201–28216 (2021). https://doi.org/10.1007/s11042-021-11094-6

Received: 16 September 2020
Revised: 06 January 2021
Accepted: 21 May 2021
Published: 01 June 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s11042-021-11094-6
