An anisotropic non-local attention network for image segmentation

Yuan, Feiniu; Zhu, Yaowen; Li, Kang; Fang, Zhijun; Shi, Jinting

doi:10.1007/s00138-021-01265-8

Original Paper
Published: 03 February 2022

Machine Vision and Applications, volume 33, article number 23 (2022)

Abstract

Recent studies show that combining contextual and spatial information significantly improves the performance of segmentation networks. Existing methods differ mainly in how they extract contextual and spatial information. To make comprehensive use of spatial details from shallow layers, semantic information from deeper layers, and attention mechanisms based on special pooling, we propose an Anisotropic Non-local Attention Network (ANANet) that jointly acquires contextual and spatial information in a flexible and efficient way. We first present a spatial contextual module with anisotropic pooling (SCMA), which further encodes contextual features by integrating traditional square pooling, anisotropic pooling, and attention mechanisms. SCMA adopts adaptive spatial pooling to extract multi-scale features and includes an anisotropic pooling attention module (APAM) to compensate for the shortcomings of square pooling. APAM first applies horizontal and vertical pooling, and then multiplies one pooling result by the other to generate attention maps for long-shaped and anisotropic objects. We then propose a non-local channel contextual module (CCM) that fully reuses shallow features from the backbone network to emphasize channel interdependency. CCM encodes category differences to further reduce erroneous segmentation of ambiguous boundary pixels. Finally, we concatenate the outputs of SCMA and CCM to further improve feature representation. Experiments show that our method achieves clearly better results than existing state-of-the-art methods on public datasets.
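The pooling-and-multiply step described for APAM can be illustrated with a minimal sketch. It is not the authors' implementation, since the full text is not reproduced here. It assumes a single-channel feature map stored as a list of rows, average pooling in both directions, and a sigmoid to squash the product into an attention weight. The function names are hypothetical, and the real module presumably works on multi-channel tensors with learned layers.

```python
import math


def horizontal_pool(fmap):
    """Average every row over its width: H x W -> H values (assumed average pooling)."""
    return [sum(row) / len(row) for row in fmap]


def vertical_pool(fmap):
    """Average every column over its height: H x W -> W values (assumed average pooling)."""
    height = len(fmap)
    return [sum(col) / height for col in zip(*fmap)]


def sigmoid(x):
    """Logistic function; using it as the normalizer is an assumption of this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def anisotropic_attention(fmap):
    """Multiply the row descriptor by the column descriptor (outer product)
    to obtain an H x W attention map."""
    rows = horizontal_pool(fmap)
    cols = vertical_pool(fmap)
    return [[sigmoid(r * c) for c in cols] for r in rows]


def apply_attention(fmap, attn):
    """Reweight the feature map element-wise with the attention map."""
    return [[v * a for v, a in zip(frow, arow)]
            for frow, arow in zip(fmap, attn)]


# A thin horizontal bar: the kind of long-shaped object a square pooling
# window covers poorly.
feature_map = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
attention = anisotropic_attention(feature_map)
weighted = apply_attention(feature_map, attention)
```

In this toy input the row containing the bar receives a larger attention weight than the empty rows, because its horizontal pooling response is larger. This is the intuition behind using strip-shaped pooling for elongated objects.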

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (61862029, 62062038), and the Natural Science Foundation of Jiangxi Province (20202BABL212007, 20192BAB207011, 20212BAB202012).

Author information

Feiniu Yuan and Yaowen Zhu are co-first authors.

Authors and Affiliations

College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai, 201418, China
Feiniu Yuan, Yaowen Zhu & Kang Li
Research Base of Online Education for Shanghai Middle and Primary Schools, Shanghai Normal University, Shanghai, China
Feiniu Yuan, Yaowen Zhu & Kang Li
School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, 201620, China
Zhijun Fang
Vocational School of Teachers and Technology, Jiangxi Agricultural University, Jiangxi, 330045, China
Jinting Shi
Shanghai Engineering Research Center of Intelligent Education and Big Data, Shanghai Normal University, Shanghai, 201418, China
Feiniu Yuan, Yaowen Zhu & Kang Li

Corresponding authors

Correspondence to Zhijun Fang or Jinting Shi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yuan, F., Zhu, Y., Li, K. et al. An anisotropic non-local attention network for image segmentation. Machine Vision and Applications 33, 23 (2022). https://doi.org/10.1007/s00138-021-01265-8

Received: 07 May 2021
Revised: 07 November 2021
Accepted: 11 November 2021
Published: 03 February 2022
DOI: https://doi.org/10.1007/s00138-021-01265-8
