An anisotropic non-local attention network for image segmentation

  • Original Paper

Machine Vision and Applications

Abstract

Recent studies show that combining contextual and spatial information significantly improves the performance of segmentation networks. Existing methods differ mainly in how they extract this information. To jointly exploit spatial details from shallow layers, semantic information from deeper layers, and attention mechanisms built on special pooling, we propose an Anisotropic Non-local Attention Network (ANANet) that acquires contextual and spatial information in a flexible and efficient way. We first present a spatial contextual module with anisotropic pooling (SCMA), which encodes contextual features by integrating traditional square pooling, anisotropic pooling, and attention mechanisms. SCMA adopts adaptive spatial pooling to extract multi-scale features and includes an anisotropic pooling attention module (APAM) to compensate for the shortcomings of square pooling. APAM first applies horizontal and vertical pooling and then multiplies the two pooling results to generate attention maps for elongated and anisotropic objects. We then propose a non-local channel contextual module (CCM) that reuses shallow features from the backbone network to emphasize channel interdependency. CCM encodes category differences to further reduce erroneous segmentation of ambiguous boundary pixels. Finally, we concatenate the outputs of SCMA and CCM to further improve the feature representation. Experiments on public datasets show that our method achieves clearly better results than existing state-of-the-art methods.
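
The APAM step described above (pool along each axis, then multiply the two results) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes average pooling, a single channel, and no learned convolutions or normalization, none of which the abstract specifies. The function name `strip_pool_attention` and the toy input are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def strip_pool_attention(feature):
    """Build an H x W attention map from one H x W feature channel.

    Assumption: both poolings are plain averages; the abstract does not
    say which pooling operator or normalization APAM actually uses.
    """
    height, width = len(feature), len(feature[0])
    # Horizontal pooling: average each row across its width -> H values.
    row_means = [sum(row) / width for row in feature]
    # Vertical pooling: average each column across the height -> W values.
    col_means = [
        sum(feature[r][c] for r in range(height)) / height
        for c in range(width)
    ]
    # Multiply one pooling result by the other (outer product) -> H x W map.
    return [[rm * cm for cm in col_means] for rm in row_means]


# Toy 4 x 4 channel containing one long horizontal bar in row 1.
feature = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
attention = strip_pool_attention(feature)
for row in attention:
    print(row)
```

In this toy case the attention is non-zero only along the row that holds the bar, which suggests why strip-shaped pooling may suit elongated objects better than a square window that averages the bar together with the background above and below it.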

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (61862029, 62062038), and the Natural Science Foundation of Jiangxi Province (20202BABL212007, 20192BAB207011, 20212BAB202012).

Author information

Corresponding authors

Correspondence to Zhijun Fang or Jinting Shi.

About this article

Cite this article

Yuan, F., Zhu, Y., Li, K. et al. An anisotropic non-local attention network for image segmentation. Machine Vision and Applications 33, 23 (2022). https://doi.org/10.1007/s00138-021-01265-8

  • DOI: https://doi.org/10.1007/s00138-021-01265-8
