SANet: similarity aggregation and semantic fusion for few-shot semantic segmentation

Applied Intelligence

Abstract

Few-shot semantic segmentation (FSS) methods based on meta-learning strategies have shown promise in extracting instance knowledge from the support set to infer pixel-wise labels in the query set. However, a key challenge in FSS is addressing the spatial inconsistency between the query image and the support image caused by intra-class differences and inter-class similarity. Moreover, existing FSS methods often rely on multiple decoding methods for differentiated pixel-wise matching, which leads to semantic inconsistency. To tackle these issues, we propose a similarity aggregation network (SANet), which effectively explores the visual correspondence between support and query features while aligning their semantic dimensions. Specifically, SANet introduces a mask attention module (MAM) to capture spatial relations between the non-local attention features derived from the support features and the query features. Additionally, a similarity aggregation module (SAM) is proposed, which uses the multi-head attention mechanism together with a prior mask to calculate the aggregated similarity between each query pixel and all support pixels, thereby focusing the network on foreground areas. Finally, a feature fusion module (FFM) adaptively fuses features across multiple scales and channels for accurate prediction. Extensive experiments on PASCAL-5i and COCO-20i demonstrate the efficiency and competitiveness of SANet.
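
The abstract does not give the exact formulation of the SAM, so the following is only a minimal sketch of the general idea it describes: multi-head attention between each query pixel and all support pixels, with the support mask aggregated by the attention weights. The function name `aggregate_similarity`, the channel-slicing scheme for heads, the scaled dot-product similarity, and the averaging over heads are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_similarity(query, support, support_mask, num_heads=2):
    """Hypothetical mask-weighted multi-head similarity aggregation.

    query:        list of Nq feature vectors, each of length C
    support:      list of Ns feature vectors, each of length C
    support_mask: list of Ns values in [0, 1] (1 = foreground)
    Returns one foreground score in [0, 1] per query pixel.
    """
    channels = len(query[0])
    if channels % num_heads != 0:
        raise ValueError("channels must be divisible by num_heads")
    head_dim = channels // num_heads

    scores = []
    for q in query:
        head_scores = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Scaled dot-product similarity to every support pixel (assumed).
            logits = [
                sum(a * b for a, b in zip(q[lo:hi], s[lo:hi])) / math.sqrt(head_dim)
                for s in support
            ]
            weights = softmax(logits)
            # Aggregate the support mask with the attention weights.
            head_scores.append(sum(w * m for w, m in zip(weights, support_mask)))
        # Average over heads (an assumption; other reductions are possible).
        scores.append(sum(head_scores) / num_heads)
    return scores


# Toy example: query pixel 0 resembles the foreground support pixel,
# query pixel 1 resembles the background support pixel.
query = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
support = [[4.0, 0.0, 4.0, 0.0], [0.0, 4.0, 0.0, 4.0]]
mask = [1, 0]
scores = aggregate_similarity(query, support, mask)
```

In this toy example the first query pixel receives a higher foreground score than the second, which is the sense in which such an aggregation could steer a network toward foreground areas.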

Data Availability and Access

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Contributions

Minrui Ye: Conceptualization, Methodology, Software, Data curation, Writing - Original draft preparation. Tao Zhang: Supervision, Validation, Writing, Project administration.

Corresponding author

Correspondence to Tao Zhang.

Ethics declarations

Conflict of Interest/Competing Interests

The authors declare that they have no conflicts of interest or competing interests relevant to the content of this manuscript.

Ethics Approval and Consent to Participate

Not applicable.

Consent for Publication

Consent for publication was obtained from all individuals included in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, M., Zhang, T. SANet: similarity aggregation and semantic fusion for few-shot semantic segmentation. Appl Intell 55, 119 (2025). https://doi.org/10.1007/s10489-024-05986-x

  • DOI: https://doi.org/10.1007/s10489-024-05986-x
