SANet: similarity aggregation and semantic fusion for few-shot semantic segmentation

Ye, Minrui; Zhang, Tao

doi:10.1007/s10489-024-05986-x

Published: 10 December 2024

Volume 55, article number 119, (2025)

Applied Intelligence

Minrui Ye^1,2 &
Tao Zhang^1,2

Abstract

Few-shot semantic segmentation (FSS) methods based on meta-learning have shown promise in extracting instance knowledge from the support set to infer pixel-wise labels in the query set. However, a key challenge in FSS is the spatial inconsistency between the query image and the support image caused by intra-class differences and inter-class similarity. Moreover, existing FSS methods often rely on multiple decoding methods for differentiated pixel-wise matching, which leads to semantic inconsistency. To tackle these issues, we propose a similarity aggregation network (SANet), which explores visual correspondence between support and query features while aligning semantic dimensions. Specifically, SANet introduces a mask attention module (MAM) to capture spatial relations between non-local attention features derived from the support features and the query features. In addition, a similarity aggregation module (SAM) uses multi-head attention, combined with a prior mask, to calculate the aggregated similarity between each query pixel and all support pixels, thereby focusing the network on foreground areas. Finally, a feature fusion module (FFM) adaptively fuses features across multiple scales and channels for accurate prediction. Extensive experiments on PASCAL-5^i and COCO-20^i demonstrate the efficiency and competitiveness of SANet.
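
To make the idea of aggregating similarity between each query pixel and all support pixels more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single attention head, plain scaled dot-product attention, and flattened feature maps given as Python lists; the function names `softmax` and `aggregate_mask_similarity` are hypothetical. The support mask plays the role of the attention values, so each query pixel receives a foreground score between 0 and 1. Plain Python is used to keep the example dependency-free; a real model would use a tensor library and several heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_mask_similarity(query_feats, support_feats, support_mask):
    """Return one foreground score in [0, 1] per query pixel.

    query_feats:   list of N_q feature vectors (each a list of C floats)
    support_feats: list of N_s feature vectors (same length C)
    support_mask:  list of N_s values, 1.0 = foreground, 0.0 = background

    Assumption: single-head scaled dot-product attention, with the
    support mask used as the attention values.
    """
    dim = len(query_feats[0])
    scale = 1.0 / math.sqrt(dim)
    scores = []
    for q in query_feats:
        # Similarity of this query pixel to every support pixel.
        logits = [scale * sum(a * b for a, b in zip(q, s)) for s in support_feats]
        weights = softmax(logits)
        # Weighted average of the mask labels: high when the query pixel
        # mostly resembles foreground support pixels.
        scores.append(sum(w * m for w, m in zip(weights, support_mask)))
    return scores


# Toy example: two query pixels, two support pixels, two channels.
query = [[1.0, 0.0], [0.0, 1.0]]
support = [[4.0, 0.0], [0.0, 4.0]]
mask = [1.0, 0.0]  # only the first support pixel is foreground
fg = aggregate_mask_similarity(query, support, mask)
print([round(v, 3) for v in fg])
```

In this toy case the first query pixel points in the same direction as the foreground support pixel and therefore gets a score close to 1, while the second query pixel matches the background support pixel and gets a score close to 0.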

Data Availability and Access

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, Shanghai, China
Minrui Ye & Tao Zhang
No.1 Department of Engineering, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, 500 Yutian Road, Shanghai, 200083, Shanghai, China
Minrui Ye & Tao Zhang

Contributions

Minrui Ye: Conceptualization, Methodology, Software, Data curation, Writing - Original draft preparation. Tao Zhang: Supervision, Validation, Writing, Project administration.

Corresponding author

Correspondence to Tao Zhang.

Ethics declarations

Conflict of Interest/Competing Interests

The authors declare that they have no conflicts of interest or competing interests relevant to the content of this manuscript.

Ethics Approval and Consent to Participate

Not applicable.

Consent for Publication

Consent for publication was obtained from all individuals included in this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, M., Zhang, T. SANet: similarity aggregation and semantic fusion for few-shot semantic segmentation. Appl Intell 55, 119 (2025). https://doi.org/10.1007/s10489-024-05986-x

Accepted: 30 September 2024
Published: 10 December 2024
DOI: https://doi.org/10.1007/s10489-024-05986-x
