Dual-branch aggregation and edge refinement network for few shot semantic segmentation

  • Regular Paper
Multimedia Systems

Abstract

Existing Few-Shot Semantic Segmentation (FSS) methods often focus on extracting semantic information from support images to guide the segmentation of query images, while less attention is paid to exploring the query branch. However, due to the limited number of support images, there exists significant intra-class variance between support and query images. Additionally, relying solely on a single support prototype to guide query segmentation often leads to inaccurate segmentation boundaries in the prediction results, which can affect the model’s performance. In this study, we simultaneously consider the information extraction from both the support and query branches and propose a Dual-branch Aggregation and Edge Refinement (DAER) network for accurate query image segmentation. Specifically, to better explore the information from the query branch, we introduce an Initial Mask Generation Module (IMGM) that generates an initial mask for the query image. Furthermore, we propose a Dual-Branch Aggregation Module (DBAM) that simultaneously captures information from both the support and query branches. Finally, an Edge Refinement Module (ERM) is introduced to integrate more query-specific positional information into the network. Extensive experiments on standard few-shot semantic segmentation benchmarks, including PASCAL-\(5^i\) and COCO-\(20^i\), demonstrate the effectiveness of our proposed method.
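
The single-prototype guidance that the abstract describes as a limitation is commonly realized with masked average pooling followed by cosine matching. The sketch below illustrates that generic baseline only; it is not the authors' IMGM, DBAM, or ERM, whose details are not given here. The function names, the toy 2x2 feature maps, and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors at foreground positions of the support mask.

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    Returns a C-dimensional prototype (hypothetical helper, not from the paper).
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feature_row, mask_row in zip(features, mask):
        for vector, is_foreground in zip(feature_row, mask_row):
            if is_foreground:
                count += 1
                for c in range(channels):
                    total[c] += vector[c]
    if count == 0:
        raise ValueError("support mask contains no foreground pixels")
    return [t / count for t in total]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors, with eps guarding zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def initial_query_mask(query_features, prototype, threshold=0.5):
    """Mark query positions whose similarity to the prototype exceeds threshold.

    The threshold value is an assumption chosen for this toy example.
    """
    return [
        [1 if cosine_similarity(vector, prototype) > threshold else 0
         for vector in row]
        for row in query_features
    ]


# Toy 2x2 feature maps with 2 channels (made-up values).
support_features = [[[1.0, 0.0], [0.9, 0.1]],
                    [[0.0, 1.0], [0.1, 0.9]]]
support_mask = [[1, 1],
                [0, 0]]
query_features = [[[0.0, 1.0], [1.0, 0.1]],
                  [[0.8, 0.2], [0.2, 0.8]]]

prototype = masked_average_pooling(support_features, support_mask)
query_mask = initial_query_mask(query_features, prototype)
print(query_mask)
```

Because one pooled vector stands in for the whole support object, query pixels near an object boundary can fall on either side of the threshold, which is the boundary inaccuracy the abstract attributes to single-prototype guidance.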

Data Availability

Data will be made available on request.

Funding

No funding was obtained for this study.

Author information

Contributions

Qingsong Tang: Methodology, Investigation, Writing-original draft and review. Yalei Ren: Investigation, Code, Writing-review and editing. Zhanghui Han: Investigation, Data curation, Visualization. Chenyan Bao: Data curation, Code. Yang Liu: Investigation, Writing-review and editing.

Corresponding author

Correspondence to Qingsong Tang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this study.

Additional information

Communicated by Junyu Gao.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tang, Q., Ren, Y., Shan, Z. et al. Dual-branch aggregation and edge refinement network for few shot semantic segmentation. Multimedia Systems 31, 142 (2025). https://doi.org/10.1007/s00530-025-01718-4
