
FFNet: Feature Fusion Network for Few-shot Semantic Segmentation

Published in: Cognitive Computation

Abstract

Semantic segmentation aims at assigning a category label to each pixel in an image. Deep neural networks have achieved remarkable progress on this task. Nevertheless, two critical bottlenecks remain. First, deep neural networks usually need to be trained on large-scale labeled datasets, which are expensive to collect and annotate. Second, traditional semantic segmentation methods struggle to predict unseen classes after training. To address these problems, few-shot semantic segmentation has been proposed, and recent methods have achieved impressive performance. However, many existing approaches ignore the semantic correlation between data and fail to generate discriminative features for segmentation. In this paper, to address this issue, we propose a feature fusion network (FFNet) for few-shot semantic segmentation that enhances the discriminative ability of the learned data representations. Specifically, a task attention module is devised to learn the semantic correlation between data. Then, a multi-scale feature fusion module is trained to adaptively fuse the contextual information at multiple scales, thus capturing multi-scale object information. Finally, experiments conducted on the PASCAL-\(5^i\) and COCO-\(20^i\) datasets demonstrate the superiority of the proposed FFNet and show its advantage over existing approaches.
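The adaptive multi-scale fusion described above can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation: it upsamples feature maps from several scales to a common resolution with nearest-neighbor interpolation and combines them with softmax-normalized scalar weights (in the actual FFNet, the fusion weights would be learned parameters and the features would come from a CNN backbone).

```python
import numpy as np

def upsample_nearest(feat, size):
    """Nearest-neighbor upsampling of a (C, H, W) feature map to (C, size, size)."""
    c, h, w = feat.shape
    rows = np.arange(size) * h // size   # source row index for each target row
    cols = np.arange(size) * w // size   # source col index for each target col
    return feat[:, rows[:, None], cols[None, :]]

def fuse_multiscale(features, weights):
    """Fuse multi-scale (C, H, W) feature maps with softmax-normalized weights."""
    target = max(f.shape[1] for f in features)          # largest spatial size
    up = [upsample_nearest(f, target) for f in features]
    w = np.exp(weights - weights.max())                 # stable softmax
    w = w / w.sum()
    return sum(wi * fi for wi, fi in zip(w, up))        # weighted sum of scales

# Toy example: a 4-channel feature map at three spatial scales
feats = [np.random.rand(4, s, s) for s in (8, 16, 32)]
fused = fuse_multiscale(feats, np.array([0.2, 0.5, 0.3]))
print(fused.shape)  # (4, 32, 32)
```

The softmax normalization keeps the fusion a convex combination of scales, so the fused map stays in the same value range as its inputs regardless of the raw weight magnitudes.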



Funding

This work was supported by the National Key Research and Development Program of China under Grant No. 2018AAA0100400, the Joint Fund of the Equipments Pre-Research and Ministry of Education of China under Grant No. 6141A020337, the Natural Science Foundation of Shandong Province under Grant No. ZR2020MF131 and the Science and Technology Program of Qingdao under Grant No. 21-1-4-ny-19-nsh.

Author Information

Corresponding author: Guoqiang Zhong.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wang, YN., Tian, X. & Zhong, G. FFNet: Feature Fusion Network for Few-shot Semantic Segmentation. Cogn Comput 14, 875–886 (2022). https://doi.org/10.1007/s12559-021-09990-y

