
Few-shot video object segmentation with prototype evolution

  • Original Article
  • Neural Computing and Applications

Abstract

As a challenging task, few-shot video object segmentation attempts to segment objects of novel categories in a video given only a few annotated images. Current methods for this task explore only the relationship between the support images and the target query video, ignoring the rich temporal information in the query video itself. To address this problem, we propose a simple yet effective framework named prototype evolution network (PENet) for few-shot video object segmentation. PENet first adopts a prototype-based structure that efficiently constructs and exploits the correlation between the support images and the target query video. A prototype evolution module is then designed to summarize and propagate temporal information through the evolution of a video prototype. The feature representation maintained by this module has a fixed size, so the memory burden does not grow as the video progresses. Together with the category prototype extracted from the support set, the global video prototype guides the segmentation of the current frame. Additionally, the use of high-level features is introduced as an optional variant that trades a small amount of speed for higher accuracy. Experimental results on the 2019 and 2021 versions of the Youtube-VIS dataset demonstrate that PENet outperforms previous methods by a sizable margin, validating the superiority of the proposed model.
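To make the prototype-based design concrete, the sketch below shows one plausible reading of the pipeline described above: a category prototype is obtained from the support image by masked average pooling, and a fixed-size video prototype is evolved frame by frame with an exponential-moving-average update, so memory does not grow with video length. The function names, the cosine-similarity scoring, the binarization threshold, and the coefficient alpha are assumptions made for illustration only; they are not the exact PENet architecture.

```python
# Illustrative sketch only: category prototype via masked average pooling and
# an EMA-style "video prototype" evolved over query frames. All names and
# hyperparameters below are assumptions, not the authors' exact design.
import torch
import torch.nn.functional as F


def masked_average_pooling(feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average channel features over the masked region.

    feat: (B, C, H, W) features; mask: (B, 1, H', W') float binary mask.
    Returns a (B, C) prototype vector.
    """
    mask = F.interpolate(mask, size=feat.shape[-2:], mode="bilinear", align_corners=False)
    return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)


def cosine_score_map(feat: torch.Tensor, proto: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between each query location and a prototype -> (B, H, W)."""
    feat = F.normalize(feat, dim=1)
    proto = F.normalize(proto, dim=1)[..., None, None]  # (B, C, 1, 1)
    return (feat * proto).sum(dim=1)


def segment_video(support_feat, support_mask, query_feats, alpha=0.9):
    """Segment each query frame using the category prototype plus an evolving,
    fixed-size video prototype (EMA of per-frame foreground features)."""
    cat_proto = masked_average_pooling(support_feat, support_mask)  # (B, C)
    video_proto = cat_proto.clone()                                 # initialise the evolution
    masks = []
    for feat in query_feats:                                        # feat: (B, C, H, W)
        score = 0.5 * (cosine_score_map(feat, cat_proto) +
                       cosine_score_map(feat, video_proto))
        pred = (score > 0.5).float().unsqueeze(1)                   # crude binary mask for the sketch
        # Evolve the video prototype with the current frame's foreground features;
        # the prototype stays a single (B, C) vector, so memory does not grow
        # as more frames are processed.
        frame_proto = masked_average_pooling(feat, pred)
        video_proto = alpha * video_proto + (1 - alpha) * frame_proto
        masks.append(pred)
    return masks
```

In such a sketch the features would typically come from a frozen CNN backbone (e.g. a ResNet); only the prototype construction and evolution logic is shown here.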


Data availability

The Youtube-VIS 2019 and Youtube-VIS 2021 datasets analyzed during the current study are available at https://youtube-vos.org/dataset/vis/.


Acknowledgements

This research was supported by the National Key Research and Development Program of China under Grant No. 2018AAA0100400, and the National Natural Science Foundation of China under Grants 62071466, 62076242, and 61976208.

Author information

Corresponding author

Correspondence to Shiming Xiang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mao, B., Liu, X., Shi, L. et al. Few-shot video object segmentation with prototype evolution. Neural Comput & Applic 36, 5367–5382 (2024). https://doi.org/10.1007/s00521-023-09325-y



  • DOI: https://doi.org/10.1007/s00521-023-09325-y
