Abstract
Few-shot video object segmentation is a challenging task that aims to segment objects of novel categories in a video given only a few annotated images. Current methods for this task explore only the relationship between the support images and the target query video, ignoring the rich temporal information in the query video itself. To address this problem, we propose a simple yet effective framework named prototype evolution network (PENet) for few-shot video object segmentation. PENet first adopts a prototype-based structure that efficiently constructs and exploits the correlation between the support images and the target query video. A prototype evolution module is then designed to summarize and propagate temporal information through the evolution of the video prototype. The feature representation maintained by this module has a fixed size, so its memory cost does not grow as the video progresses. Together with the category prototype extracted from the support set, the global video prototype provides guidance for segmenting the current frame. Additionally, we introduce the use of high-level features as an optional component that trades a small amount of speed for higher accuracy. Experimental results on the YouTube-VIS 2019 and 2021 datasets demonstrate that our PENet outperforms previous methods by a sizable margin, validating the superiority of the proposed model.
Data availability
The YouTube-VIS 2019 and YouTube-VIS 2021 datasets analyzed during the current study are available at https://youtube-vos.org/dataset/vis/.
Acknowledgements
This research was supported by the National Key Research and Development Program of China under Grant No. 2018AAA0100400, and the National Natural Science Foundation of China under Grants 62071466, 62076242, and 61976208.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mao, B., Liu, X., Shi, L. et al. Few-shot video object segmentation with prototype evolution. Neural Comput & Applic 36, 5367–5382 (2024). https://doi.org/10.1007/s00521-023-09325-y