Fast target-aware learning for few-shot video object segmentation

Chen, Yadang; Hao, Chuanyan; Yang, Zhi-Xin; Wu, Enhua

doi:10.1007/s11432-021-3396-7

Research Paper
Published: 27 July 2022

Volume 65, article number 182104, (2022)

Science China Information Sciences

Yadang Chen¹,
Chuanyan Hao²,
Zhi-Xin Yang³ &
…
Enhua Wu^4,5

Abstract

Few-shot video object segmentation (FSVOS) aims to segment a specific object throughout a video sequence when only the first-frame annotation is given. In this study, we develop a fast target-aware learning approach for FSVOS that adapts to each new video sequence from its first-frame annotation through a lightweight procedure. The proposed network comprises two models. First, the meta knowledge model learns general semantic features for the input video image and up-samples the coarse predicted mask to the original image size. Second, the target model adapts quickly from the limited support set. Concretely, during online inference on a test video, we first employ fast optimization techniques to train a powerful target model by minimizing the segmentation error in the first frame and then use it to predict the subsequent frames. During offline training, we use a bilevel-optimization strategy that mimics the full testing procedure to train the meta knowledge model across multiple video sequences. The proposed method is trained only on an individual public video object segmentation (VOS) benchmark without additional training sets and compares favorably with state-of-the-art methods on DAVIS-2017, with a ${\cal J} \& {\cal F}$ overall score of 71.6%, and on YouTubeVOS-2018, with a ${\cal J} \& {\cal F}$ overall score of 75.4%. Meanwhile, a high inference speed of approximately 0.13 s per frame is maintained.
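The online inference step described in the abstract (fit a target model on the first frame, then apply it to later frames) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the target model or the optimizer, so the sketch assumes a per-pixel linear classifier trained with plain gradient descent on a logistic loss, and it replaces the meta knowledge model's features with synthetic ones. All function names (`fit_target_model`, `predict_mask`, `make_frame`, `iou`) and hyperparameters are hypothetical.

```python
import numpy as np

def fit_target_model(feats, mask, steps=200, lr=1.0, reg=1e-3):
    """Fit a per-pixel linear classifier on first-frame features.

    feats: (H, W, C) feature map (assumed to come from a fixed feature extractor).
    mask:  (H, W) binary first-frame annotation.
    Returns weights w of shape (C,) and a scalar bias b.
    """
    x = feats.reshape(-1, feats.shape[-1])        # (N, C) pixel features
    y = mask.reshape(-1).astype(np.float64)       # (N,) pixel labels
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))    # predicted foreground probability
        grad = p - y                              # gradient of the logistic loss w.r.t. the logits
        w -= lr * (x.T @ grad / len(y) + reg * w) # gradient step with L2 regularization
        b -= lr * grad.mean()
    return w, b

def predict_mask(feats, w, b):
    """Apply the fitted target model to the features of a later frame."""
    return (feats @ w + b) > 0.0

def iou(pred, target):
    """Region similarity (intersection over union) of two binary masks."""
    union = np.logical_or(pred, target).sum()
    return np.logical_and(pred, target).sum() / union if union else 1.0

def make_frame(rng, shift):
    """Synthetic stand-in for one frame: a square target moved by `shift` pixels."""
    mask = np.zeros((16, 16), dtype=bool)
    mask[4:10, 4 + shift:10 + shift] = True
    feats = rng.normal(0.0, 0.2, size=(16, 16, 4))  # background feature noise
    feats[mask] += np.array([3.0, -2.0, 1.0, 0.0])  # target-specific feature offset
    return feats, mask

rng = np.random.default_rng(0)
first_feats, first_mask = make_frame(rng, shift=0)   # annotated first frame
w, b = fit_target_model(first_feats, first_mask)     # "online" adaptation
later_feats, later_mask = make_frame(rng, shift=3)   # later frame, target has moved
pred = predict_mask(later_feats, w, b)
print("IoU on later frame:", iou(pred, later_mask))
```

In this toy setting the classifier only has to separate target pixels from background in feature space, which is why a few hundred cheap gradient steps on a single annotated frame suffice. The offline bilevel training described in the abstract, which would learn the features so that such an inner fit generalizes to later frames, is not shown.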

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 62072449, 61802197), the Science and Technology Development Fund, Macao SAR (Grant Nos. 0018/2019/AKP, SKL-IOTSC(UM)-2021-2023), the Guangdong Science and Technology Department (Grant No. 2018B030324002), and the Zhuhai Science and Technology Innovation Bureau Zhuhai-Hong Kong-Macau Special Cooperation Project (Grant No. ZH22017002200001PWC).

Author information

Authors and Affiliations

Engineering Research Center of Digital Forensics, Ministry of Education, School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China
Yadang Chen
School of Education Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
Chuanyan Hao
State Key Laboratory of Internet of Things for Smart City, Department of Electromechanical Engineering, University of Macau, Macao, 999078, China
Zhi-Xin Yang
State Key Laboratory of Computer Science, Institute of Software, University of Chinese Academy of Sciences, Beijing, 100190, China
Enhua Wu
Faculty of Science and Technology, University of Macau, Macao, 999078, China
Enhua Wu

Corresponding author

Correspondence to Zhi-Xin Yang.

Additional information

Supporting information

Appendixes A–C. The supporting information is available online at www.info.scichina.com and www.link.springer.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.