Self-supervised Video Object Segmentation Using Motion Feature Compensation

  • Conference paper
Artificial Neural Networks and Machine Learning – ICANN 2023 (ICANN 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14260)

Abstract

Video object segmentation is an active research area in computer vision. Traditional models are trained on annotated data, which is time-consuming and expensive to produce. Training models in an unsupervised manner has been proposed as a solution to this issue. However, previous works have focused only on spatial features extracted by self-supervised learning methods, without considering the temporal information between frames. In this paper, we propose a new video object segmentation model, the motion feature compensation (MFC) model, which uses self-supervised learning to extract spatial features and incorporates a motion feature, extracted from optical flow, to compensate for the missing temporal information. Additionally, we introduce an attention-based fusion method to merge features from both modalities. Notably, for each training video, we randomly select only two consecutive frames to train our model. YouTube-VOS and DAVIS-2017 are used as the training and validation datasets, respectively. The experimental results demonstrate that our approach outperforms previous methods, validating our proposed design. The source code is available at: https://github.com/CVisionProcessing/MFC.
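The two mechanisms named in the abstract, random selection of two consecutive frames and attention-based fusion of spatial and motion features, can be illustrated with a small sketch. It is not taken from the paper or its repository: the function names `sample_consecutive_pair` and `attention_fuse` are hypothetical, features are plain Python lists rather than network activations, and the mean-activation score that drives the softmax weights is an assumption chosen only to make the idea concrete.

```python
import math
import random


def sample_consecutive_pair(num_frames, rng):
    """Pick indices (t, t + 1) of two adjacent frames uniformly at random."""
    if num_frames < 2:
        raise ValueError("a video needs at least two frames")
    t = rng.randrange(num_frames - 1)  # t lies in [0, num_frames - 2]
    return t, t + 1


def attention_fuse(spatial, motion):
    """Toy fusion: softmax over one score per modality, then a weighted sum.

    The score is the mean activation of each feature vector. This is an
    illustrative assumption, not the fusion module described in the paper.
    """
    if len(spatial) != len(motion):
        raise ValueError("feature vectors must have the same length")
    scores = [sum(spatial) / len(spatial), sum(motion) / len(motion)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    w_spatial, w_motion = exps[0] / total, exps[1] / total
    fused = [w_spatial * s + w_motion * m for s, m in zip(spatial, motion)]
    return fused, (w_spatial, w_motion)


# Usage: draw one frame pair from a 30-frame video, then fuse two toy features.
rng = random.Random(0)
t, t_next = sample_consecutive_pair(30, rng)
fused, weights = attention_fuse([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In this toy setting the two feature vectors have the same mean, so both modalities receive equal weight. A learned fusion module would instead compute its weights from the features with trainable parameters.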

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 11627802, 51678249, 61871188).

Author information

Corresponding author

Correspondence to Bo Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, T., Li, B. (2023). Self-supervised Video Object Segmentation Using Motion Feature Compensation. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_41

  • DOI: https://doi.org/10.1007/978-3-031-44195-0_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44194-3

  • Online ISBN: 978-3-031-44195-0

  • eBook Packages: Computer Science, Computer Science (R0)
