Abstract
Video object segmentation is an active research area in computer vision. Conventional models are trained on annotated data, and producing those annotations is both time-consuming and expensive. Training models in an unsupervised manner has been proposed as a solution. However, previous works have focused only on spatial features extracted by self-supervised learning methods, without considering the temporal information between frames. In this paper, we propose a new video object segmentation model, the motion feature compensation (MFC) model, which uses self-supervised learning to extract spatial features and incorporates a motion feature, extracted from optical flow, to compensate for the missing temporal information. Additionally, we introduce an attention-based fusion method to merge the features from both modalities. Notably, for each training video, we select only two consecutive frames at random to train our model. Youtube-VOS and DAVIS-2017 are adopted as the training and validation datasets, respectively. The experimental results show that our approach outperforms previous methods, validating the proposed design. The source code is available at: https://github.com/CVisionProcessing/MFC.
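Two ideas in the abstract can be made concrete with a small sketch: drawing two consecutive frames at random from a video, and merging a spatial feature with a motion feature through attention weights. The abstract does not specify how the fusion is computed, so the per-channel softmax weighting below is only an assumption chosen for illustration, not the authors' method. The function names `sample_consecutive_pair` and `attention_fuse` are hypothetical and do not come from the MFC repository.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_consecutive_pair(num_frames: int, rng: random.Random) -> Tuple[int, int]:
    """Pick indices (t, t + 1) of two consecutive frames uniformly at random."""
    if num_frames < 2:
        raise ValueError("a video needs at least two frames")
    t = rng.randrange(num_frames - 1)  # t lies in [0, num_frames - 2]
    return t, t + 1


def attention_fuse(spatial: Sequence[float], motion: Sequence[float]) -> List[float]:
    """Hypothetical fusion: a per-channel softmax over the two modalities.

    For each channel, the weights are softmax(spatial_c, motion_c), and the
    fused value is the weighted sum of the two inputs. This is an assumed
    stand-in for the attention-based fusion named in the abstract.
    """
    if len(spatial) != len(motion):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for s, m in zip(spatial, motion):
        peak = max(s, m)  # subtract the max for numerical stability
        e_s, e_m = math.exp(s - peak), math.exp(m - peak)
        w_s = e_s / (e_s + e_m)
        fused.append(w_s * s + (1.0 - w_s) * m)
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    t, t_next = sample_consecutive_pair(num_frames=30, rng=rng)
    print("training pair:", t, t_next)
    print("fused:", attention_fuse([1.0, 2.0], [1.0, 0.0]))
```

In a real pipeline the features would be dense tensors rather than short lists, and the attention weights would normally be learned rather than computed directly from the feature values.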
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 11627802, 51678249, 61871188).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, T., Li, B. (2023). Self-supervised Video Object Segmentation Using Motion Feature Compensation. In: Iliadis, L., Papaleonidas, A., Angelov, P., Jayne, C. (eds) Artificial Neural Networks and Machine Learning – ICANN 2023. ICANN 2023. Lecture Notes in Computer Science, vol 14260. Springer, Cham. https://doi.org/10.1007/978-3-031-44195-0_41
DOI: https://doi.org/10.1007/978-3-031-44195-0_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44194-3
Online ISBN: 978-3-031-44195-0
eBook Packages: Computer Science, Computer Science (R0)