Abstract
In this work, we address the problem of video object segmentation (VOS), namely segmenting specific objects throughout a video sequence given only an annotated first frame. Previous VOS methods based on deep neural networks often solve this problem by fine-tuning the segmentation model on the first frame of the test video sequence, which is time-consuming and cannot adapt well to the target video. In this paper, we propose adaptive online learning for video object segmentation (AOL-VOS), which adaptively optimizes both the network parameters and the hyperparameters of the segmentation model to better predict segmentation results. Specifically, we first pre-train the segmentation model on static video frames and then learn an effective adaptation strategy on the training set by optimizing both network parameters and hyperparameters. During testing, we learn how to adapt the learned segmentation model online to the specific test video sequence and its future frames, where confidence patterns are employed to constrain and guide the adaptation process by fusing object appearance and motion cue information. Comprehensive evaluations on the DAVIS 2016 and SegTrack v2 datasets demonstrate the significant superiority of the proposed AOL-VOS over state-of-the-art methods for video object segmentation.
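The test-time adaptation described above can be pictured as a confidence-guided online fine-tuning loop. The PyTorch snippet below is only a minimal sketch under assumptions: the function `online_adapt`, the fixed inner learning rate, the single-object sigmoid output, and the confidence threshold are hypothetical choices, and both the motion-cue fusion and the meta-learned hyperparameters mentioned in the abstract are omitted.

```python
# Minimal sketch of confidence-guided online adaptation for VOS.
# Illustrative only: model structure, thresholds, and learning rates are assumptions,
# not the authors' released implementation.
import torch
import torch.nn.functional as F

def online_adapt(model, frames, first_frame_mask,
                 inner_lr=1e-4, steps_per_frame=2, conf_thresh=0.9):
    """Adapt a pre-trained segmentation model to a single test video."""
    optimizer = torch.optim.SGD(model.parameters(), lr=inner_lr)
    masks = []

    for t, frame in enumerate(frames):
        x = frame.unsqueeze(0)                          # (1, 3, H, W)
        with torch.no_grad():
            prob = torch.sigmoid(model(x))              # (1, 1, H, W) foreground probability

        if t == 0:
            pseudo_label = first_frame_mask.unsqueeze(0)  # ground-truth first-frame annotation
            weight = torch.ones_like(pseudo_label)
        else:
            # Confidence pattern: keep only pixels the model is very sure about;
            # the paper additionally fuses motion cues, which are omitted here.
            fg = (prob > conf_thresh).float()
            bg = (prob < 1.0 - conf_thresh).float()
            pseudo_label = fg                           # confident foreground as pseudo-labels
            weight = fg + bg                            # uncertain pixels get zero weight

        # A few gradient steps on the confidence-weighted loss for this frame.
        for _ in range(steps_per_frame):
            loss = F.binary_cross_entropy_with_logits(model(x), pseudo_label, weight=weight)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        with torch.no_grad():
            masks.append((torch.sigmoid(model(x)) > 0.5).float().squeeze(0))

    return masks
```

In the full approach, quantities fixed here for simplicity (e.g., the learning rate and the number of update steps) would themselves be treated as hyperparameters and optimized on the training set.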
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Nos. 61972204, 61906094, and U1713208), the Tencent AI Lab Rhino-Bird Focused Research Program (No. JR201922), and the Fundamental Research Funds for the Central Universities (Nos. 30918011321 and 30919011232).