A Semi-supervised Video Object Segmentation Method Based on Adaptive Memory Module

Yang, Shaohua; Luo, Zhiming; Cao, Donglin; Lin, Dazhen; Su, Songzhi; Li, Shaozi

doi:10.1007/978-981-19-4546-5_34

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1491)

Included in the following conference series:

CCF Conference on Computer Supported Cooperative Work and Social Computing

Abstract

Video object segmentation has become a hot research topic in the computer vision community, with a wide range of applications such as autonomous driving, video editing, and video surveillance. However, due to the complexity of video data, video object segmentation still faces challenges such as occlusion, changes in object appearance, and the presence of similar objects. Previous methods mainly tackle this task with a memory module, but their computation cost increases linearly with the length of the video. To address this issue, we propose a cascaded semi-supervised video object segmentation framework with an adaptive memory module. In addition, we use a cascaded instance tracker to locate the object and reduce the image resolution, and we add a boundary estimation branch to improve accuracy. Experimental results on several benchmarks demonstrate the effectiveness and efficiency of the proposed method.
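To make the cost argument concrete, the sketch below counts how many memory entries a matching step would have to compare against under two hypothetical policies: a memory that appends every k-th frame, and a memory capped at a fixed number of frames. The function names, the save interval, and the fixed-capacity policy are assumptions chosen for illustration only; the abstract does not specify how the adaptive memory module selects or updates its entries.

```python
def growing_memory_entries(num_frames: int, save_every: int, pixels_per_frame: int) -> int:
    """Entries stored after `num_frames` when every `save_every`-th frame is appended.

    Hypothetical policy: the first frame is always stored, then one frame is
    added every `save_every` frames, so the count grows linearly with video length.
    """
    stored_frames = 1 + (num_frames - 1) // save_every
    return stored_frames * pixels_per_frame


def bounded_memory_entries(num_frames: int, save_every: int,
                           pixels_per_frame: int, capacity_frames: int) -> int:
    """Same policy, but the memory never holds more than `capacity_frames` frames.

    This cap is an illustrative assumption, not the method described in the paper.
    """
    stored_frames = min(capacity_frames, 1 + (num_frames - 1) // save_every)
    return stored_frames * pixels_per_frame


if __name__ == "__main__":
    pixels = 30 * 54  # assumed feature-map size per stored frame
    for frames in (10, 100, 1000):
        grow = growing_memory_entries(frames, 5, pixels)
        cap = bounded_memory_entries(frames, 5, pixels, capacity_frames=4)
        print(f"{frames:5d} frames: growing={grow:7d} entries, bounded={cap:5d} entries")
```

Under these assumptions the growing memory scales with the number of frames, while the bounded one stays constant once the cap is reached, which is the kind of behavior a length-independent memory design aims for.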

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61876159, 61806172, 62076116, U1705286).

Author information

Authors and Affiliations

Department of Artificial Intelligence, Xiamen University, Xiamen, China
Shaohua Yang, Zhiming Luo, Donglin Cao, Dazhen Lin, Songzhi Su & Shaozi Li

Corresponding author

Correspondence to Zhiming Luo.

Editor information

Editors and Affiliations

Shandong University, Jinan, China
Yuqing Sun
Fudan University, Shanghai, China
Tun Lu
Hunan University of Science and Technology, Xiangtan, China
Buqing Cao
Tongji University, Shanghai, China
Hongfei Fan
Guangdong University of Technology, Guangzhou, China
Dongning Liu
University of Warwick, Coventry, UK
Bowen Du
University of Shanghai for Science and Technology, Shanghai, China
Liping Gao

About this paper

Cite this paper

Yang, S., Luo, Z., Cao, D., Lin, D., Su, S., Li, S. (2022). A Semi-supervised Video Object Segmentation Method Based on Adaptive Memory Module. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2021. Communications in Computer and Information Science, vol 1491. Springer, Singapore. https://doi.org/10.1007/978-981-19-4546-5_34

DOI: https://doi.org/10.1007/978-981-19-4546-5_34
Published: 20 July 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4545-8
Online ISBN: 978-981-19-4546-5
eBook Packages: Computer Science, Computer Science (R0)
