
Video Object Segmentation Based on Guided Feature Transfer Learning

  • Conference paper
Frontiers of Computer Vision (IW-FCV 2022)

Abstract

Video Object Segmentation (VOS) is a fundamental task in many real-world computer vision applications and remains challenging due to distractors and background clutter. Many existing online learning approaches have limited practical value because of the high computational cost of fine-tuning network parameters at test time. Matching-based and propagation-based approaches are computationally efficient but may suffer from degraded performance in cluttered backgrounds and from object drift. To address these issues, we propose an offline, end-to-end model that learns guided feature transfer for VOS. We introduce guided feature modulation, conditioned on the target mask, to capture video context information, and a generative appearance model that provides cues for both the target and the background. The guided feature modulation scheme learns target semantic information from modulation activations, while the generative appearance model estimates the probability of each pixel belonging to the target or the background. In addition, low-resolution features from deeper layers may not capture global contextual information and can degrade feature refinement. We therefore also propose a guided pooled decoder that learns both global and local context for better feature refinement. Evaluations on two VOS benchmark datasets, DAVIS2016 and DAVIS2017, show excellent performance of the proposed framework compared to more than 20 existing state-of-the-art methods.
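As a rough, hedged illustration of the mask-guided feature modulation idea described in the abstract, the sketch below shows a FiLM-style block in PyTorch that predicts channel-wise scale and shift parameters from the first-frame target mask and applies them to backbone features of the current frame. The class name, layer sizes, and tensor shapes are assumptions made for illustration only; they are not taken from the paper's implementation.

```python
# Illustrative sketch only (assumption): a FiLM-style, mask-guided feature
# modulation block. Names and shapes are hypothetical, not the authors' code.
import torch
import torch.nn as nn


class GuidedFeatureModulation(nn.Module):
    """Predict channel-wise scale/shift from the target mask and apply them
    to backbone features of the current frame."""

    def __init__(self, feat_channels: int, hidden: int = 64):
        super().__init__()
        # Encode the binary target mask into a compact global descriptor.
        self.mask_encoder = nn.Sequential(
            nn.Conv2d(1, hidden, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global context of the target shape
        )
        # Map the descriptor to per-channel modulation parameters.
        self.to_scale = nn.Linear(hidden, feat_channels)
        self.to_shift = nn.Linear(hidden, feat_channels)

    def forward(self, feats: torch.Tensor, target_mask: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) backbone features of the current frame
        # target_mask: (B, 1, Hm, Wm) binary mask of the target object
        code = self.mask_encoder(target_mask).flatten(1)          # (B, hidden)
        scale = self.to_scale(code).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        shift = self.to_shift(code).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        return feats * (1.0 + scale) + shift                      # modulated features


if __name__ == "__main__":
    mod = GuidedFeatureModulation(feat_channels=256)
    feats = torch.randn(2, 256, 30, 52)        # e.g. deep backbone features
    mask = torch.rand(2, 1, 120, 208).round()  # first-frame target mask
    print(mod(feats, mask).shape)              # torch.Size([2, 256, 30, 52])
```

In the paper's pipeline, a generative appearance model and the guided pooled decoder would operate alongside such modulated features; those components are omitted from this sketch.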



Acknowledgment

This study was supported by the BK21 FOUR project (AI-driven Convergence Software Education Research Program) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (4199990214394).

Author information


Corresponding author

Correspondence to Soon Ki Jung.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Fiaz, M., Mahmood, A., Shahzad Farooq, S., Ali, K., Shaheryar, M., Jung, S.K. (2022). Video Object Segmentation Based on Guided Feature Transfer Learning. In: Sumi, K., Na, I.S., Kaneko, N. (eds) Frontiers of Computer Vision. IW-FCV 2022. Communications in Computer and Information Science, vol 1578. Springer, Cham. https://doi.org/10.1007/978-3-031-06381-7_14


  • DOI: https://doi.org/10.1007/978-3-031-06381-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06380-0

  • Online ISBN: 978-3-031-06381-7

  • eBook Packages: Computer Science (R0)
