Video object segmentation guided refinement on foreground-background objects

Multimedia Tools and Applications

Abstract

Video Object Segmentation (VOS), which separates a foreground object from a video sequence, is an intricate task that relies on fine-tuning. Many recent approaches focus on pixel-wise matching of foreground objects and emphasize balancing the relations between pixels to identify those objects, which can lead to misclassification. This paper explores the mapping between foreground and background objects in semi-supervised VOS by balancing and mutually mapping the pixels between them. The proposed model makes practical and effective use of enhanced pixel-level and instance-level matching to improve prediction. Moreover, the framework implements ensemble learning with a Leaky-ReLU activation function, which improves the segmentation process. To evaluate the segmentation results, J and F scores are measured. We carry out extensive experiments on the popular DAVIS benchmark, in its 2016 and 2017 versions. Our model achieves a promising J & F score of 82%, surpassing all the other techniques.
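As a rough illustration of two terms used in the abstract, the sketch below shows a Leaky-ReLU activation and the J score, which in the DAVIS benchmark is the region similarity (intersection over union) between a predicted mask and the ground-truth mask. It is not the authors' implementation: the function names, the slope value of 0.01, the nested-list mask format, and the handling of two empty masks are all assumptions made for the example. The F score (contour accuracy) is omitted here.

```python
def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: keep positive inputs, scale negative inputs by a small slope.

    The slope 0.01 is an assumed, commonly used default, not a value from the paper.
    """
    return x if x > 0 else negative_slope * x


def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks.

    Masks are assumed to be equal-sized nested lists of 0/1 values.
    """
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are foreground in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
j = jaccard(pred, gt)  # 2 / 4 = 0.5
```

For a whole sequence, J would typically be averaged over frames (and over objects in the multi-object DAVIS 2017 setting); that averaging is left out to keep the sketch short.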

Code Availability

The code relevant to this work is available from the authors.

Funding

NA

Author information

Contributions

Both authors contributed equally to the work.

Corresponding author

Correspondence to A. Razia Sulthana.

Ethics declarations

Ethics Approval

The manuscript has not been submitted to any other journal, and no part of the work has been published elsewhere.

Consent for Publication

The authors consent to publication if this article is accepted by the journal.

Conflicts of Interest

There is no conflict of interest with any firm or person.

Additional information

Availability of Data and Material

The data and material relevant to this work are available from the authors.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Devi, J.S., Sulthana, A.R. Video object segmentation guided refinement on foreground-background objects. Multimed Tools Appl 82, 6769–6785 (2023). https://doi.org/10.1007/s11042-022-12981-2

  • DOI: https://doi.org/10.1007/s11042-022-12981-2
