Unsupervised Video Object Segmentation Based on Mixture Models and Saliency Detection

Neural Processing Letters

Abstract

In this paper, we propose an unsupervised video object segmentation approach based mainly on a saliency detection method and a Gaussian mixture model with a Markov random field. In our approach, the saliency detection method serves as a preprocessing step that estimates the probability of each pixel belonging to the target object. Traditional saliency detection methods often fail to capture an object's precise boundary and therefore struggle to segment the object consistently. In contrast, the developed saliency detection method computes the saliency of each frame in the video sequence and extracts the position and region of the target object with a more accurate object boundary. The refined object region is then used as prior information and incorporated into the Gaussian mixture model with Markov random field to obtain a precise pixel-wise segmentation of each frame. The effectiveness of the proposed approach is validated through experiments on the SegTrack and SegTrack v2 data sets.
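To make the second stage of this pipeline more concrete, here is a minimal Python sketch (not the authors' implementation) of how a saliency map can act as a per-pixel prior inside a two-component Gaussian mixture with a Markov random field. It assumes a single grayscale frame with intensities in [0, 1] and a saliency map that has already been computed; the saliency method itself is not reproduced. The MRF is approximated by a mean-field-style Potts term over 4-connected neighbours, which is one common choice in EM-based MRF segmentation. The names `segment`, `beta` and `min_var` are illustrative only.

```python
import math

# Hypothetical sketch, NOT the paper's implementation: a two-class Gaussian
# mixture whose per-pixel prior comes from a precomputed saliency map, with a
# mean-field-style Potts term standing in for the Markov random field.


def log_gauss(x, mean, var):
    """Log-density of the 1-D Gaussian N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def sigmoid(d):
    """Numerically stable logistic function (turns log-odds into a probability)."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def neighbours(i, j, h, w):
    """Yield the 4-connected neighbours of pixel (i, j) in an h x w grid."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w:
            yield ni, nj


def segment(image, saliency, beta=1.0, iters=10, min_var=1e-4, eps=1e-6):
    """Segment a grayscale frame (values in [0, 1]) into background (0) and
    foreground (1), using `saliency` (values in [0, 1]) as the foreground prior."""
    h, w = len(image), len(image[0])
    prior = [[min(max(saliency[i][j], eps), 1.0 - eps) for j in range(w)]
             for i in range(h)]
    resp = [row[:] for row in prior]  # foreground responsibilities, saliency-initialised
    for _ in range(iters):
        # M-step: responsibility-weighted mean and variance for each class.
        params = []
        for k in (0, 1):
            wts = [[resp[i][j] if k == 1 else 1.0 - resp[i][j] for j in range(w)]
                   for i in range(h)]
            total = sum(map(sum, wts)) + eps
            mean = sum(wts[i][j] * image[i][j]
                       for i in range(h) for j in range(w)) / total
            var = sum(wts[i][j] * (image[i][j] - mean) ** 2
                      for i in range(h) for j in range(w)) / total
            params.append((mean, max(var, min_var)))  # floor avoids collapse
        # E-step: log-odds = saliency prior + Potts smoothness + likelihood ratio.
        new_resp = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                nb = list(neighbours(i, j, h, w))
                fg_votes = sum(resp[a][b] for a, b in nb)
                bg_votes = len(nb) - fg_votes
                d = (math.log(prior[i][j] / (1.0 - prior[i][j]))
                     + beta * (fg_votes - bg_votes)
                     + log_gauss(image[i][j], *params[1])
                     - log_gauss(image[i][j], *params[0]))
                new_resp[i][j] = sigmoid(d)
        resp = new_resp
    return [[1 if resp[i][j] > 0.5 else 0 for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    # Toy frame: a bright 2 x 2 object; the saliency prior is a looser 4 x 4 box.
    img = [[0.9 if 2 <= i <= 3 and 2 <= j <= 3 else 0.1 for j in range(6)]
           for i in range(6)]
    sal = [[0.6 if 1 <= i <= 4 and 1 <= j <= 4 else 0.2 for j in range(6)]
           for i in range(6)]
    for row in segment(img, sal):
        print(row)
```

In this sketch, `beta` trades off spatial smoothness against the intensity likelihood, and `min_var` keeps a class variance from collapsing to zero when a region is perfectly uniform. On the toy frame, the saliency prior covers a region larger than the bright 2 x 2 object, and the EM iterations are expected to shrink the mask to the object itself. This loosely mirrors the abstract's division of labour, in which saliency supplies a rough region and the mixture model with MRF produces the pixel-wise result.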

Figs. 1–11

Acknowledgements

The completion of this work was supported by the National Natural Science Foundation of China (61876068), the Natural Science Foundation of Fujian Province (2018J01094) and the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (ZQN-PY510).

Author information

Corresponding author

Correspondence to Wentao Fan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Lin, G., Fan, W. Unsupervised Video Object Segmentation Based on Mixture Models and Saliency Detection. Neural Process Lett 51, 657–674 (2020). https://doi.org/10.1007/s11063-019-10110-z

  • DOI: https://doi.org/10.1007/s11063-019-10110-z