Unsupervised Video Object Segmentation Based on Mixture Models and Saliency Detection

Lin, Guofeng; Fan, Wentao

doi:10.1007/s11063-019-10110-z

Published: 28 August 2019

Neural Processing Letters, Volume 51, pages 657–674 (2020)

Abstract

In this paper, we propose an unsupervised video object segmentation approach based mainly on a saliency detection method and a Gaussian mixture model with a Markov random field. In our approach, saliency detection serves as a preprocessing step that estimates, for each pixel, the probability that it belongs to the target object. Traditional saliency detection methods often fail to recover precise object boundaries and therefore struggle to segment objects consistently. In contrast, the developed method computes the saliency of each frame in the video sequence and extracts the position and region of the target object with a more accurate boundary. The refined object region is then used as prior information and incorporated into the Gaussian mixture model with Markov random field to obtain a precise pixel-wise segmentation of each frame. The effectiveness of the proposed approach is validated through experiments on both the SegTrack and SegTrack v2 data sets.
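
The pipeline described in the abstract (a saliency map used as a per-pixel prior for a Gaussian mixture model with spatial smoothing) can be illustrated with a minimal sketch. The abstract does not give the authors' saliency method or their exact GMM-MRF formulation, so everything below is an assumption made for illustration: a single grayscale frame, two classes, a mean-field-like neighbourhood update standing in for the Markov random field, and the hypothetical names `segment_with_saliency_prior`, `beta` and `n_iter`.

```python
import numpy as np

def segment_with_saliency_prior(image, saliency, n_iter=10, beta=1.0):
    """Two-class (background/object) Gaussian mixture with a per-pixel prior.

    image    : 2-D float array of pixel intensities
    saliency : 2-D float array in [0, 1], used as the object prior (assumed given)
    beta     : strength of the neighbourhood smoothing (hypothetical parameter)
    Returns a boolean mask, True where a pixel is labelled as object.
    """
    eps = 1e-8
    # Saliency-based prior over the classes: index 0 = background, 1 = object.
    base_prior = np.clip(np.stack([1.0 - saliency, saliency], axis=0), eps, 1.0)
    base_prior /= base_prior.sum(axis=0, keepdims=True)
    prior = base_prior.copy()
    post = base_prior.copy()

    for _ in range(n_iter):
        # M-step: posterior-weighted mean and variance of each class.
        weight = post.sum(axis=(1, 2)) + eps
        mean = (post * image).sum(axis=(1, 2)) / weight
        diff = image[None] - mean[:, None, None]
        var = (post * diff ** 2).sum(axis=(1, 2)) / weight + 1e-6

        # E-step: Gaussian likelihood times the current per-pixel prior.
        v = var[:, None, None]
        lik = np.exp(-0.5 * diff ** 2 / v) / np.sqrt(2.0 * np.pi * v)
        post = prior * lik + eps
        post /= post.sum(axis=0, keepdims=True)

        # MRF-like step: the prior favours the labels of the 4 neighbours,
        # while staying anchored to the saliency-based prior.
        padded = np.pad(post, ((0, 0), (1, 1), (1, 1)), mode="edge")
        neigh = (padded[:, :-2, 1:-1] + padded[:, 2:, 1:-1]
                 + padded[:, 1:-1, :-2] + padded[:, 1:-1, 2:]) / 4.0
        prior = base_prior * np.exp(beta * neigh)
        prior /= prior.sum(axis=0, keepdims=True)

    return post[1] > 0.5
```

In this sketch a coarse saliency map that only roughly covers the object is enough to initialise the two classes; the mixture model then sharpens the boundary from the pixel values themselves, which mirrors the role the abstract assigns to the prior. A real video pipeline would use colour features and process every frame, neither of which is shown here.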

Acknowledgements

The completion of this work was supported by the National Natural Science Foundation of China (61876068), the Natural Science Foundation of Fujian Province (2018J01094) and the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (ZQN-PY510).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Huaqiao University, Xiamen, China
Guofeng Lin & Wentao Fan

Corresponding author

Correspondence to Wentao Fan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, G., Fan, W. Unsupervised Video Object Segmentation Based on Mixture Models and Saliency Detection. Neural Process Lett 51, 657–674 (2020). https://doi.org/10.1007/s11063-019-10110-z

Published: 28 August 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11063-019-10110-z
