Unsupervised video object segmentation by spatiotemporal graphical model

Multimedia Tools and Applications

Abstract

We propose a novel spatiotemporal graphical model for unsupervised video object segmentation. The core of the model is a layered conditional random field (CRF) with two layers: a pixel layer and a supervoxel layer. First, heat-diffusion-based segmentation and salient region detection are combined to segment the first frame. These results serve as input seeds to train dual probabilistic models of each object class. Within the spatiotemporal layered-CRF framework, we extend binary segmentation to multiple-object segmentation. We add an intra-frame spatial matching potential and an inter-frame temporal supervoxel consistency potential to link the pixel layer and the supervoxel layer, which improves spatiotemporal smoothing throughout the video sequence. Being unsupervised, the method lightens the burden of labeling training samples and yields smooth, accurate object boundaries. Experiments on two public datasets demonstrate that our method outperforms several state-of-the-art methods in both single- and multiple-foreground cases.
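To make the two-layer idea concrete, here is a minimal sketch of how an energy that couples a pixel layer to a supervoxel layer could be evaluated. It is not the paper's formulation: the abstract does not give the potentials, so the Potts smoothness term, the disagreement-count link term, the single-frame restriction, the weights `w_pair` and `w_link`, and the function name `layered_energy` are all assumptions made for illustration.

```python
import numpy as np

def layered_energy(unary, labels, supervoxels, sv_labels, w_pair=1.0, w_link=1.0):
    """Energy of one frame's labeling under a toy two-layer model (illustrative only).

    unary:       (H, W, K) per-pixel cost of each of K labels (hypothetical input)
    labels:      (H, W) integer pixel labels in [0, K)
    supervoxels: (H, W) integer supervoxel id of each pixel
    sv_labels:   (S,) integer label assigned to each supervoxel
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Pixel-layer data term: cost of the chosen label at every pixel.
    data = unary[rows, cols, labels].sum()
    # Assumed Potts smoothness on the 4-connected grid: count label changes.
    pair = (labels[:, 1:] != labels[:, :-1]).sum() + (labels[1:, :] != labels[:-1, :]).sum()
    # Assumed cross-layer link: pixels that disagree with their supervoxel's label.
    link = (labels != sv_labels[supervoxels]).sum()
    return float(data + w_pair * pair + w_link * link)
```

A labeling that is uniform and agrees with its supervoxels pays only the data cost; each label change between neighbors and each pixel that contradicts its supervoxel adds a penalty. In a full pipeline this energy would be minimized, for example with graph cuts, rather than merely evaluated.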



Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFC: 61175026), International Science and Technology Cooperation Special Programme (No. 2013DFG12810), Ningbo Municipal Natural Science Foundation of China (2014A610031, 2014A610032), Open Research Fund of Zhejiang First-foremost Key Subject-Information and Communications Engineering of China (XKXL1316), C.Wong Magna Fund in Ningbo University, and Open Fund of Zhejiang Provincial Key Academic Project (first level).

Author information

Corresponding author

Correspondence to Lijun Guo.

About this article

Cite this article

Guo, L., Cheng, T., Huang, Y. et al. Unsupervised video object segmentation by spatiotemporal graphical model. Multimed Tools Appl 76, 1037–1053 (2017). https://doi.org/10.1007/s11042-015-3100-9

  • DOI: https://doi.org/10.1007/s11042-015-3100-9
