Unsupervised video object segmentation by spatiotemporal graphical model

Guo, Lijun; Cheng, Tingting; Huang, Yuanjie; Zhao, Jieyu; Zhang, Rong

doi:10.1007/s11042-015-3100-9

Published: 26 November 2015

Multimedia Tools and Applications, Volume 76, pages 1037–1053 (2017)

Abstract

We propose a novel spatiotemporal graphical model for unsupervised video object segmentation. The core of the model is a layered conditional random field (CRF) with two layers: a pixel layer and a supervoxel layer. First, heat-diffusion-based segmentation and salient region detection are combined to segment the first frame. The resulting segments serve as input seeds for training dual probabilistic models of each object class. Within the spatiotemporal layered-CRF framework, we extend binary segmentation to multiple-object segmentation. We add an intra-frame spatial matching potential and an inter-frame temporal supervoxel consistency potential to link the pixel layer and the supervoxel layer, which improves spatiotemporal smoothness throughout the video sequence. Because the method is unsupervised, it reduces the burden of labeling training samples, and it yields smooth, accurate object boundaries in video segmentation. Experiments on two public datasets show that our method outperforms several state-of-the-art methods in both single- and multiple-foreground cases.
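To make the two-layer structure concrete, the following is a minimal sketch of how the energy of a layered labeling could be evaluated. It is not the authors' formulation, which this page does not give: the Potts smoothness terms, the simple pixel-to-supervoxel disagreement penalty, and all names (`potts`, `layered_energy`, `w_pix`, `w_sv`, `w_link`) are assumptions chosen for illustration. Labels are integers, so more than two object classes are allowed.

```python
def potts(labels, edges, weight):
    """Potts penalty: pay `weight` for every edge whose endpoints disagree."""
    return weight * sum(1 for a, b in edges if labels[a] != labels[b])


def layered_energy(pix_labels, sv_labels, pix_unary, sv_unary,
                   pix_edges, sv_edges, pix_to_sv,
                   w_pix=1.0, w_sv=1.0, w_link=1.0):
    """Energy of a joint labeling over a pixel layer and a supervoxel layer.

    All terms are illustrative assumptions, not the paper's potentials:
    - unary costs: pix_unary[i][l] and sv_unary[s][l] for assigning label l
    - Potts smoothness inside each layer (pix_edges, sv_edges)
    - a cross-layer term penalizing pixels that disagree with the
      supervoxel they belong to (pix_to_sv[i] is that supervoxel's index)
    """
    energy = sum(pix_unary[i][l] for i, l in enumerate(pix_labels))
    energy += sum(sv_unary[s][l] for s, l in enumerate(sv_labels))
    energy += potts(pix_labels, pix_edges, w_pix)
    energy += potts(sv_labels, sv_edges, w_sv)
    energy += w_link * sum(
        1 for i, l in enumerate(pix_labels) if l != sv_labels[pix_to_sv[i]]
    )
    return energy


# Toy example: 4 pixels in a chain, 2 supervoxels, 3 object labels.
pix_unary = [[0.1, 2.0, 2.0], [0.2, 2.0, 2.0], [2.0, 0.1, 2.0], [2.0, 2.0, 0.3]]
sv_unary = [[0.0, 1.0, 1.0], [1.0, 0.5, 0.5]]
pix_edges = [(0, 1), (1, 2), (2, 3)]
sv_edges = [(0, 1)]
pix_to_sv = [0, 0, 1, 1]

e = layered_energy([0, 0, 1, 2], [0, 1], pix_unary, sv_unary,
                   pix_edges, sv_edges, pix_to_sv)
print(e)
```

In a full system, an energy of this kind would be minimized rather than merely evaluated, for example with a graph-cut based move-making algorithm; the sketch only shows how the two layers and their linking term contribute to a single objective.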

Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFC: 61175026), the International Science and Technology Cooperation Special Programme (No. 2013DFG12810), the Ningbo Municipal Natural Science Foundation of China (2014A610031, 2014A610032), the Open Research Fund of Zhejiang First-foremost Key Subject-Information and Communications Engineering of China (XKXL1316), the C. Wong Magna Fund in Ningbo University, and the Open Fund of Zhejiang Provincial Key Academic Project (first level).

Author information

Authors and Affiliations

College of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang, 315211, China
Lijun Guo, Tingting Cheng, Yuanjie Huang, Jieyu Zhao & Rong Zhang

Corresponding author

Correspondence to Lijun Guo.

About this article

Cite this article

Guo, L., Cheng, T., Huang, Y. et al. Unsupervised video object segmentation by spatiotemporal graphical model. Multimed Tools Appl 76, 1037–1053 (2017). https://doi.org/10.1007/s11042-015-3100-9

Received: 06 April 2015
Revised: 01 October 2015
Accepted: 17 November 2015
Published: 26 November 2015
Issue Date: January 2017
DOI: https://doi.org/10.1007/s11042-015-3100-9
