Mixture of Trees Probabilistic Graphical Model for Video Segmentation

International Journal of Computer Vision

Abstract

We present a novel mixture of trees probabilistic graphical model for semi-supervised video segmentation. Each component of the mixture represents a tree-structured temporal linkage between super-pixels from the first to the last frame of a video sequence. We provide a variational inference scheme for this model that estimates super-pixel labels, their corresponding confidences, and the confidences in the temporal linkages. Our algorithm performs inference over the full video volume, which helps to avoid the erroneous label propagation caused by short time-window processing. In addition, the proposed inference scheme is efficient in both computation time and memory use, and so can be applied in real-time video segmentation scenarios. We bring out the strengths and weaknesses of our approach through extensive quantitative comparisons on challenging binary and multi-class video segmentation datasets.
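
To make the idea of mixing several tree-structured temporal linkages concrete, the following is a minimal toy sketch in Python. It is not the variational inference scheme of the paper. It assumes that only the first frame is labelled, that each super-pixel has exactly one parent in the previous frame, and that labels pass from parent to child through a fixed transition matrix. The names `propagate_along_tree` and `mixture_posterior`, the fixed mixture weights, and the transition matrix are all hypothetical and chosen only for illustration.

```python
import numpy as np


def propagate_along_tree(first_frame_probs, parents, transition):
    """Push label distributions from frame 0 forward along one temporal tree.

    first_frame_probs: (n0, K) label distributions of the frame-0 super-pixels.
    parents: one entry per later frame; parents[t][i] is the index, in the
        previous frame, of the parent of super-pixel i (assumed linkage).
    transition: (K, K) row-stochastic matrix P(child label | parent label).
    """
    frames = [np.asarray(first_frame_probs, dtype=float)]
    for parent_idx in parents:
        # Each child inherits its parent's distribution, blurred by `transition`.
        child = frames[-1][np.asarray(parent_idx)] @ transition
        frames.append(child / child.sum(axis=1, keepdims=True))
    return frames


def mixture_posterior(first_frame_probs, components, weights, transition):
    """Average the per-tree label distributions with fixed mixture weights.

    components: list of `parents` structures, one per tree in the mixture.
    weights: non-negative mixture weights (assumed given, not inferred here).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    per_tree = [
        propagate_along_tree(first_frame_probs, parents, transition)
        for parents in components
    ]
    n_frames = len(per_tree[0])
    return [
        sum(w * tree[t] for w, tree in zip(weights, per_tree))
        for t in range(n_frames)
    ]


if __name__ == "__main__":
    # Two labelled super-pixels in frame 0: one foreground, one background.
    first = [[1.0, 0.0], [0.0, 1.0]]
    transition = np.array([[0.9, 0.1], [0.1, 0.9]])
    # Two candidate linkages for the two super-pixels of frame 1:
    # tree A links them to parents 0 and 1, tree B links both to parent 0.
    tree_a = [[0, 1]]
    tree_b = [[0, 0]]
    posterior = mixture_posterior(first, [tree_a, tree_b], [0.5, 0.5], transition)
    print(posterior[1])
```

In this toy run the two trees agree on the parent of the first super-pixel in frame 1, so its label distribution stays confident at (0.9, 0.1). They disagree on the second super-pixel, whose distribution becomes (0.5, 0.5), which illustrates how uncertainty in the temporal linkage can show up as low label confidence.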



Corresponding author

Correspondence to Vijay Badrinarayanan.


Cite this article

Badrinarayanan, V., Budvytis, I. & Cipolla, R. Mixture of Trees Probabilistic Graphical Model for Video Segmentation. Int J Comput Vis 110, 14–29 (2014). https://doi.org/10.1007/s11263-013-0673-5


  • DOI: https://doi.org/10.1007/s11263-013-0673-5
