Abstract
We present a novel mixture-of-trees probabilistic graphical model for semi-supervised video segmentation. Each component in this mixture represents a tree-structured temporal linkage between super-pixels from the first to the last frame of a video sequence. We provide a variational inference scheme for this model to estimate super-pixel labels, their corresponding confidences, and the confidences in the temporal linkages. Our algorithm performs inference over the full video volume, which helps to avoid the erroneous label propagation caused by short time-window processing. In addition, the proposed inference scheme is efficient in both computation time and memory use, and so can be applied in real-time video segmentation scenarios. We bring out the pros and cons of our approach using extensive quantitative comparisons on challenging binary and multi-class video segmentation datasets.
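
As a rough illustration of the temporal linkage idea only, the sketch below propagates label beliefs forward from a labelled first frame when every super-pixel has several candidate parents in the previous frame. It is not the variational inference scheme of the paper: the function name `propagate_labels`, the candidate-parent arrays, the per-link weights, and the `stay_prob` label transition are all assumptions made for this example. It further assumes that each super-pixel picks its parent independently, so that the mixture over all resulting trees can be evaluated frame by frame.

```python
import numpy as np

def propagate_labels(first_frame_labels, cand_parents, link_weights,
                     num_classes, stay_prob=0.9):
    """Forward label propagation through a mixture of temporal trees.

    Hypothetical interface, for illustration only:
      first_frame_labels: int array (n_0,), known labels of frame 0.
      cand_parents: list over frames 1..T-1 of int arrays (n_t, K);
          entry [i, k] indexes a candidate parent in the previous frame.
      link_weights: list of float arrays (n_t, K); each row sums to 1 and
          weights the K candidate links of super-pixel i.
    Returns a list of (n_t, num_classes) label distributions, one per frame.
    """
    # Assumed label transition along a link: keep the parent's label with
    # probability stay_prob, otherwise switch uniformly to another class.
    off_diag = (1.0 - stay_prob) / (num_classes - 1)
    transition = np.full((num_classes, num_classes), off_diag)
    np.fill_diagonal(transition, stay_prob)

    # Frame 0 is labelled, so its beliefs are one-hot vectors.
    beliefs = [np.eye(num_classes)[np.asarray(first_frame_labels)]]
    for parents, weights in zip(cand_parents, link_weights):
        prev = beliefs[-1]                       # (n_{t-1}, C)
        parent_beliefs = prev[parents]           # (n_t, K, C)
        # Average the parents' beliefs with the link weights, then apply
        # the label transition.
        mixed = np.einsum("ik,ikc->ic", weights, parent_beliefs)
        beliefs.append(mixed @ transition)
    return beliefs

# Two labelled super-pixels in frame 0, two unlabelled ones in frame 1.
beliefs = propagate_labels(
    first_frame_labels=[0, 1],
    cand_parents=[np.array([[0, 1], [1, 1]])],
    link_weights=[np.array([[0.5, 0.5], [1.0, 0.0]])],
    num_classes=2,
)
```

In this toy run the first super-pixel of frame 1 is linked equally to both labelled parents and so ends up undecided, while the second inherits the label of its single parent with the assumed confidence `stay_prob`. The row-wise maximum of each returned array could serve as a crude stand-in for a label confidence.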









Cite this article
Badrinarayanan, V., Budvytis, I. & Cipolla, R. Mixture of Trees Probabilistic Graphical Model for Video Segmentation. Int J Comput Vis 110, 14–29 (2014). https://doi.org/10.1007/s11263-013-0673-5