Abstract
Effective parsing of video in both the spatial and temporal domains is vital to many computer vision problems, because labeling objects in video automatically is far less tedious than doing so manually. Some works in the literature parse semantic information on individual 2D images or individual video frames; however, these approaches use only spatial information, ignoring temporal continuity and failing to consider the relevance between frames. Other approaches, which likewise consider only spatial information, attempt to propagate labels in the temporal domain to parse the semantic information of the whole video, yet the non-injective and non-surjective nature of this propagation can cause the black hole effect. In this paper, inspired by annotated image datasets (e.g., Stanford Background Dataset, LabelMe, and SIFT-FLOW), we propose to transfer or propagate such labels from images to videos. The proposed approach consists of three main stages: I) the posterior category probability density function (PDF) is learned by an algorithm that combines frame relevance with label propagation from images; II) the prior contextual-constraint PDF on the map of pixel categories across the whole video is learned by a Markov Random Field (MRF); III) based on both learned PDFs, the final parsing results are obtained by maximum a posteriori (MAP) inference, computed via an efficient graph-cut based integer optimization algorithm. Experiments show that the proposed approach effectively handles the black hole effect.
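The three-stage pipeline above amounts to minimizing an MRF energy: a unary term from the learned posterior category PDF plus a pairwise contextual prior, solved at MAP time. The sketch below is illustrative only, not the paper's implementation: it uses iterated conditional modes (ICM) as a simple stand-in for the graph-cut optimizer, a Potts penalty as the pairwise prior, and all names (`map_labeling_icm`, `unary`, `edges`, `lam`) are hypothetical.

```python
import numpy as np

def map_labeling_icm(unary, edges, lam=1.0, iters=10):
    """Approximate MAP labeling for an MRF over superpixels.

    unary : (n, k) array of per-node costs, e.g. the negative log of the
            posterior category probabilities learned in stage I.
    edges : list of (i, j) neighbor pairs (the spatiotemporal cliques of
            stage II).
    lam   : smoothness weight for a Potts prior, lam * [x_i != x_j].
    """
    n, k = unary.shape
    labels = unary.argmin(axis=1)  # initialize with the independent MAP
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    for _ in range(iters):
        changed = False
        for i in range(n):
            cost = unary[i].copy()
            for j in nbrs[i]:  # Potts pairwise term against each neighbor
                cost += lam * (np.arange(k) != labels[j])
            best = cost.argmin()
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged to a local minimum of the energy
            break
    return labels
```

With a strong smoothness weight, an outlier node whose unary term weakly prefers a different label is pulled to agree with its neighbors, which is the mechanism by which the contextual prior suppresses isolated mislabels; the paper's graph-cut solver optimizes the same kind of energy globally rather than coordinate-wise.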
Notes
1 A partition of the set \(\mathbf {r}^{t}=\{{r_{1}^{t}}, {r_{2}^{t}}, \cdots , r_{n_{t}}^{t}\}\) is a collection of nonempty sets \(R_{i}\subset \mathbf {r}^{t}\), i=1,2,⋯ ,k, such that \(R_{i}\cap R_{j}=\emptyset \) for i≠j and \(\cup _{i=1}^{k} R_{i}=\mathbf {r}^{t}\).
2 In this paper, a clique is either a single superpixel or a set of superpixels that are pairwise adjacent to one another.
3 The impulse function is δ(c)=1 when c=0, and δ(c)=0 otherwise.
References
Bai X, Sapiro G (2009) Geodesic matting: a framework for fast interactive image and video segmentation and matting. Int J Comput Vis 82:113–132
Baker S, Roth S, Scharstein D, Black M, Lewis J, Szeliski R (2007) A database and evaluation methodology for optical flow. In: Proceedings of international conference on computer vision
Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc B 36:192–236
Boykov Y, Veksler O, Zabih R (2001) Efficient approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 23:1222–1239
Boykov Y, Kolmogorov V (2004) An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans Pattern Anal Mach Intell 26:1124–1137
Chen X, Jin X, Wang K (2014) Lighting virtual objects in a single image via coarse scene understanding. Sci China Inf Sci 57(9):092105(14)
Chuang Y, Agarwala A, Curless B, Salesin D, Szeliski R (2002) Video matting of complex scenes. In: Proceedings of ACM SIGGRAPH
Criminisi A, Cross G, Blake A, Kolmogorov V (2006) Bilayer segmentation of live video. In: Proceedings of international conference on computer vision and pattern recognition
Ess A, Mueller T, Grabner H, van Gool L (2009) Segmentation-based urban traffic scene understanding. In: Proceedings of British machine vision conference
Fauqueur J, Brostow G, Cipolla R (2007) Assisted video object labeling by joint tracking of regions and keypoints. In: Proceedings of international conference on computer vision
Geman S, Geman D (1984) Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741
Gould S, Fulton R, Koller D (2009) Decomposing a scene into geometric and semantically consistent regions. In: Proceedings of international conference on computer vision
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of international conference on computer vision and pattern recognition
Kolmogorov V, Zabih R (2004) What energy functions can be minimized via graph cuts. IEEE Trans Pattern Anal Mach Intell 26:147–159
Kolmogorov V (2006) Convergent tree-reweighted message passing for energy minimization. IEEE Trans Pattern Anal Mach Intell 28:1568–1583
Ladicky L, Sturgess P, Russell C, Sengupta S, Bastanlar Y, Clocksin W, Torr P (2010) Joint optimization for object class segmentation and dense stereo reconstruction. Int J Comput Vis:1–12
Lee H, Battle A, Raina R, Ng AY (2006) Efficient sparse coding algorithms. In: Proceedings of neural information processing systems
Li X, Mou L, Lu X (2014) Scene parsing from an MAP perspective. IEEE Trans Cybern. doi:10.1109/TCYB.2014.2361489
Liu Y, Liu Y, Chan K (2011) Tensor-based locally maximum margin classifier for image and video classification. Comput Vis Image Understand 115:1762–1771
Liu C, Yuen J, Torralba A (2011) Nonparametric scene parsing via label transfer. IEEE Trans Pattern Anal Mach Intell 33:2368–2382
Lu X, Li X, Mou L (2014) Semi-supervised multi-task learning for scene recognition. IEEE Trans Cybern. doi:10.1109/TCYB.2014.2362959
Malisiewicz T, Gupta A, Efros A A (2011) Ensemble of exemplar-SVMs for object detection and beyond. In: Proceedings of international conference on computer vision
Mou L, Lu X, Yuan Y (2013) Object or background: whose call is it in complicated scene classification? In: Proceedings of IEEE China summit and international conference on signal and information processing
Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42:145–175
Robertson N, Reid I (2006) A general method for human activity recognition in video. Comput Vis Image Understand 104:232–248
Russell B, Torralba A, Murphy K, Freeman W (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 77:157–173
Shao L, Simon J, Li X (2014) Efficient search and localization of human actions in video databases. IEEE Trans Circuits Syst Video Techn 24:504–512
Theriault C, Thome N, Cord M (2013) Dynamic scene classification: learning motion descriptors with slow features analysis. In: Proceedings of international conference on computer vision and pattern recognition
Tighe J, Lazebnik S (2013) Finding things: image parsing with regions and per-exemplar detectors. In: Proceedings of international conference on computer vision and pattern recognition
Tighe J, Lazebnik S (2013) Superparsing: scalable nonparametric image parsing with superpixels. Int J Comput Vis 101:329–349
Wang J, Cohen M (2005) An iterative optimization approach for unified image segmentation and matting. In: Proceedings of international conference on computer vision
Xiao J, Hays J, Ehinger K, Oliva A, Torralba A (2010) SUN database: large-scale scene recognition from abbey to zoo. In: Proceedings of international conference on computer vision and pattern recognition
Yang X, Gao X, Tao D, Li X, Li J (2015) An efficient MRF embedded level set method for image segmentation. IEEE Trans Image Process 24:9–21
Yedidia J, Freeman W, Weiss Y (2000) Generalized belief propagation. In: Proceedings of neural information processing systems
Yedidia J, Freeman W, Weiss Y (2003) Understanding belief propagation and its generalizations. Explor Artif Intell New Millennium 8:236–239
Yedidia J S, Freeman W T, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans Inf Theory 51:2282–2312
Yuan Y, Mou L, Lu X (2015) Scene recognition by manifold regularized deep learning architecture. IEEE Trans Neural Netw Learn Syst. doi:10.1109/TNNLS.2014.2359471
Zhang C, Wang L, Yang R (2010) Semantic segmentation of urban scenes using dense depth maps. In: Proceedings of European conference on computer vision
Acknowledgments
This work was supported in part by the National Basic Research Program of China (973 Program) under Grant 2012CB719905, in part by the National Natural Science Foundation of China under Grant 61472413, in part by Chinese Academy of Sciences under Grant LSIT201408 and in part by the Key Research Program of the Chinese Academy of Sciences under Grant KGZD-EW-T03.
Li, X., Mou, L. & Lu, X. Video parsing via spatiotemporally analysis with images. Multimed Tools Appl 75, 11961–11976 (2016). https://doi.org/10.1007/s11042-015-2735-x