Pursuing Atomic Video Words by Information Projection

Zhao, Youdong; Gong, Haifeng; Jia, Yunde

doi:10.1007/978-3-642-19309-5_20


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6493)

Abstract

In this paper, we study mathematical models of atomic visual patterns in natural videos and establish a generative visual vocabulary for video representation. Empirically, we use small video patches (e.g., 15×15×5 patches, called video “bricks”) from natural videos as the basic unit of analysis. The high-dimensional brick space contains a variety of brick subspaces (or atomic video words) of varying dimensions. The structure of each word is characterized by both appearance and motion dynamics. We categorize the words into two pure types: structural video words (SVWs) and textural video words (TVWs). A common generative model represents both types of video words in a unified form. The representation power of a word is measured by its information gain; based on this measure, words are pursued one by one via a novel pursuit algorithm, and finally a holistic video vocabulary is built up. Experimental results show the potential of our framework for video representation.
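
As a rough illustration of the basic analysis unit only, the sketch below cuts a video into non-overlapping 15×15×5 bricks and flattens each one into a vector. It is not the authors' implementation: the function name `extract_bricks`, the `video[t][y][x]` indexing, the reading of 15×15×5 as height × width × frames, and the non-overlapping sampling are all assumptions made for this example. The pursuit algorithm and the information-gain measure are not covered here.

```python
from itertools import product


def extract_bricks(video, height=15, width=15, frames=5):
    """Cut a video, indexed as video[t][y][x], into non-overlapping bricks.

    Each brick is flattened (frame by frame, row by row) into a vector of
    length height * width * frames. Hypothetical helper, not from the paper.
    """
    n_frames = len(video)
    n_rows = len(video[0])
    n_cols = len(video[0][0])
    bricks = []
    # Step through the volume in brick-sized strides along t, y and x.
    for t0, y0, x0 in product(
        range(0, n_frames - frames + 1, frames),
        range(0, n_rows - height + 1, height),
        range(0, n_cols - width + 1, width),
    ):
        brick = [
            video[t][y][x]
            for t in range(t0, t0 + frames)
            for y in range(y0, y0 + height)
            for x in range(x0, x0 + width)
        ]
        bricks.append(brick)
    return bricks


# Synthetic 10-frame, 30x45 "video" whose pixel value is t + y + x.
video = [
    [[float(t + y + x) for x in range(45)] for y in range(30)]
    for t in range(10)
]
bricks = extract_bricks(video)
print(len(bricks), len(bricks[0]))
```

With these sizes, each brick is a point in a 1125-dimensional brick space (15 · 15 · 5), which is the space in which the abstract's brick subspaces would live.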


Author information

Authors and Affiliations

School of Computer Science, Beijing Institute of Technology, Beijing, 100081, China
Youdong Zhao & Yunde Jia
GRASP Lab., University of Pennsylvania, Philadelphia, PA, 19104, USA
Haifeng Gong


Editor information

Editors and Affiliations

Department of Computer Science, Technion, Israel Institute of Technology, 32000, Haifa, Israel
Ron Kimmel
The University of Auckland, 37 Kohimarama Road, Mission Bay, 1071, Auckland, New Zealand
Reinhard Klette
National Institute of Informatics, 1018430, Chiyoda, Tokyo, Japan
Akihiro Sugimoto


About this paper

Cite this paper

Zhao, Y., Gong, H., Jia, Y. (2011). Pursuing Atomic Video Words by Information Projection. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, vol 6493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19309-5_20


DOI: https://doi.org/10.1007/978-3-642-19309-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19308-8
Online ISBN: 978-3-642-19309-5
eBook Packages: Computer Science, Computer Science (R0)
