
Pursuing Atomic Video Words by Information Projection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 6493))

Abstract

In this paper, we study mathematical models of atomic visual patterns in natural videos and establish a generative visual vocabulary for video representation. Empirically, we employ small video patches (e.g., 15×15×5, called video “bricks”) from natural videos as the basic unit of analysis. The high-dimensional brick space contains a variety of brick subspaces (or atomic video words) of varying dimensions, whose structures are characterized by both appearance and motion dynamics. Here, we categorize the words into two pure types: structural video words (SVWs) and textural video words (TVWs). A common generative model is introduced to describe both types of video words in a unified form. The representational power of a word is measured by its information gain, based on which words are pursued one by one via a novel pursuit algorithm, and finally a holistic video vocabulary is built up. Experimental results show the potential of our framework for video representation.
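The pipeline the abstract describes — slicing videos into fixed-size bricks, then greedily selecting subspace "words" by how much information they capture — can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the brick extraction follows the 15×15×5 size given in the abstract, while the scoring uses projection energy (variance explained by a candidate subspace) as a hypothetical stand-in for the paper's information-projection criterion, and `extract_bricks`, `greedy_pursuit`, and the candidate-word bases are all names introduced here for illustration.

```python
import numpy as np

def extract_bricks(video, size=(15, 15, 5), stride=(15, 15, 5)):
    """Slice a video volume (H x W x T) into flattened patches ("bricks")."""
    H, W, T = video.shape
    h, w, t = size
    sh, sw, st = stride
    bricks = []
    for y in range(0, H - h + 1, sh):
        for x in range(0, W - w + 1, sw):
            for f in range(0, T - t + 1, st):
                bricks.append(video[y:y+h, x:x+w, f:f+t].ravel())
    return np.array(bricks)

def greedy_pursuit(bricks, candidate_words, num_words=3):
    """Greedily pick candidate words (orthonormal subspace bases), each round
    choosing the one whose projection captures the most residual energy --
    a simple proxy for the information-gain criterion in the paper."""
    residual = bricks - bricks.mean(axis=0)
    chosen = []
    for _ in range(num_words):
        gains = []
        for B in candidate_words:
            proj = residual @ B @ B.T           # project residual onto word subspace
            gains.append(np.sum(proj ** 2))     # energy captured by this word
        best = int(np.argmax(gains))
        chosen.append(best)
        B = candidate_words[best]
        residual = residual - residual @ B @ B.T  # remove the explained part
    return chosen

# Toy usage: a random 30x30x10 "video" yields 2x2x2 = 8 bricks of dimension 1125;
# candidate words are random orthonormal 5-dimensional bases.
rng = np.random.default_rng(0)
video = rng.standard_normal((30, 30, 10))
bricks = extract_bricks(video)
words = [np.linalg.qr(rng.standard_normal((1125, 5)))[0] for _ in range(4)]
vocabulary = greedy_pursuit(bricks, words, num_words=2)
```

In the actual model the candidate subspaces would come from SVW/TVW structures learned from data, and the gain would be an information-theoretic quantity rather than raw projection energy; the greedy one-word-at-a-time loop is the shared skeleton.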





Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhao, Y., Gong, H., Jia, Y. (2011). Pursuing Atomic Video Words by Information Projection. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, vol 6493. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19309-5_20


  • DOI: https://doi.org/10.1007/978-3-642-19309-5_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19308-8

  • Online ISBN: 978-3-642-19309-5

  • eBook Packages: Computer Science, Computer Science (R0)
