Tracklet Descriptors for Action Modeling and Video Analysis

Raptis, Michalis; Soatto, Stefano

doi:10.1007/978-3-642-15549-9_42

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6311)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We present spatio-temporal feature descriptors that can be inferred from video and used as building blocks in action recognition systems. They capture the evolution of “elementary action elements” under a set of assumptions on the image-formation model and are designed to be insensitive to nuisance variability (absolute position, contrast), while retaining discriminative statistics due to the fine-scale motion and the local shape in compact regions of the image. Despite their simplicity, these descriptors, used in conjunction with basic classifiers, attain state-of-the-art performance in the recognition of actions in benchmark datasets.
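The abstract does not spell out how the descriptors are built, so the following is only an illustrative sketch of the two invariances it names, not the authors' method. It assumes a tracked point is given as an array of (x, y) positions and a local patch as a 2-D intensity array; the function names `trajectory_descriptor` and `orientation_histogram` are hypothetical. Frame-to-frame displacements discard absolute position, and a unit-normalized histogram of gradient orientations is unaffected by a positive gain or a constant offset in intensity.

```python
import numpy as np


def trajectory_descriptor(points):
    """Hypothetical motion summary of one tracked point.

    points: array-like of shape (T, 2) with (x, y) positions over T frames.
    Returns the flattened frame-to-frame displacements, which do not
    change when the whole trajectory is translated in the image.
    """
    pts = np.asarray(points, dtype=float)
    return np.diff(pts, axis=0).ravel()


def orientation_histogram(patch, bins=8, eps=1e-8):
    """Hypothetical local-shape summary of one image patch.

    Builds a histogram of gradient orientations weighted by gradient
    magnitude, then scales it to unit length. A constant intensity
    offset vanishes in the gradient, and a positive gain cancels in
    the normalization.
    """
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)            # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)
    hist, _ = np.histogram(
        angle, bins=bins, range=(0.0, 2 * np.pi), weights=magnitude
    )
    return hist / (np.linalg.norm(hist) + eps)
```

Under these assumptions, shifting every point of a trajectory by the same vector, or replacing a patch `p` with `2.0 * p + 10.0`, should leave the respective descriptor numerically unchanged.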

Author information

Authors and Affiliations

University of California, Los Angeles
Michalis Raptis & Stefano Soatto

Editor information

Editors and Affiliations

GRASP Laboratory, University of Pennsylvania, 3330 Walnut Street, 19104, Philadelphia, PA, USA
Kostas Daniilidis
School of Electrical and Computer Engineering, National Technical University of Athens, 15773, Athens, Greece
Petros Maragos
Department of Applied Mathematics, Ecole Centrale de Paris, Grande Voie des Vignes, 92295, Chatenay-Malabry, France
Nikos Paragios

About this paper

Cite this paper

Raptis, M., Soatto, S. (2010). Tracklet Descriptors for Action Modeling and Video Analysis. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15549-9_42

DOI: https://doi.org/10.1007/978-3-642-15549-9_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15548-2
Online ISBN: 978-3-642-15549-9
eBook Packages: Computer Science, Computer Science (R0)
