Language-Motivated Approaches to Action Recognition

Malgireddy, Manavender R.; Nwogu, I.; Govindaraju, V.

doi:10.1007/978-3-319-57021-1_5

Chapter
First Online: 20 July 2017

Part of the book series: The Springer Series on Challenges in Machine Learning (SSCML)

Abstract

We present language-motivated approaches to detecting, localizing and classifying activities and gestures in videos. In order to obtain statistical insight into the underlying patterns of motions in activities, we develop a dynamic, hierarchical Bayesian model which connects low-level visual features in videos with poses, motion patterns and classes of activities. This process is somewhat analogous to the method of detecting topics or categories from documents based on the word content of the documents, except that our documents are dynamic. The proposed generative model harnesses both the temporal ordering power of dynamic Bayesian networks such as hidden Markov models (HMMs) and the automatic clustering power of hierarchical Bayesian models such as the latent Dirichlet allocation (LDA) model. We also introduce a probabilistic framework for detecting and localizing pre-specified activities (or gestures) in a video sequence, analogous to the use of filler models for keyword detection in speech processing. We demonstrate the robustness of our classification model and our spotting framework by recognizing activities in unconstrained real-life video sequences and by spotting gestures via a one-shot-learning approach.
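The filler-model idea mentioned in the abstract can be illustrated with a small likelihood-ratio sketch. This is not the chapter's model: as a loudly simplified stand-in, the gesture is scored with a first-order Markov chain over discrete codewords and the filler with a unigram distribution, and every name, parameter and probability below is hypothetical. A window is flagged when the gesture model explains it sufficiently better than the filler does.

```python
import math


def chain_log_prob(seq, start, trans):
    """Log-probability of a symbol sequence under a first-order Markov chain.

    start[s] is P(first symbol = s); trans[a][b] is P(next = b | current = a).
    All probabilities are assumed strictly positive.
    """
    log_p = math.log(start[seq[0]])
    for a, b in zip(seq, seq[1:]):
        log_p += math.log(trans[a][b])
    return log_p


def filler_log_prob(seq, unigram):
    """Log-probability under the filler: symbols drawn independently."""
    return sum(math.log(unigram[s]) for s in seq)


def spot(seq, start, trans, unigram, window, threshold=0.0):
    """Return (start_index, score) for windows whose log-likelihood ratio
    (gesture model minus filler model) exceeds the threshold."""
    hits = []
    for t in range(len(seq) - window + 1):
        w = seq[t:t + window]
        score = chain_log_prob(w, start, trans) - filler_log_prob(w, unigram)
        if score > threshold:
            hits.append((t, score))
    return hits


# Hypothetical gesture that tends to follow the codeword pattern 0 -> 1 -> 2.
start = [0.8, 0.1, 0.1]
trans = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]
unigram = [1 / 3, 1 / 3, 1 / 3]  # filler treats all codewords as equally likely
sequence = [2, 2, 0, 1, 2, 1, 1]

hits = spot(sequence, start, trans, unigram, window=3, threshold=1.0)
```

The threshold trades missed detections against false alarms; in this toy setup it would be tuned on held-out data rather than fixed in advance.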

Editors: Isabelle Guyon and Vassilis Athitsos.

Notes

1. In the context of activity spotting, we use the term gestures rather than activities, solely for consistency with the terminology of the ChaLearn Gesture Challenge.
2. Implementation can be found at http://www.irisa.fr/vista/Equipe/People/Laptev/download.html#stip.
3. States are modeled as multinomials since our input observables are discrete values.
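To make note 3 concrete, the following is a minimal sketch of how a hidden Markov model with multinomial (categorical) state emissions scores a sequence of discrete observations, using the standard scaled forward recursion. The function name and all parameter values are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM.

    start[i]   : P(first state = i)
    trans[i][j]: P(next state = j | current state = i)
    emit[i][k] : P(observed symbol = k | state = i), one multinomial per state
    """
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under the model
        # Rescale to avoid underflow and accumulate the log of the scale.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state model over a two-symbol codebook.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]

log_lik = forward_log_likelihood([0, 1], start, trans, emit)
```

Because each state's emission is a table indexed by symbol, the observations must first be quantized into a finite codebook, which is what makes the multinomial choice natural for discrete inputs.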

Acknowledgements

The authors wish to thank the associate editors and anonymous referees for all their advice about the structure, references, experimental illustration and interpretation of this manuscript. The work benefited significantly from our participation in the ChaLearn challenge as well as the accompanying workshops.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, University at Buffalo, SUNY, Buffalo, NY, 14260, USA
Manavender R. Malgireddy, I. Nwogu & V. Govindaraju

Corresponding author

Correspondence to Manavender R. Malgireddy.

Editor information

Editors and Affiliations

University of Barcelona, Barcelona, Spain
Sergio Escalera
ChaLearn, Berkeley, California, USA
Isabelle Guyon
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas, USA
Vassilis Athitsos

About this chapter

Cite this chapter

Malgireddy, M.R., Nwogu, I., Govindaraju, V. (2017). Language-Motivated Approaches to Action Recognition. In: Escalera, S., Guyon, I., Athitsos, V. (eds) Gesture Recognition. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-57021-1_5

DOI: https://doi.org/10.1007/978-3-319-57021-1_5
Published: 20 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57020-4
Online ISBN: 978-3-319-57021-1
eBook Packages: Computer Science, Computer Science (R0)