Simultaneous tracking and action recognition for single actor human actions

Singh, Vivek Kumar; Nevatia, Ram

doi:10.1007/s00371-011-0656-x

Original Article
Published: 06 November 2011

The Visual Computer, volume 27, pages 1115–1123 (2011)

Vivek Kumar Singh¹ &
Ram Nevatia¹

Abstract

This paper presents an approach to simultaneously tracking pose and recognizing human actions in video. It combines a Dynamic Bayesian Action Network (DBAN) with 2D body part models. The existing DBAN implementation relies on fairly weak observation features, which limits recognition accuracy. In this work, we use a 2D body part model for accurate pose alignment, which in turn improves both the pose estimate and action recognition accuracy. To compensate for the additional time required for alignment, we use an action entropy-based scheme to determine the minimum number of states to maintain in each frame while avoiding sample impoverishment. We also present an approach to automating keypose selection for learning 3D action models from a few annotations. We demonstrate our approach on a hand gesture dataset with 500 action sequences and show that our algorithm achieves a 6% improvement in accuracy over DBAN.
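The abstract does not specify how action entropy is mapped to the number of states kept per frame. As a rough illustration of the general idea only, the sketch below computes the Shannon entropy of the action distribution implied by a set of weighted samples and interpolates a state budget from it: low entropy (one action dominates) keeps few states, high entropy (actions are ambiguous) keeps more. The function names, the linear interpolation rule, and the bounds `n_min` and `n_max` are all assumptions made for this example, not the authors' scheme.

```python
import math
from collections import defaultdict


def action_entropy(samples):
    """Shannon entropy (in nats) of the action distribution implied by weighted samples.

    samples: iterable of (action_label, weight) pairs with non-negative weights.
    """
    totals = defaultdict(float)
    for action, weight in samples:
        totals[action] += weight
    z = sum(totals.values())
    if z <= 0:
        raise ValueError("weights must sum to a positive value")
    return -sum((w / z) * math.log(w / z) for w in totals.values() if w > 0)


def num_states_to_keep(samples, num_actions, n_min=10, n_max=200):
    """Hypothetical rule: scale the state budget linearly with normalized entropy.

    The entropy is divided by log(num_actions), its maximum possible value,
    so the interpolation factor lies in [0, 1].
    """
    if num_actions < 2:
        return n_min
    h = action_entropy(samples) / math.log(num_actions)
    h = min(max(h, 0.0), 1.0)  # guard against floating-point drift
    return n_min + round(h * (n_max - n_min))
```

With these assumed bounds, a frame whose samples all agree on one action would keep `n_min` states, while a frame whose weight is spread evenly over every action would keep `n_max`. Keeping a nonzero floor is one simple way to reduce the risk of sample impoverishment when the distribution is sharply peaked.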

Author information

Authors and Affiliations

University of Southern California, Los Angeles, CA, 90089, USA
Vivek Kumar Singh & Ram Nevatia

Corresponding author

Correspondence to Vivek Kumar Singh.

About this article

Cite this article

Singh, V.K., Nevatia, R. Simultaneous tracking and action recognition for single actor human actions. Vis Comput 27, 1115–1123 (2011). https://doi.org/10.1007/s00371-011-0656-x

Published: 06 November 2011
Issue Date: December 2011
DOI: https://doi.org/10.1007/s00371-011-0656-x
