Abstract
This paper proposes a method for categorizing human actions with high dynamics in the upper extremities. It combines generative and discriminative approaches: possible arm pose candidates are inferred from images and their action categories are validated, while the validated category in turn facilitates deriving the estimated arm poses. The method exploits the complementary relationship between action categorization and arm pose modeling by adopting the arm pose prior of a hypothetical action category to enhance the modeling of possible arm poses, and then applying features captured from the temporal and spatial action characteristics of the arm pose candidates to improve categorization. From a given visual observation, arm pose states are estimated on a graphical model via dynamic programming under an action category hypothesis, which is then validated by a trained discriminative model based on temporal arm pose words from the estimated arm pose candidates. The method has been evaluated on videos of four action types from the Berkeley multimodal human action dataset, achieving categorization success rates of 91.47% and 95.83% for single and multiple frames, respectively, and on images of three action types from the HumanEva-I dataset, achieving a categorization success rate of 96.67%. Its arm pose modeling performance also improves for actions with high dynamics in the upper extremities.
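To make the hypothesize-and-validate scheme in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of dynamic programming over a chain graphical model of per-frame arm pose candidates: for each hypothesized action category, a Viterbi pass finds the best-scoring pose sequence under that category's pose prior, and the hypothesis with the highest sequence score is selected. The function names, the log-score matrices, and the chain structure are illustrative assumptions; the paper's actual model, features, and discriminative validation stage are more elaborate.

```python
import numpy as np

def viterbi_pose_sequence(unary, pairwise):
    """Best pose-candidate sequence on a chain model via dynamic programming.

    unary:    (T, K) log-scores of K pose candidates in each of T frames,
              e.g. under an action-specific arm pose prior (placeholder).
    pairwise: (K, K) log-transition scores between candidates of
              consecutive frames (placeholder for temporal smoothness).
    Returns the total log-score of the best sequence and the sequence itself.
    """
    T, K = unary.shape
    score = unary[0].copy()            # best score ending at each candidate
    back = np.zeros((T, K), dtype=int) # backpointers for path recovery
    for t in range(1, T):
        total = score[:, None] + pairwise          # (prev K, next K)
        back[t] = np.argmax(total, axis=0)         # best predecessor per state
        score = total[back[t], np.arange(K)] + unary[t]
    # Trace the best path backwards from the final frame.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return float(np.max(score)), path[::-1]

def categorize(unaries_by_action, pairwise_by_action):
    """Score each action hypothesis by its best pose sequence; pick the max."""
    scores = {a: viterbi_pose_sequence(u, pairwise_by_action[a])[0]
              for a, u in unaries_by_action.items()}
    return max(scores, key=scores.get), scores
```

In the paper the winning hypothesis is additionally validated by a trained discriminative model over temporal arm pose words; here the raw sequence score stands in for that validation step.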
Li, C., Yung, N.H.C. Categorization of human actions with high dynamics in upper extremities based on arm pose modeling. Machine Vision and Applications 26, 619–632 (2015). https://doi.org/10.1007/s00138-015-0686-x