
Categorization of human actions with high dynamics in upper extremities based on arm pose modeling

  • Original Paper
  • Published in Machine Vision and Applications

Abstract

This paper proposes a novel method to categorize human actions with high dynamics in the upper extremities. It combines generative and discriminative approaches to infer possible arm pose candidates from images and to validate their action categories; the validated action category in turn facilitates deriving the estimated arm poses. The method exploits the complementary relationship between action categorization and arm pose modeling: the arm pose prior of a hypothesized action category enhances the modeling of possible arm poses, and features capturing the temporal and spatial action characteristics of the arm pose candidates then improve categorization. From a given visual observation, arm pose states are estimated on a graphical model via dynamic programming under an action category hypothesis, which is validated by a trained discriminative model based on temporal arm pose words drawn from the estimated arm pose candidates. The proposed method has been evaluated on videos of four action types from the Berkeley multimodal human action dataset, achieving categorization success rates of 91.47% and 95.83% for single and multiple frames, respectively, and on images of three action types from the HumanEva-I dataset, achieving a success rate of 96.67%. It also improves arm pose modeling performance for actions with high dynamics in the upper extremities.
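
To make the pipeline concrete, the following is a minimal Python sketch of the hypothesize-and-validate loop the abstract describes: each candidate action category contributes an arm pose prior, pose states are decoded by dynamic programming on a chain-structured graphical model, and a trained discriminative model scores temporal arm pose words to validate the hypothesis. All interfaces here (viterbi_decode, pose_likelihood, codebook.quantize, the per-category classifiers) are hypothetical illustrations under assumed data shapes, not the paper's actual models or features.

    import numpy as np

    def viterbi_decode(unary, pairwise):
        # Dynamic-programming (Viterbi) decoding of the most likely sequence
        # of discrete arm pose states on a chain-structured graphical model.
        # unary: (T, S) per-frame log scores; pairwise: (S, S) transition log scores.
        T, S = unary.shape
        score = unary[0].copy()
        back = np.zeros((T, S), dtype=int)
        for t in range(1, T):
            trans = score[:, None] + pairwise    # score of prev state i -> cur state j
            back[t] = trans.argmax(axis=0)
            score = trans.max(axis=0) + unary[t]
        states = np.empty(T, dtype=int)
        states[-1] = score.argmax()
        for t in range(T - 1, 0, -1):            # backtrack the best path
            states[t - 1] = back[t, states[t]]
        return states, score.max()

    def categorize(frames, pose_likelihood, priors, classifiers, codebook):
        # Hypothesize-and-validate loop: bias pose inference with each action
        # category's arm pose prior, decode pose candidates by dynamic
        # programming, quantize them into temporal arm pose words, and let the
        # trained discriminative model for that category validate the hypothesis.
        best = (None, float("-inf"), None)
        for action, (prior_unary, prior_pairwise) in priors.items():
            unary = pose_likelihood(frames) + prior_unary   # image evidence + category prior
            poses, _ = viterbi_decode(unary, prior_pairwise)
            words = codebook.quantize(poses)                # temporal arm pose words
            confidence = classifiers[action].score(words)   # discriminative validation
            if confidence > best[1]:
                best = (action, confidence, poses)
        return best  # (validated category, confidence, estimated arm poses)

Under this factorization, the validated category and the decoded pose sequence come out of the same loop, which reflects the complementary relationship between categorization and arm pose modeling that the abstract emphasizes.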



Author information


Corresponding author

Correspondence to Chongguo Li.


About this article


Cite this article

Li, C., Yung, N.H.C. Categorization of human actions with high dynamics in upper extremities based on arm pose modeling. Machine Vision and Applications 26, 619–632 (2015). https://doi.org/10.1007/s00138-015-0686-x
