Abstract
This paper introduces a human action recognition method based on skeletal data captured by Kinect or other depth sensors. After a series of pre-processing steps, action features such as position, velocity, and acceleration are extracted from each frame to capture both the dynamic and static information of human motion, making full use of the skeletal data. The most challenging problem in skeleton-based human action recognition is the large variability within and across subjects. To handle this problem, we propose dividing human poses into two major categories: discriminating poses and common poses. A pose specificity metric is proposed to quantify how discriminative each pose is. Finally, action recognition is performed by a weighted voting method, which votes with the k nearest neighbors found in the training dataset and uses pose specificity as the weight of each ballot. Experiments on two benchmark datasets show that the proposed method outperforms state-of-the-art methods.
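The voting scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the feature vectors, the distance metric, and the `specificity` scores (here assumed to be precomputed, one per training sample) are placeholders for the quantities defined in the paper.

```python
from collections import defaultdict
import numpy as np

def weighted_knn_vote(query, train_feats, train_labels, specificity, k=5):
    """Classify `query` by weighted voting among its k nearest training
    samples; each neighbor's ballot is weighted by the pose-specificity
    score of that training sample (hypothetical interface)."""
    # Euclidean distance from the query to every training feature vector.
    dists = np.linalg.norm(train_feats - query, axis=1)
    # Indices of the k nearest training samples.
    nearest = np.argsort(dists)[:k]
    # Accumulate specificity-weighted votes per action label.
    scores = defaultdict(float)
    for idx in nearest:
        scores[train_labels[idx]] += specificity[idx]
    # The label with the largest weighted vote wins.
    return max(scores, key=scores.get)
```

With uniform specificity this reduces to plain majority-vote k-NN; giving discriminating poses higher specificity lets a minority of highly informative neighbors outvote common poses.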
Funding
This study was funded in part by RGC (GRF # 14205914, GRF # 14210117) and in part by Shenzhen Science and Technology Innovation project # JCYJ20170413161616163 awarded to Max Q.-H. Meng.
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work is supported in part by RGC GRF Grant # 14200618, in part by RGC GRF Grant # 14205914, and in part by Shenzhen Science and Technology Innovation project # JCYJ20170413161616163 awarded to Max Q.-H. Meng.
Rights and permissions
About this article
Cite this article
Liu, T., Wang, J., Hutchinson, S. et al. Skeleton-Based Human Action Recognition by Pose Specificity and Weighted Voting. Int J of Soc Robotics 11, 219–234 (2019). https://doi.org/10.1007/s12369-018-0498-z