Abstract
Human action recognition based on 3D skeletons is an important yet challenging task because of the instability of skeleton joints and large variations in action length. In this paper, we propose a novel method that can effectively deal with unstable joints and significant temporal misalignment. Action recognition is formulated as a sequence-matching problem on a pre-constructed weighted graph, which can encode arbitrary spatio-temporal features together with the transition probabilities between action elements. The weighted graph is constructed in the training stage. To classify an input action sequence, we introduce a globally optimal matching algorithm based on dynamic programming, which handles temporal misalignment without pre-segmentation. The proposed approach is evaluated on two benchmark datasets captured by a single depth sensor. Experimental results show that our approach outperforms most state-of-the-art algorithms.
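The graph-based matching idea summarized above can be illustrated with a generic Viterbi-style dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that action elements are graph nodes represented by prototype feature vectors (for example, cluster centres of per-frame skeleton features), that training sequences are already quantized into node ids, and that the cost of assigning a frame to a node is its Euclidean distance to the node prototype plus the negative log probability of the transition taken. The helpers `build_graph`, `match_sequence`, and `classify` are hypothetical names. Using one graph per action class is a simplification chosen for brevity; the abstract describes a single pre-constructed graph whose exact structure is not given here. Self-loops let one node absorb several consecutive frames, which is how the sketch tolerates differences in action speed without pre-segmenting the input.

```python
import numpy as np


def build_graph(train_seqs, n_nodes, smoothing=1.0):
    """Estimate a row-stochastic transition matrix between action elements.

    Hypothetical helper: `train_seqs` are training sequences already quantized
    into node ids in range(n_nodes). Additive smoothing keeps every transition
    probability positive, so its negative log stays finite.
    """
    counts = np.full((n_nodes, n_nodes), float(smoothing))
    for seq in train_seqs:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def match_sequence(frames, prototypes, trans):
    """Globally optimal frame-to-node alignment by dynamic programming (Viterbi-style).

    frames: (T, D) per-frame features; prototypes: (N, D) node features;
    trans: (N, N) transition probabilities. Returns (total cost, node path).
    """
    T = len(frames)
    # Local cost: Euclidean distance from every frame to every node prototype, shape (T, N).
    local = np.linalg.norm(frames[:, None, :] - prototypes[None, :, :], axis=2)
    # Edge cost: negative log transition probability, shape (N, N).
    edge = -np.log(trans)
    D = np.empty_like(local)
    back = np.zeros(local.shape, dtype=int)
    D[0] = local[0]
    for t in range(1, T):
        # cand[i, j] = best cost ending at node i at t-1, then moving i -> j.
        cand = D[t - 1][:, None] + edge
        back[t] = cand.argmin(axis=0)
        D[t] = cand.min(axis=0) + local[t]
    # Backtrack from the cheapest final node to recover the optimal path.
    path = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return float(D[-1].min()), path


def classify(frames, models):
    """Return the label whose graph matches `frames` with the lowest total cost.

    `models` maps a label to (prototypes, trans); one graph per class is an
    assumption of this sketch.
    """
    costs = {label: match_sequence(frames, protos, trans)[0]
             for label, (protos, trans) in models.items()}
    return min(costs, key=costs.get), costs
```

`classify` returns both the best label and the per-class costs. Because costs from different graphs are compared directly, a real system would probably need some length or score normalization; the abstract does not specify that choice.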
About this article
Cite this article
Jiang, X., Zhong, F., Peng, Q. et al. Action recognition based on global optimal similarity measuring. Multimed Tools Appl 75, 11019–11036 (2016). https://doi.org/10.1007/s11042-015-2829-5