Abstract
Human action recognition from 3D skeleton joints is an important yet challenging task. Although much research has been devoted to 3D action recognition, existing methods mainly suffer from two problems: complex model representations and low implementation efficiency. To tackle these problems, we propose an effective and efficient framework for 3D action recognition based on a global-and-local histogram representation model. Our method consists of a global-and-local featuring phase, a saturation-based histogram representation phase, and a classification phase. The featuring phase captures a global feature and a local feature for each action sequence, using the joint displacements between the current frame and the first frame, and between pairs of fixed-skip frames, respectively. The histogram representation phase builds a histogram for each joint, accounting for the motion independence of joints and the saturation of each histogram bin. The classification phase measures the histogram-to-class distance for each joint. In addition, we introduce a novel action dataset, the BJUT Kinect dataset, which contains multi-period motion clips and intra-class variations. We compare our method with many state-of-the-art methods on the BJUT Kinect, UCF Kinect, Florence 3D Action, MSR-Action3D, and NTU RGB+D datasets. The results show that our method achieves both higher accuracy and higher efficiency for 3D action recognition.
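To make the three phases concrete, here is a minimal sketch of the general idea in NumPy. This is not the authors' implementation: the skip length, bin count, saturation cap, and the use of offset magnitudes with an L1 histogram-to-class distance are all illustrative assumptions.

```python
import numpy as np

def global_local_features(seq, skip=5):
    """Global-and-local featuring sketch.

    seq: (T, J, 3) array of 3D joint positions over T frames.
    Returns the global offsets (each frame relative to the first frame)
    and the local offsets (between frames a fixed skip apart).
    """
    global_off = seq - seq[0]             # (T, J, 3): displacement from frame 0
    local_off = seq[skip:] - seq[:-skip]  # (T-skip, J, 3): fixed-skip displacement
    return global_off, local_off

def saturated_histogram(offsets, bins=8, cap=0.2):
    """Per-joint histogram of offset magnitudes; each joint is treated
    independently, and each bin is saturated (clipped) at `cap` after
    normalization so no single bin dominates the representation."""
    mags = np.linalg.norm(offsets, axis=2)  # (T, J) offset magnitudes
    hists = []
    for j in range(mags.shape[1]):
        h, _ = np.histogram(mags[:, j], bins=bins,
                            range=(0.0, mags.max() + 1e-9))
        h = h / max(h.sum(), 1)   # normalize counts to a distribution
        h = np.minimum(h, cap)    # saturation: cap dominant bins
        hists.append(h)
    return np.stack(hists)        # (J, bins)

def nearest_class(query, class_hists):
    """Histogram-to-class matching: pick the class whose per-joint
    histograms are closest to the query (L1 distance summed over joints)."""
    dists = {c: np.abs(query - h).sum() for c, h in class_hists.items()}
    return min(dists, key=dists.get)
```

In this sketch, classification is a nearest-class lookup over precomputed per-class histogram models, which mirrors the efficiency argument of the abstract: no heavy temporal model is trained, only per-joint histograms are compared.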
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 61772048, 61632006), the Beijing Natural Science Foundation (No. 4162009), and Beijing Municipal Science and Technology Projects (Nos. Z161100001116072, Z171100004417023).
Sun, B., Kong, D., Wang, S. et al. Effective human action recognition using global and local offsets of skeleton joints. Multimed Tools Appl 78, 6329–6353 (2019). https://doi.org/10.1007/s11042-018-6370-1