Abstract
Recognizing human actions accurately and with low computational cost is important for practical use. This paper presents an efficient framework for recognizing actions with an RGB-D camera. The novel action patterns in the framework are extracted by computing the position offsets of 3D skeletal body joints locally along the temporal extent of the video. Action recognition is then performed by assembling these offset vectors in a bag-of-words framework while also considering the spatial independence of body joints. We conducted extensive experiments on two benchmark datasets, the UCF dataset and the MSRC-12 dataset, to demonstrate the effectiveness of the proposed framework. Experimental results suggest that the proposed framework 1) extracts action patterns very quickly and is very simple to implement; and 2) achieves recognition accuracy comparable to or better than state-of-the-art approaches.
Notes
Note that we do not give a universal value for Δt because it depends on the observation settings, e.g., the sampling rate of the camera. This value therefore needs to be estimated before practical use, as presented in the following experiments.
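To make the role of Δt concrete, the following is a minimal sketch and not the authors' implementation. It assumes the local position offset of a joint is the difference between its 3D position at frame t + Δt and at frame t, with Δt counted in frames. The function names `local_offsets` and `frames_for_interval` and the data layout (a list of frames, each a list of (x, y, z) tuples) are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Frame = Sequence[Point]  # one 3D position per skeletal joint


def local_offsets(frames: Sequence[Frame], dt: int) -> List[List[Point]]:
    """For each frame t, return the per-joint offset p(t + dt) - p(t).

    Assumption: dt is a gap measured in frames, not in seconds.
    """
    if dt < 1:
        raise ValueError("dt must be a positive number of frames")
    offsets = []
    for t in range(len(frames) - dt):
        now, later = frames[t], frames[t + dt]
        offsets.append([
            tuple(b - a for a, b in zip(p_now, p_later))
            for p_now, p_later in zip(now, later)
        ])
    return offsets


def frames_for_interval(seconds: float, fps: float) -> int:
    """Convert a time interval to a frame gap for a given sampling rate."""
    return max(1, round(seconds * fps))


if __name__ == "__main__":
    # Two synthetic joints: one moves along x, the other along y.
    frames = [[(float(t), 0.0, 0.0), (0.0, float(2 * t), 0.0)] for t in range(5)]
    for row in local_offsets(frames, dt=2):
        print(row)
```

Under these assumptions, the same physical time interval maps to a different frame gap at different sampling rates, which is one way to read why Δt has to be estimated for each observation setting rather than fixed once.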
Acknowledgments
This work is financially supported by the National Natural Science Foundation of China (61403232), the Natural Science Foundation of Shandong Province, China (ZR2014FQ025), and the Fundamental Research Funds of Shandong University (2014TB004).
Cite this article
Lu, G., Zhou, Y., Li, X. et al. Efficient action recognition via local position offset of 3D skeletal body joints. Multimed Tools Appl 75, 3479–3494 (2016). https://doi.org/10.1007/s11042-015-2448-1
DOI: https://doi.org/10.1007/s11042-015-2448-1