
Efficient action recognition via local position offset of 3D skeletal body joints

Multimedia Tools and Applications

Abstract

Recognizing human actions accurately and with low computational cost is important for practical use. This paper presents an efficient framework for recognizing actions captured by an RGB-D camera. Action patterns are extracted by computing the position offsets of 3D skeletal body joints locally over the temporal extent of the video. Recognition is then performed by assembling these offset vectors in a bag-of-words framework and by exploiting the spatial independence of body joints. We conducted extensive experiments on two benchmark datasets, the UCF dataset and the MSRC-12 dataset, to demonstrate the effectiveness of the proposed framework. The results suggest that the proposed framework 1) extracts action patterns very quickly and is simple to implement, and 2) achieves recognition accuracy comparable to or better than that of state-of-the-art approaches.
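As a concrete illustration of the pipeline described above, the sketch below computes per-joint position offsets over a temporal offset dt (the Δt discussed in the Notes) and assembles them into a bag-of-words feature. This is a minimal sketch under our own assumptions: the function names, the (T, J, 3) data layout, and the per-joint k-means codebooks are illustrative choices, not the authors' implementation.

    import numpy as np
    from sklearn.cluster import KMeans

    def local_position_offsets(joints, dt):
        # joints: (T, J, 3) array of 3D joint positions from an RGB-D sensor.
        # Returns (T - dt, J, 3) offset vectors: each joint's position at
        # frame t + dt minus its position at frame t.
        return joints[dt:] - joints[:-dt]

    def fit_codebooks(train_offsets, n_words=20):
        # One codebook per joint, mirroring the idea that body joints are
        # treated as spatially independent.
        n_joints = train_offsets.shape[1]
        return [KMeans(n_clusters=n_words, n_init=10).fit(train_offsets[:, j, :])
                for j in range(n_joints)]

    def bag_of_words(offsets, codebooks):
        # Quantize each joint's offsets against its own codebook, build a
        # normalized word histogram, and concatenate over joints.
        feats = []
        for j, km in enumerate(codebooks):
            words = km.predict(offsets[:, j, :])
            hist = np.bincount(words, minlength=km.n_clusters).astype(float)
            feats.append(hist / max(hist.sum(), 1.0))
        return np.concatenate(feats)

The resulting fixed-length feature vector per video can then be passed to any standard classifier (e.g., nearest neighbor or an SVM).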



Notes

  1. Note that we do not give a universal value of Δt because it depends on the observation settings, e.g., the sampling rate of the camera. Consequently, this value must be estimated before practical use, as presented in the following experiments.
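The paper's estimation procedure is not reproduced in this excerpt; one plausible approach is a grid search over candidate offsets, scoring each by validation accuracy. In the sketch below, evaluate is a hypothetical user-supplied callback that trains and scores the recognizer for a given Δt, and the candidate list should be adapted to the camera's sampling rate.

    def estimate_dt(evaluate, candidates=(2, 4, 6, 8, 10)):
        # evaluate(dt) -> validation accuracy of the recognizer built with
        # temporal offset dt (in frames).
        scores = {dt: evaluate(dt) for dt in candidates}
        # Keep the offset that maximizes validation accuracy.
        return max(scores, key=scores.get)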


Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (61403232), the Natural Science Foundation of Shandong Province, China (ZR2014FQ025), and the Fundamental Research Funds of Shandong University (2014TB004).

Author information


Corresponding author

Correspondence to Guoliang Lu.


About this article


Cite this article

Lu, G., Zhou, Y., Li, X. et al. Efficient action recognition via local position offset of 3D skeletal body joints. Multimed Tools Appl 75, 3479–3494 (2016). https://doi.org/10.1007/s11042-015-2448-1


