Efficient action recognition via local position offset of 3D skeletal body joints

Lu, Guoliang; Zhou, Yiqi; Li, Xueyong; Kudo, Mineichi

doi:10.1007/s11042-015-2448-1

Published: 18 January 2015

Volume 75, pages 3479–3494, (2016)

Multimedia Tools and Applications

Guoliang Lu^1,2,
Yiqi Zhou^1,2,
Xueyong Li^1,2 &
…
Mineichi Kudo^3

875 Accesses
22 Citations

Abstract

Recognizing human actions accurately and with little computation is important for practical use. This paper presents an efficient framework for recognizing actions captured with an RGB-D camera. The framework's novel action patterns are extracted by computing the position offsets of 3D skeletal body joints locally along the temporal extent of the video. Action recognition is then performed by assembling these offset vectors in a bag-of-words framework while also taking the spatial independence of the body joints into account. We conducted extensive experiments on two benchmark datasets, the UCF dataset and the MSRC-12 dataset, to demonstrate the effectiveness of the proposed framework. The experimental results suggest that the proposed framework 1) extracts action patterns very quickly and is simple to implement; and 2) achieves recognition accuracy comparable to or better than that of state-of-the-art approaches.
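The pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the array layout `(frames, joints, 3)`, the function names, the nearest-codeword assignment, and the use of one codebook per joint (one possible reading of "spatial independence of body joints") are all assumptions made for illustration. In practice the codebooks would presumably be learned from training offsets, for example by clustering; here they are random placeholders.

```python
import numpy as np


def local_position_offsets(joints, delta_t):
    """Offset of every joint between frame t and frame t + delta_t.

    joints: array of shape (T, J, 3) holding 3D joint positions per frame.
    Returns an array of shape (T - delta_t, J, 3).
    """
    joints = np.asarray(joints, dtype=float)
    if not 0 < delta_t < joints.shape[0]:
        raise ValueError("delta_t must satisfy 0 < delta_t < number of frames")
    return joints[delta_t:] - joints[:-delta_t]


def joint_histogram(offsets, codebook):
    """Bag-of-words histogram of one joint's offsets.

    offsets: (N, 3) offset vectors; codebook: (K, 3) codewords (assumed given).
    Each offset votes for its nearest codeword; the histogram is L1-normalized.
    """
    dists = np.linalg.norm(offsets[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()


def describe_sequence(joints, codebooks, delta_t):
    """Concatenate per-joint histograms into one descriptor (assumed design)."""
    offsets = local_position_offsets(joints, delta_t)
    return np.concatenate(
        [joint_histogram(offsets[:, j], codebooks[j]) for j in range(offsets.shape[1])]
    )


# Hypothetical data: 20 frames, 15 joints, 8 codewords per joint.
rng = np.random.default_rng(0)
joints = rng.normal(size=(20, 15, 3))
codebooks = rng.normal(size=(15, 8, 3))
descriptor = describe_sequence(joints, codebooks, delta_t=3)
```

The resulting descriptor has one 8-bin histogram per joint, so its length here is 15 × 8 = 120. Any standard classifier could be trained on such descriptors; which classifier the paper uses is not stated in this excerpt.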

Notes

Note that we do not give a universal value of Δt, because it depends on the observation settings, e.g., the sampling rate of the camera. This value therefore has to be estimated before practical use, as presented in the following experiments.
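Why Δt depends on the sampling rate can be seen in a small synthetic sketch. It assumes that Δt is counted in frames and that a single joint moves along a straight line at constant speed; the function names and numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_joint_track(speed_m_per_s, duration_s, fps):
    """Positions of one joint moving along the x axis, sampled at `fps`."""
    n_frames = int(round(duration_s * fps)) + 1
    t = np.arange(n_frames) / fps
    direction = np.array([1.0, 0.0, 0.0])
    return speed_m_per_s * t[:, None] * direction[None, :]


def mean_offset_length(track, delta_t):
    """Mean length of the offsets between frames t and t + delta_t."""
    offsets = track[delta_t:] - track[:-delta_t]
    return np.linalg.norm(offsets, axis=1).mean()


track_30fps = sample_joint_track(0.6, 2.0, fps=30)
track_15fps = sample_joint_track(0.6, 2.0, fps=15)

# Same delta_t in frames, different cameras: the offsets differ in scale.
len_30_dt4 = mean_offset_length(track_30fps, 4)
len_15_dt4 = mean_offset_length(track_15fps, 4)

# Halving delta_t for the slower camera covers the same time span again.
len_15_dt2 = mean_offset_length(track_15fps, 2)
```

With the same Δt of 4 frames, the 15 fps recording yields offsets twice as long as the 30 fps recording, because each step spans twice as much time. Using Δt = 2 at 15 fps restores the original scale, which is one way to see why a value tuned for one camera does not carry over to another.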

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (61403232), the Natural Science Foundation of Shandong Province, China (ZR2014FQ025), and the Fundamental Research Funds of Shandong University (2014TB004).

Author information

Authors and Affiliations

School of Mechanical Engineering, Shandong University, Jinan, China
Guoliang Lu, Yiqi Zhou & Xueyong Li
Key Laboratory of High-efficiency and Clean Mechanical Manufacture of MOE, Shandong University, Jinan, China
Guoliang Lu, Yiqi Zhou & Xueyong Li
Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
Mineichi Kudo

Corresponding author

Correspondence to Guoliang Lu.

About this article

Cite this article

Lu, G., Zhou, Y., Li, X. et al. Efficient action recognition via local position offset of 3D skeletal body joints. Multimed Tools Appl 75, 3479–3494 (2016). https://doi.org/10.1007/s11042-015-2448-1

Received: 18 April 2014
Revised: 17 November 2014
Accepted: 02 January 2015
Published: 18 January 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s11042-015-2448-1
