
Adaptive most joint selection and covariance descriptions for a robust skeleton-based human action recognition

Published in Multimedia Tools and Applications

Abstract

In this paper, we propose two effective ways of exploiting skeleton data for human action recognition (HAR). On the one hand, the proposed method takes advantage of skeleton data's robustness to changes in human appearance and its real-time performance. On the other hand, it mitigates inherent drawbacks of skeleton data, such as noise and incorrect skeleton estimation caused by self-occlusion of the human pose. To this end, in terms of feature design, we propose to extract covariance descriptors from joint velocities and combine them with those of joint positions. In terms of 3-D skeleton-based activity representation, we propose two schemes for selecting the most informative joints. The proposed method is evaluated on two benchmark datasets. On the MSRAction-3D dataset, it outperforms various methods based on hand-designed features. On the challenging CMDFall dataset, it significantly improves accuracy over techniques based on recent neural networks. Finally, we investigate the robustness of the proposed method via a cross-dataset evaluation.
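To make the feature pipeline concrete, the NumPy sketch below computes a covariance descriptor from the positions and velocities of a selected joint subset. It is a minimal illustration under stated assumptions, not the paper's exact method: ranking joints by velocity variance is an assumed stand-in for the paper's two adaptive selection schemes (which the abstract does not detail), and the descriptor is taken as the vectorized upper triangle of the covariance matrix, a common choice for covariance-based features.

```python
import numpy as np

def covariance_descriptor(skeleton, k=10):
    """Sketch of a covariance-based action descriptor.

    skeleton : (T, J, 3) array of T frames, J joints, 3-D coordinates.
    k        : number of "most informative" joints to keep.

    NOTE: selecting joints by velocity variance is an illustrative
    assumption; the paper proposes its own adaptive selection schemes.
    """
    # Joint velocities via frame-to-frame differences: (T-1, J, 3).
    velocity = np.diff(skeleton, axis=0)

    # Rank joints by total velocity variance, a proxy for "informative".
    motion_energy = velocity.var(axis=0).sum(axis=-1)   # (J,)
    top_joints = np.argsort(motion_energy)[::-1][:k]

    # Per-frame feature: concatenated positions and velocities of the
    # selected joints -> (T-1, 6k). Positions are trimmed by one frame
    # to align with the velocity sequence.
    pos = skeleton[1:, top_joints, :].reshape(len(velocity), -1)
    vel = velocity[:, top_joints, :].reshape(len(velocity), -1)
    feats = np.concatenate([pos, vel], axis=1)

    # Covariance over time; the upper triangle (including the diagonal)
    # gives a fixed-length descriptor for a downstream classifier.
    cov = np.cov(feats, rowvar=False)
    return cov[np.triu_indices_from(cov)]

# Usage: a random 60-frame, 20-joint sequence.
rng = np.random.default_rng(0)
desc = covariance_descriptor(rng.standard_normal((60, 20, 3)), k=10)
print(desc.shape)  # (1830,) -- upper triangle of a 60x60 covariance
```

Because the descriptor length depends only on the number of selected joints, sequences of different durations map to vectors of the same size, which is what makes covariance descriptors convenient for classification.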




Acknowledgements

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2017.12.

Author information


Corresponding author

Correspondence to Thi-Lan Le.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Nguyen, VT., Nguyen, TN., Le, TL. et al. Adaptive most joint selection and covariance descriptions for a robust skeleton-based human action recognition. Multimed Tools Appl 80, 27757–27783 (2021). https://doi.org/10.1007/s11042-021-10866-4

