Abstract
In this paper, we propose two effective ways of exploiting skeleton data for human action recognition (HAR). On the one hand, the proposed method takes advantage of skeleton data thanks to their robustness to changes in human appearance and their suitability for real-time processing. On the other hand, it mitigates inherent drawbacks of skeleton data such as noise and incorrect skeleton estimation caused by self-occlusion of the human pose. To this end, in terms of feature design, we propose to extract covariance descriptors from joint velocities and combine them with those of joint positions. In terms of 3-D skeleton-based activity representation, we propose two schemes for selecting the most informative joints. The proposed method is evaluated on two benchmark datasets. On the MSRAction-3D dataset, it outperforms several hand-designed feature-based methods. On the challenging CMDFall dataset, it significantly improves accuracy compared with recent neural-network-based techniques. Finally, we investigate the robustness of the proposed method via a cross-dataset evaluation.
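To make the feature design concrete, the sketch below illustrates the general idea of covariance descriptors computed from joint positions and joint velocities and then concatenated, as described in the abstract. It is a minimal sketch, assuming a skeleton sequence stored as a (T, J, 3) NumPy array; the array shape, the fps parameter, and the function names are illustrative and not the authors' exact pipeline, and it omits the temporal hierarchy and the most-informative-joint selection schemes proposed in the paper.

```python
import numpy as np

def covariance_descriptor(seq):
    """Summarize a (T, D) time series by its D x D covariance matrix.

    The covariance matrix is symmetric, so only the upper triangle
    is returned as a compact feature vector.
    """
    cov = np.cov(seq, rowvar=False)          # D x D covariance over time
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]

def position_velocity_descriptor(skeleton, fps=30.0):
    """skeleton: array of shape (T, J, 3) -- T frames, J joints, 3-D coordinates."""
    T, J, _ = skeleton.shape
    positions = skeleton.reshape(T, J * 3)
    # Joint velocities approximated by first-order temporal differences.
    velocities = np.diff(positions, axis=0) * fps
    # Concatenate the covariance descriptors of positions and velocities.
    return np.concatenate([covariance_descriptor(positions),
                           covariance_descriptor(velocities)])

# Example: a random 40-frame sequence with 20 joints (MSRAction-3D skeletons have 20 joints).
desc = position_velocity_descriptor(np.random.rand(40, 20, 3))
print(desc.shape)
```

In practice, such descriptors are computed per sequence (or per temporal window) and fed to a classifier; the paper's contribution additionally restricts the computation to an adaptively selected subset of the most informative joints.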
Acknowledgements
This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2017.12.
Cite this article
Nguyen, VT., Nguyen, TN., Le, TL. et al. Adaptive most joint selection and covariance descriptions for a robust skeleton-based human action recognition. Multimed Tools Appl 80, 27757–27783 (2021). https://doi.org/10.1007/s11042-021-10866-4