Abstract
In this paper, we propose two effective ways of exploiting skeleton data for human action recognition (HAR). On the one hand, the proposed method takes advantage of skeleton data thanks to their robustness to changes in human appearance and their suitability for real-time processing. On the other hand, it mitigates inherent drawbacks of skeleton data such as noise and incorrect skeleton estimation caused by self-occlusion of the human pose. To this end, in terms of feature design, we propose to extract covariance descriptors from joint velocities and combine them with those of joint positions. In terms of 3-D skeleton-based activity representation, we propose two schemes for selecting the most informative joints. The proposed method is evaluated on two benchmark datasets. On the MSRAction-3D dataset, it outperforms several hand-designed feature-based methods. On the challenging CMDFall dataset, it significantly improves accuracy compared with recent neural-network-based techniques. Finally, we investigate the robustness of the proposed method via a cross-dataset evaluation.
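To make the feature design concrete, the sketch below illustrates the general idea of covariance descriptors computed from joint positions and joint velocities and then concatenated, as described in the abstract. It is a minimal sketch, assuming a skeleton sequence stored as a (T, J, 3) NumPy array; the array shape, the fps parameter, and the function names are illustrative and not the authors' exact pipeline, and it omits the temporal hierarchy and the most-informative-joint selection schemes proposed in the paper.

```python
import numpy as np

def covariance_descriptor(seq):
    """Summarize a (T, D) time series by its D x D covariance matrix.

    The covariance matrix is symmetric, so only the upper triangle
    is returned as a compact feature vector.
    """
    cov = np.cov(seq, rowvar=False)          # D x D covariance over time
    iu = np.triu_indices(cov.shape[0])
    return cov[iu]

def position_velocity_descriptor(skeleton, fps=30.0):
    """skeleton: array of shape (T, J, 3) -- T frames, J joints, 3-D coordinates."""
    T, J, _ = skeleton.shape
    positions = skeleton.reshape(T, J * 3)
    # Joint velocities approximated by first-order temporal differences.
    velocities = np.diff(positions, axis=0) * fps
    # Concatenate the covariance descriptors of positions and velocities.
    return np.concatenate([covariance_descriptor(positions),
                           covariance_descriptor(velocities)])

# Example: a random 40-frame sequence with 20 joints (MSRAction-3D skeletons have 20 joints).
desc = position_velocity_descriptor(np.random.rand(40, 20, 3))
print(desc.shape)
```

In practice, such descriptors are computed per sequence (or per temporal window) and fed to a classifier; the paper's contribution additionally restricts the computation to an adaptively selected subset of the most informative joints.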
Acknowledgements
This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2017.12.
Cite this article
Nguyen, VT., Nguyen, TN., Le, TL. et al. Adaptive most joint selection and covariance descriptions for a robust skeleton-based human action recognition. Multimed Tools Appl 80, 27757–27783 (2021). https://doi.org/10.1007/s11042-021-10866-4