Abstract
Single-modality sensing has been widely adopted for human activity recognition (HAR) for decades and has made significant strides. However, it often suffers from noise, occlusion, or dropped signals, which can degrade recognition performance. In this paper, we propose a multi-modal framework for fine-grained human activity recognition and abnormal behaviour detection that combines skeleton and acceleration data at the feature level (feature-level fusion). First, deep temporal convolutional networks (TCNs), built from dilated causal convolution components, are used to learn features and capture temporal structure. The feature maps produced by the convolutional layers of the TCN are fed into two fully connected layers for prediction. Second, we conduct empirical experiments to validate the proposed method. The results show that it achieves an 83% F1-score and outperforms several single-modality models as well as early- and late-fusion baselines on the Continuous Multimodal Multi-view Dataset of Human Fall (CMDFALL), which comprises 20 fine-grained normal and abnormal activities collected from 50 subjects. Moreover, the proposed architecture achieves 96.98% accuracy on the UTD-MHAD dataset, which contains 8 subjects and 27 activities. These results demonstrate the effectiveness of the proposed method for classifying fine-grained normal and abnormal human activities and its potential for HAR-based situated service applications.
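To make the described architecture concrete, the following is a minimal sketch of feature-level fusion with dilated causal (TCN) branches, assuming PyTorch; the class names, channel sizes, pooling choice, and window length are illustrative assumptions, not the authors' released implementation (see the repository linked in the Notes for the actual code).

```python
# Minimal sketch (illustrative, not the authors' code): two modality-specific TCN
# branches fused at the feature level, followed by two fully connected layers.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One dilated causal convolution block with a residual connection."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # left padding keeps the conv causal
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()
        self.down = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = nn.functional.pad(x, (self.pad, 0))            # pad on the left only (causal)
        y = self.relu(self.conv(y))
        return self.relu(y + self.down(x))                 # residual connection

class FusionTCN(nn.Module):
    """Skeleton and accelerometer TCN branches, concatenated features, two FC layers."""
    def __init__(self, skel_ch=75, acc_ch=3, hidden=64, n_classes=20):
        super().__init__()
        dilations = [1, 2, 4, 8]                           # exponentially growing receptive field

        def branch(in_ch):
            layers, ch = [], in_ch
            for d in dilations:
                layers.append(TCNBlock(ch, hidden, dilation=d))
                ch = hidden
            return nn.Sequential(*layers)

        self.skel_tcn = branch(skel_ch)
        self.acc_tcn = branch(acc_ch)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, skel, acc):
        f_skel = self.skel_tcn(skel).mean(dim=2)           # temporal average pooling -> (B, hidden)
        f_acc = self.acc_tcn(acc).mean(dim=2)
        fused = torch.cat([f_skel, f_acc], dim=1)          # feature-level fusion by concatenation
        return self.fc(fused)

# Example with assumed shapes: 4-second windows at 50 Hz, 25 skeleton joints (x, y, z)
# and a triaxial accelerometer, classifying 20 activities.
model = FusionTCN()
logits = model(torch.randn(8, 75, 200), torch.randn(8, 3, 200))
print(logits.shape)  # torch.Size([8, 20])
```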
Notes
The code, with a brief guide, is available at https://github.com/nda97531/tcnfusion, and the CMDFALL dataset is available at https://www.mica.edu.vn/perso/Tran-Thi-Thanh-Hai/CMDFALL.html.
Acknowledgements
This research was funded by the Vietnam Ministry of Education and Training under grant number CT2020.02.BKA.02.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Pham, C., Nguyen, L., Nguyen, A. et al. Combining skeleton and accelerometer data for human fine-grained activity recognition and abnormal behaviour detection with deep temporal convolutional networks. Multimed Tools Appl 80, 28919–28940 (2021). https://doi.org/10.1007/s11042-021-11058-w