Abstract
Single-modality sensing has been widely adopted for human activity recognition (HAR) for decades and has made significant strides. However, it often suffers from noise, occlusion, or dropped signals, which can degrade recognition performance. In this paper, we propose a multi-modal framework for fine-grained human activity recognition and abnormal behaviour detection that combines skeleton and acceleration data at the feature level (feature-level fusion). First, deep temporal convolutional networks (TCNs), built from dilated causal convolution components, are used to learn features and capture temporal structure. The feature maps produced by the convolutional layers of the TCN are fed into two fully connected layers for prediction. Second, we conduct empirical experiments to validate the proposed method. The results show that it achieves an 83% F1-score and outperforms several single-modality models as well as early- and late-fusion baselines on the Continuous Multimodal Multi-view Dataset of Human Fall (CMDFALL), which comprises 20 fine-grained normal and abnormal activities collected from 50 subjects. Moreover, the proposed architecture achieves 96.98% accuracy on the UTD-MHAD dataset, which contains 8 subjects and 27 activities. These results demonstrate the effectiveness of the proposed method for classifying fine-grained normal and abnormal human activities and its potential for HAR-based situated service applications.
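To make the described architecture concrete, the following is a minimal sketch of feature-level fusion with dilated causal (TCN) branches, assuming PyTorch; the class names, channel sizes, pooling choice, and window length are illustrative assumptions, not the authors' released implementation (see the repository linked in the Notes for the actual code).

```python
# Minimal sketch (illustrative, not the authors' code): two modality-specific TCN
# branches fused at the feature level, followed by two fully connected layers.
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One dilated causal convolution block with a residual connection."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # left padding keeps the conv causal
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()
        self.down = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = nn.functional.pad(x, (self.pad, 0))            # pad on the left only (causal)
        y = self.relu(self.conv(y))
        return self.relu(y + self.down(x))                 # residual connection

class FusionTCN(nn.Module):
    """Skeleton and accelerometer TCN branches, concatenated features, two FC layers."""
    def __init__(self, skel_ch=75, acc_ch=3, hidden=64, n_classes=20):
        super().__init__()
        dilations = [1, 2, 4, 8]                           # exponentially growing receptive field

        def branch(in_ch):
            layers, ch = [], in_ch
            for d in dilations:
                layers.append(TCNBlock(ch, hidden, dilation=d))
                ch = hidden
            return nn.Sequential(*layers)

        self.skel_tcn = branch(skel_ch)
        self.acc_tcn = branch(acc_ch)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, skel, acc):
        f_skel = self.skel_tcn(skel).mean(dim=2)           # temporal average pooling -> (B, hidden)
        f_acc = self.acc_tcn(acc).mean(dim=2)
        fused = torch.cat([f_skel, f_acc], dim=1)          # feature-level fusion by concatenation
        return self.fc(fused)

# Example with assumed shapes: 4-second windows at 50 Hz, 25 skeleton joints (x, y, z)
# and a triaxial accelerometer, classifying 20 activities.
model = FusionTCN()
logits = model(torch.randn(8, 75, 200), torch.randn(8, 3, 200))
print(logits.shape)  # torch.Size([8, 20])
```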
Notes
The code, with a brief guide, is available at https://github.com/nda97531/tcnfusion, and the CMDFALL dataset is available at https://www.mica.edu.vn/perso/Tran-Thi-Thanh-Hai/CMDFALL.html.
Acknowledgements
This research was funded by the Vietnam Ministry of Education and Training under grant number CT2020.02.BKA.02.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Pham, C., Nguyen, L., Nguyen, A. et al. Combining skeleton and accelerometer data for human fine-grained activity recognition and abnormal behaviour detection with deep temporal convolutional networks. Multimed Tools Appl 80, 28919–28940 (2021). https://doi.org/10.1007/s11042-021-11058-w