
Combining skeleton and accelerometer data for human fine-grained activity recognition and abnormal behaviour detection with deep temporal convolutional networks

Published in Multimedia Tools and Applications

Abstract

Single-modality sensing has been widely adopted for human activity recognition (HAR) for decades and has made significant strides. However, it often suffers from challenges such as noise, occlusions, or dropped signals, which can degrade recognition performance. In this paper, we propose a multi-modality framework for fine-grained human activity recognition and abnormal behaviour detection that combines skeleton and acceleration data at the feature level (so-called feature-level fusion). First, deep temporal convolutional networks (TCNs), built from dilated causal convolution components, are used to learn features and capture temporal structure. The feature maps produced by the TCN's convolutional layers are then fed into two fully connected layers for prediction. Second, we conduct empirical experiments to verify the proposed method. The results show that it achieves an 83% F1-score and surpasses several single-modality models as well as early- and late-fusion baselines on the Continuous Multimodal Multi-view Dataset of Human Fall (CMDFALL), which comprises 20 fine-grained normal and abnormal activities collected from 50 subjects. Moreover, the proposed architecture achieves 96.98% accuracy on the UTD-MHAD dataset, which covers 27 activities performed by 8 subjects. These results indicate the effectiveness of the proposed method for classifying fine-grained normal and abnormal human activities, as well as its potential for HAR-based situated service applications.
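To make the described architecture concrete, the sketch below shows one plausible shape of feature-level fusion with dilated causal TCN encoders: one encoder per modality, concatenation of the pooled feature maps, and two fully connected layers for classification. This is a minimal illustration, not the authors' implementation (which is linked in the Notes): the use of PyTorch, the channel widths, layer counts, temporal average pooling, and input shapes are all assumptions made for the example.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalTCNBlock(nn.Module):
        # One dilated causal convolution with a residual connection.
        def __init__(self, channels, kernel_size=3, dilation=1):
            super().__init__()
            # Left-pad only, so the convolution never sees future time steps.
            self.pad = (kernel_size - 1) * dilation
            self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

        def forward(self, x):                        # x: (batch, channels, time)
            out = self.conv(F.pad(x, (self.pad, 0)))
            return torch.relu(out + x)               # residual connection

    class TCNEncoder(nn.Module):
        # Stacked causal blocks; dilation doubles per layer to widen the receptive field.
        def __init__(self, in_channels, channels=64, num_layers=4):
            super().__init__()
            self.proj = nn.Conv1d(in_channels, channels, kernel_size=1)
            self.blocks = nn.Sequential(
                *[CausalTCNBlock(channels, dilation=2 ** i) for i in range(num_layers)])

        def forward(self, x):
            return self.blocks(self.proj(x))         # (batch, channels, time)

    class FeatureLevelFusionTCN(nn.Module):
        def __init__(self, skel_dim, accel_dim, num_classes, channels=64):
            super().__init__()
            self.skel_enc = TCNEncoder(skel_dim, channels)
            self.accel_enc = TCNEncoder(accel_dim, channels)
            # Two fully connected layers on the fused (concatenated) features.
            self.classifier = nn.Sequential(
                nn.Linear(2 * channels, 128), nn.ReLU(), nn.Linear(128, num_classes))

        def forward(self, skel, accel):
            # Average each modality's feature map over time, then fuse.
            f_skel = self.skel_enc(skel).mean(dim=2)
            f_accel = self.accel_enc(accel).mean(dim=2)
            return self.classifier(torch.cat([f_skel, f_accel], dim=1))

    # Illustrative shapes (assumed): 25 joints x 3 coordinates = 75 skeleton
    # channels, a 3-axis accelerometer, 20 CMDFALL classes, 128-frame windows.
    model = FeatureLevelFusionTCN(skel_dim=75, accel_dim=3, num_classes=20)
    logits = model(torch.randn(4, 75, 128), torch.randn(4, 3, 128))  # (4, 20)

Because each modality keeps its own encoder up to the fusion point, a noisy or dropped stream degrades only its own feature vector rather than the shared representation, which is the motivation for fusing at the feature level rather than concatenating raw inputs (early fusion) or averaging per-modality predictions (late fusion).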



Notes

1. The code, with a brief guide, is available at https://github.com/nda97531/tcnfusion, and the CMDFALL dataset at https://www.mica.edu.vn/perso/Tran-Thi-Thanh-Hai/CMDFALL.html.

  2. https://github.com/nda97531/imran2019


Acknowledgements

This research was funded by the Vietnam Ministry of Education and Training under grant number CT2020.02.BKA.02.

Author information

Corresponding author

Correspondence to Van-Toi Nguyen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Pham, C., Nguyen, L., Nguyen, A. et al. Combining skeleton and accelerometer data for human fine-grained activity recognition and abnormal behaviour detection with deep temporal convolutional networks. Multimed Tools Appl 80, 28919–28940 (2021). https://doi.org/10.1007/s11042-021-11058-w
