
Human skeleton pose and spatio-temporal feature-based activity recognition using ST-GCN

Published in Multimedia Tools and Applications

Abstract

Skeleton-based human activity recognition has recently attracted considerable attention because skeleton data are robust to changes in lighting, body size, dynamic camera viewpoints, and complex backgrounds. The Spatial-Temporal Graph Convolutional Network (ST-GCN) model has been shown to learn spatial and temporal dependencies effectively from skeleton data. However, efficiently exploiting the depth information of 3D skeletons remains a significant challenge, particularly for human joint motion patterns and linkage information. This study proposes a promising solution: a custom ST-GCN model over skeleton joints for human activity recognition. Special attention is given to spatial and temporal features, which are fed to the classification model for better pose estimation. A comparative study of activity recognition is presented on the large-scale NTU-RGB-D, Kinetics-Skeleton, and Florence 3D datasets. The custom ST-GCN model outperforms the state-of-the-art methods in Top-1 accuracy on NTU-RGB-D, Kinetics-Skeleton, and Florence 3D by margins of 0.7%, 1.25%, and 1.92%, respectively. Similarly, in Top-5 accuracy, it improves results by 0.5%, 0.73%, and 1.52%, respectively. This shows that the presented graph-based topology captures the dynamics of a motion-based skeleton sequence better than competing approaches.
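The core operation behind an ST-GCN layer can be illustrated with a minimal NumPy sketch: a spatial graph convolution that propagates joint features along skeleton edges via a normalized adjacency matrix, followed by a temporal aggregation across frames. This is an illustrative simplification, not the authors' model: the joint graph, weights, and temporal smoothing below are toy assumptions, whereas a real ST-GCN uses learnable, partition-based adjacencies and learned temporal convolutions.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Build the symmetrically normalized adjacency D^-1/2 (A + I) D^-1/2."""
    A = np.zeros((num_joints, num_joints))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    A += np.eye(num_joints)  # self-loops so each joint keeps its own features
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def st_gcn_layer(X, A_hat, W, temporal_kernel=3):
    """One spatial-temporal graph conv step on X of shape (T, V, C_in).

    Spatial step: mix each joint's features with its graph neighbors (A_hat X W).
    Temporal step: average each joint's features over a sliding window of frames.
    """
    T = X.shape[0]
    spatial = np.einsum('uv,tvc->tuc', A_hat, X) @ W  # (T, V, C_out)
    pad = temporal_kernel // 2
    padded = np.pad(spatial, ((pad, pad), (0, 0), (0, 0)), mode='edge')
    return np.stack([padded[t:t + temporal_kernel].mean(axis=0) for t in range(T)])

# Toy 5-joint skeleton observed over 4 frames (hypothetical edge list)
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
A_hat = normalized_adjacency(edges, 5)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 3))  # (frames, joints, channels)
W = rng.standard_normal((3, 8))     # channel projection, would be learned
Y = st_gcn_layer(X, A_hat, W)
print(Y.shape)  # (4, 5, 8)
```

Stacking several such layers, each widening the temporal window and channel count, is what lets the graph-based topology capture the changing aspects of a skeleton sequence described above.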


Data availability

The data that support the findings of this study are openly available and are cited in the text.


Author information


Corresponding author

Correspondence to Vivek Tiwari.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest, commercial or non-financial.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lovanshi, M., Tiwari, V. Human skeleton pose and spatio-temporal feature-based activity recognition using ST-GCN. Multimed Tools Appl 83, 12705–12730 (2024). https://doi.org/10.1007/s11042-023-16001-9

