
Toward action recognition and assessment using SFAGCN and combinative regression model of spatiotemporal features

Published in: Applied Intelligence

Abstract

The human skeleton conveys intuitive information about motion and has therefore been widely studied in action-analysis tasks. Within action analysis, traditional approaches perform human action assessment with handcrafted-feature-based methods such as dynamic time warping (DTW). These methods measure only the similarity of particular spatiotemporal features, while the global spatio-temporal relevance of an action tends to be ignored. In this paper, we propose a regression-based assessment model for action spatio-temporal features, which encodes temporal features, spatial features and fused features separately. A self-attention mechanism fuses the decoupled features, and the overall action score is then computed by regression. Specifically, via structure-feature fusion adaptive graph convolutional networks (SFAGCN), our assessment network models the deep dependence of global spatio-temporal features to address the limited expressive ability and generalization of prior methods. Furthermore, the topology of the skeletal graph and the features of the joints are merged by decoupling the spatio-temporal correlations. To confirm the effectiveness of our assessment model, we conduct experiments on six Olympic Games assessment tasks and exceed state-of-the-art performance in Spearman’s rank correlation analysis.
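The evaluation metric named above, Spearman’s rank correlation, compares the ranking of predicted scores against the ranking of ground-truth judge scores rather than the raw values. A minimal plain-Python sketch is shown below; the function names `ranks` and `spearman_rho` are illustrative, not from the paper:

```python
def ranks(values):
    """Return 1-based ranks for each value; tied values get average ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            rank[order[k]] = avg
        i = j + 1
    return rank

def spearman_rho(predicted, ground_truth):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(predicted), ranks(ground_truth)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical scores: identical rank order gives rho ≈ 1.0.
rho = spearman_rho([86.5, 92.0, 71.3], [88.0, 95.0, 70.0])
```

A rho of 1 means the model orders performances exactly as the judges do; a rho of -1 means the ordering is fully reversed.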




Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (No. 61972267) and the Natural Science Foundation of Hebei Province (No. F2019210306).

Author information


Corresponding authors

Correspondence to Zhengyou Wang or Shanna Zhuang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, Z., Wang, Z., Zhuang, S. et al. Toward action recognition and assessment using SFAGCN and combinative regression model of spatiotemporal features. Appl Intell 53, 757–768 (2023). https://doi.org/10.1007/s10489-022-03411-9

