Toward action recognition and assessment using SFAGCN and combinative regression model of spatiotemporal features

Zhang, Zhitao; Wang, Zhengyou; Zhuang, Shanna; Wang, Jiahui

doi:10.1007/s10489-022-03411-9

Published: 21 April 2022

Applied Intelligence, Volume 53, pages 757–768 (2023)

Zhitao Zhang^1, Zhengyou Wang^1,2 (ORCID: orcid.org/0000-0002-9976-8583), Shanna Zhuang^1,2 & Jiahui Wang^1

Abstract

The human skeleton carries intuitive information about motion and has therefore been widely studied in action analysis tasks. As a part of action analysis, human action assessment has traditionally been modeled with handcrafted-feature-based methods such as dynamic time warping (DTW). These methods extract only the similarity of particular spatio-temporal features and tend to ignore the global spatio-temporal relevance of the action. In this paper, we propose a regression assessment model for action spatio-temporal features that encodes the temporal features, spatial features, and fused features separately. A self-attention mechanism fuses the decoupled features, and the overall score of the action is then calculated by regression. Specifically, via structure-feature fusion adaptive graph convolutional networks (SFAGCN), our action assessment network models the deep dependence of global spatio-temporal features to address the difficulties of limited expressive ability and generalization. Furthermore, the topology of the skeletal graph and the features of the joints are merged by decoupling the spatio-temporal correlations. To confirm the effectiveness of our assessment model, we conduct experiments on six Olympic Games assessment tasks and exceed the state-of-the-art performance in Spearman’s rank correlation analysis.
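The abstract names two standard building blocks: dynamic time warping as the handcrafted baseline, and Spearman's rank correlation as the evaluation metric. The following is a minimal sketch of both in their textbook forms, not the authors' implementation. The function names (`dtw_distance`, `average_ranks`, `spearman_rho`) and the sample scores are hypothetical and chosen only for illustration; DTW is shown on 1-D sequences with an absolute-difference cost, which is an assumption.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping cost between two 1-D sequences.

    Assumption: absolute difference as the local cost; a skeleton-based
    system would use a per-frame pose distance instead.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(rx, ry))
    var_x = sum((p - mean_x) ** 2 for p in rx)
    var_y = sum((q - mean_y) ** 2 for q in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical judge scores and model predictions, for illustration only.
    judge_scores = [8.5, 9.0, 7.0, 6.5]
    predicted = [80.0, 95.0, 60.0, 70.0]
    print(spearman_rho(judge_scores, predicted))
    print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

Because Spearman's coefficient depends only on rank order, the predicted scores do not need to share the judges' scale, which is why it suits comparing regressed scores against ground-truth rankings.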

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (No. 61972267) and the Natural Science Foundation of Hebei Province (No. F2019210306).

Author information

Authors and Affiliations

School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang, 050043, China
Zhitao Zhang, Zhengyou Wang, Shanna Zhuang & Jiahui Wang
Hebei Key Laboratory for Electromagnetic Environmental Effects and Information Processing, Hebei, China
Zhengyou Wang & Shanna Zhuang

Corresponding authors

Correspondence to Zhengyou Wang or Shanna Zhuang.

About this article

Cite this article

Zhang, Z., Wang, Z., Zhuang, S. et al. Toward action recognition and assessment using SFAGCN and combinative regression model of spatiotemporal features. Appl Intell 53, 757–768 (2023). https://doi.org/10.1007/s10489-022-03411-9

Accepted: 17 February 2022
Published: 21 April 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s10489-022-03411-9
