Abstract
Action Quality Assessment (AQA) is a critical branch of video understanding that offers impartial evaluations for competitive sports. Existing paradigms tend to assess action quality from equal-length clips that lack sufficient semantics, leading to suboptimal predictions. To address this issue, we propose Semantic-Sequence Performance Regression (SSPR) for AQA. SSPR first divides an action into a series of unequal-length segments according to the semantic continuity of the video, such as jumping, dropping, and entering the water in diving. Specifically, the latest Temporal Convolutional Network (TCN) is adopted for semantic-sequence segmentation. To better realize SSPR, we design a feature fusion module that integrates the semantics of each segment using cascaded 1D convolutions. Furthermore, the imbalanced sample distribution is usually ignored in AQA, and we propose a new loss, positive-weighting MSE (PW-MSE), to deal with it. PW-MSE encourages the network to focus more on densely distributed samples during training, which further improves the network's ranking performance. Experimental results on the benchmark datasets UNLV-Dive and AQA-7 demonstrate that our proposed method outperforms current state-of-the-art methods.
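The abstract does not give the PW-MSE formula, so the following is only a minimal sketch of the general idea of weighting a squared-error loss toward densely distributed samples. The histogram density estimate, the function names `density_weights` and `weighted_mse`, and the parameters `num_bins` and `alpha` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def density_weights(scores, num_bins=10, alpha=1.0):
    """Per-sample weights that grow with the label density around each score.

    Assumption: density is estimated with a plain histogram over the
    training scores. The paper's actual weighting function may differ.
    """
    scores = np.asarray(scores, dtype=np.float64)
    counts, edges = np.histogram(scores, bins=num_bins)
    # Interior edges map every score to a bin index in 0..num_bins-1.
    bin_idx = np.digitize(scores, edges[1:-1])
    density = counts[bin_idx] / counts.max()
    # Weights are at least 1 before normalization, so sparse samples
    # are down-weighted relative to dense ones but never dropped.
    weights = 1.0 + alpha * density
    # Normalize to mean 1 so the loss scale stays comparable to plain MSE.
    return weights / weights.mean()


def weighted_mse(pred, target, weights):
    """Mean of per-sample squared errors scaled by the given weights."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(weights * (pred - target) ** 2))


if __name__ == "__main__":
    # Hypothetical scores: five clustered values and one isolated value.
    train_scores = [50.0, 51.0, 52.0, 53.0, 54.0, 90.0]
    w = density_weights(train_scores, num_bins=4)
    preds = [49.0, 52.0, 52.0, 55.0, 54.0, 80.0]
    print(w)
    print(weighted_mse(preds, train_scores, w))
```

In this sketch, samples whose scores fall in crowded histogram bins contribute more to the loss than isolated ones, which is the behaviour the abstract attributes to PW-MSE. Setting `alpha=0` recovers ordinary MSE.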
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported in part by the Key Research and Development Plan of Zhejiang under Grant 2021C03131 and in part by the National Natural Science Foundation of China under Grant 61871170. We would like to thank Xiao-Diao Chen and Wen Wu for collaborating with us in proofreading and refining the paper.
Author information
Contributions
Huang Feng: Data acquisition, Experiments, Validation, Investigation, Writing & editing, Resources. Li Jianjun: Conceptualization, Methodology, Funding acquisition, Supervision, Writing - review & editing, Corresponding author.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical and informed consent for data used
The data used in this paper are from publicly available datasets and do not violate any ethical guidelines.
About this article
Cite this article
Huang, F., Li, J. Assessing action quality with semantic-sequence performance regression and densely distributed sample weighting. Appl Intell 54, 3245–3259 (2024). https://doi.org/10.1007/s10489-024-05349-6
DOI: https://doi.org/10.1007/s10489-024-05349-6