Assessing action quality with semantic-sequence performance regression and densely distributed sample weighting

Applied Intelligence

Abstract

Action Quality Assessment (AQA) is a critical branch of video understanding that offers impartial evaluations for competitive sports. Existing paradigms tend to assess action quality using equal-length clips that lack sufficient semantics, which leads to suboptimal predictions. To address this issue, we propose to conduct AQA with Semantic-Sequence Performance Regression (SSPR). SSPR first divides an action into a series of unequal-length segments according to the semantic continuity of the video; in diving, for example, these segments are jumping, dropping, and entering the water. Specifically, a recent Temporal Convolutional Network (TCN) is adopted for semantic-sequence segmentation. To better achieve SSPR, we design a feature fusion module that integrates the semantics of each segment using cascaded 1D convolutions. Furthermore, the imbalanced distribution of scores is usually ignored in AQA, and we propose a new loss called positive-weighting MSE (PW-MSE) to deal with it. PW-MSE encourages the network to focus more on densely distributed samples during training, which further improves the network's ranking performance. Experimental results on the benchmark datasets UNLV-Dive and AQA-7 demonstrate that our proposed method outperforms current state-of-the-art methods.
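The abstract does not state the formula for PW-MSE, so the following is only a minimal sketch of the general idea of weighting an MSE loss toward densely distributed samples. The histogram-based weighting, the function names `density_weights` and `weighted_mse`, and the `bins` parameter are all assumptions made for illustration; they are not the authors' definition.

```python
import numpy as np


def density_weights(scores, bins=10):
    """Hypothetical scheme: give larger positive weights to samples whose
    ground-truth score falls in a densely populated histogram bin."""
    scores = np.asarray(scores, dtype=float)
    counts, edges = np.histogram(scores, bins=bins)
    # Interior edges map every score to a bin index in [0, bins - 1].
    idx = np.digitize(scores, edges[1:-1])
    density = counts[idx] / counts.max()
    # Weights lie in (1, 2]; the densest bin receives weight 2.
    return 1.0 + density


def weighted_mse(pred, target, weights):
    """Weighted mean squared error, normalised by the sum of the weights."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))


if __name__ == "__main__":
    # Four scores cluster near 80; one outlier sits at 20 (made-up values).
    target = np.array([80.0, 81.0, 82.0, 83.0, 20.0])
    pred = target + 1.0
    w = density_weights(target, bins=2)
    print(w, weighted_mse(pred, target, w))
```

In this sketch an error on a sample from the dense region contributes more to the loss than the same error on a sparse outlier, which is the behaviour the abstract attributes to PW-MSE.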

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported in part by the Key Research and Development Plan of Zhejiang under Grant 2021C03131 and in part by the National Natural Science Foundation of China under Grant 61871170. We would like to thank Xiao-Diao Chen and Wen Wu for collaborating with us in proofreading and refining the paper.

Author information

Contributions

Huang Feng: Data acquisition, Experiments, Validation, Investigation, Writing & editing, Resources. Li Jianjun: Conceptualization, Methodology, Funding acquisition, Supervision, Writing - review & editing, Corresponding author.

Corresponding author

Correspondence to Jianjun Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical and informed consent for data used

The data used in this paper are from publicly available datasets and do not violate any ethical guidelines.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, F., Li, J. Assessing action quality with semantic-sequence performance regression and densely distributed sample weighting. Appl Intell 54, 3245–3259 (2024). https://doi.org/10.1007/s10489-024-05349-6

  • DOI: https://doi.org/10.1007/s10489-024-05349-6
