DOI: 10.1145/3581783.3613774
Research article

A Figure Skating Jumping Dataset for Replay-Guided Action Quality Assessment

Published: 27 October 2023

Abstract

In competitive sports, judges often scrutinize replay videos from multiple views to adjudicate uncertain or contentious actions and ultimately determine the definitive score. Most existing action quality assessment methods regress a score from a single video, or from a pair of exemplar and input videos, and are therefore limited by the viewpoint and zoom scale of those videos. To this end, we construct a Replay Figure Skating Jumping (RFSJ) dataset, which provides additional view information through post-match replay videos together with fine-grained annotations. We also propose a Replay-Guided approach for action quality assessment, built on a Triple-Stream Contrastive Transformer and a Temporal Concentration Module. Specifically, besides contrasting the input with an exemplar, we contrast the input with its replay through an extra contrastive module; the required consistency of the two scores guides the model to learn features of the same action under different views and zoom scales. In addition, the errors and highlight moments that most strongly affect an athlete's score are concentrated in parts of the video rather than uniformly distributed, so the proposed temporal concentration module encourages the model to focus on these moments and cooperates with the contrastive regression module to yield an effective scoring mechanism. Extensive experiments demonstrate that our method achieves a Spearman's rank correlation of 0.9346 on the proposed RFSJ dataset, improving over existing state-of-the-art methods.
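The two ideas in the abstract can be sketched concretely: a regression loss that additionally penalizes disagreement between the scores predicted for a jump and for its replay, and Spearman's rank correlation, the metric behind the reported 0.9346. The toy functions below are a minimal plain-Python sketch; the names `replay_consistency_loss` and `spearman_rho` are illustrative, not the authors' implementation, and the rank computation assumes no tied scores.

```python
def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Assumes no ties; with ties, average ranks would be needed.
    """
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)


def replay_consistency_loss(score_main, score_replay, score_true, weight=0.5):
    """Regression error on both views, plus a penalty when the score
    predicted from the broadcast clip disagrees with the score predicted
    from its replay (the consistency signal described in the abstract)."""
    regression = (score_main - score_true) ** 2 + (score_replay - score_true) ** 2
    consistency = (score_main - score_replay) ** 2
    return regression + weight * consistency
```

A perfectly monotone relation between predicted and judge scores gives `spearman_rho` of 1.0, and a model that scores a clip and its replay identically and correctly incurs zero loss.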

Supplemental Material

MP4 file: presentation video (short version)


Cited By

  • (2025) A Hierarchical Joint Training Based Replay-Guided Contrastive Transformer for Action Quality Assessment of Figure Skating. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E108.A(3), 332--341. https://doi.org/10.1587/transfun.2024SMP0003
  • (2025) Dual-referenced assistive network for action quality assessment. Neurocomputing, 614, 128786. https://doi.org/10.1016/j.neucom.2024.128786
  • (2025) Vision-based human action quality assessment: A systematic review. Expert Systems with Applications, 263, 125642. https://doi.org/10.1016/j.eswa.2024.125642
  • (2024) CoFInAl. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 1771--1779. https://doi.org/10.24963/ijcai.2024/196
  • (2024) Bidirectional temporal and frame-segment attention for sparse action segmentation of figure skating. Computer Vision and Image Understanding, 249, 104186. https://doi.org/10.1016/j.cviu.2024.104186
  • (2024) Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment. Computer Vision -- ECCV 2024, 423--440. https://doi.org/10.1007/978-3-031-72946-1_24

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. action quality assessment
  2. sports video dataset
  3. triple-stream contrastive learning
  4. video action analysis

Qualifiers

  • Research-article

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Article Metrics

  • Downloads (last 12 months): 112
  • Downloads (last 6 weeks): 11

Reflects downloads up to 02 Mar 2025

