FSD-10: A fine-grained classification dataset for figure skating
Introduction
Due to the popularity of media-sharing platforms, sports content analysis (SCA [23]) has become an important research topic in computer vision [29], [8], [22]. A vast number of sports videos have accumulated in computer storage and are potential resources for deep learning. In recent years, many enterprises (e.g. Bloomberg, SAP) have focused on SCA [23]. In SCA, datasets must reflect the characteristics of competitive sports, which is a prerequisite for training deep learning models. Generally, competitive sports content is a series of diverse, highly professional and extreme actions. Unfortunately, existing popular human motion datasets (e.g. HMDB51 [14], UCF50 [25]) and human sports action datasets (e.g. MIT Olympic sports [20], Nevada Olympic sports [18]) are not representative of the richness and complexity of competitive sports. In these datasets, discriminating an action depends largely on scene, person and object elements [10], which limits progress in action recognition research. This dependence implies that most human action datasets emphasise form (content) rather than motion, although both motion and form are important in human action analysis.
To address the above issues, this paper proposes a figure skating dataset called FSD-10. FSD-10 consists of 1484 figure skating videos manually labeled with 10 different actions. The videos are segmented from around 80 h of footage from worldwide figure skating championships in 2017–2018. FSD-10 videos range from 3 s to 30 s, and the camera moves to follow the skater so that the person appears in every frame of the action. Compared with existing datasets, our proposed dataset has several appealing properties. First, the actions in FSD-10 originate from figure skating competitions, so they are consistent in type and sports environment (including skating rink and auditorium). Second, actions in FSD-10 are complex in content and switch quickly. For instance, the complex 2-loop-Axel jump in Fig. 2 is finished in only about 2 s. It is worth noting that the jump type depends heavily on the take-off process, which is a hard moment to capture. These two aspects make it difficult for machine learning models to infer the action type from a single pose or from the background.
Along with FSD-10, we propose a key frame indicator called human pose scatter (HPS). Based on HPS, we adopt key frame sampling to improve current video classification methods and evaluate these methods on FSD-10. Experimental results show that key frame sampling is an important way to improve the performance of frame-based models on FSD-10, which is consistent with how humans recognise figure skating actions. The main contributions of this paper can be summarised as follows.
- To the best of our knowledge, FSD-10 is the first fine-grained, fully motion-based dataset without multi-scene and object elements.
- To set a baseline for future work, we benchmark state-of-the-art sport classification methods on FSD-10. In addition, we propose key frame sampling to capture the pivotal action details in competitive sports, which achieves better performance than state-of-the-art methods on FSD-10.
In addition, we hope that FSD-10 will serve as a more challenging benchmark than current datasets for background-independent action recognition, and that it will be a useful contribution to specialised workshops on sports. The aim of our research is to explore human motion rather than form in video analysis. Motion is an important topic in action research and can be applied in many fields (such as sports content analysis, physical rehabilitation, human environmental behavior, physical emotion analysis in cognitive psychology, and video synthesis). Motion-related datasets and methods are therefore urgently needed. The new dataset provides a broad scope and challenges researchers with general core problems of computer vision.
Related works
A professional sports dataset (PSD) is a collection of competitive sports actions. Compared with common action datasets such as UCF101 [25] and HMDB [14], a PSD consists of highly specialized actions rather than actions from daily life. MIT Olympic sports [20] and Nevada Olympic sports [19] are examples of PSDs, both derived from Olympic competitions (see Table 1).
In PSD, classification is important to attract people’s attention and to highlight athlete’s performance, and even to assist referees
Figure Skating Dataset
In this section, we describe details regarding the setup and protocol followed to capture our dataset. Then, we discuss the temporal segmentation and assessment tasks of FSD-10 and its future extensions.
Keyframe based temporal segment network (KTSN)
In this section, we describe our keyframe based temporal segment network in detail. Specifically, we first discuss the motivation for key frame sampling in Section 4.1. Then, the key frame sampling method is proposed in Sections 4.2 (Human Pose Scatter (HPS)) and 4.3 (Key frame sampling). Finally, the network structure of KTSN is introduced in Section 4.4.
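The idea of segment-wise key frame sampling can be illustrated with a minimal sketch. This excerpt does not give the definition of HPS, so the score below (mean squared distance of 2D keypoints from their centroid) is only an assumed stand-in, and the function names `pose_scatter` and `sample_key_frames` are hypothetical. The sketch also assumes that a higher score marks a more informative frame, and that one frame is picked per equal-length temporal segment.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pose_scatter(keypoints: Sequence[Point]) -> float:
    """Mean squared distance of 2D keypoints from their centroid.

    ASSUMPTION: an illustrative stand-in for a pose-scatter score,
    not the HPS definition used in the paper.
    """
    if not keypoints:
        return 0.0
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in keypoints) / n


def sample_key_frames(scores: Sequence[float], num_segments: int) -> List[int]:
    """Split the frames into contiguous segments and return the index of
    the highest-scoring frame in each segment (first one wins on ties)."""
    n = len(scores)
    if num_segments <= 0 or n < num_segments:
        raise ValueError("need at least one frame per segment")
    picked = []
    for k in range(num_segments):
        start = k * n // num_segments
        end = (k + 1) * n // num_segments
        picked.append(max(range(start, end), key=lambda i: scores[i]))
    return picked


# Toy clip: four frames, two keypoints each (hypothetical data).
frames = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.0), (4.0, 0.0)],
    [(0.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (6.0, 0.0)],
]
frame_scores = [pose_scatter(f) for f in frames]
print(sample_key_frames(frame_scores, num_segments=2))  # prints [1, 3]
```

Picking the top-scoring frame in each segment, rather than a random one, keeps the temporal coverage of segment-based sampling while biasing the selection toward frames the indicator considers pivotal.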
Experiments
In order to provide a benchmark for our FSD-10 dataset, we evaluate various approaches under three different modalities: RGB, optical flow and anatomical keypoints (skeleton). We also conduct experiments on cross dataset validation. The following describes the details of our experiments and results.
Conclusion
In this paper, we build an action dataset for competitive sports analysis, which is characterised by high action switching speed and complex action content. We find that motion is more valuable than form (content and background) in this task. Therefore, compared with other related datasets, our dataset focuses on the action itself rather than the background. Our dataset enables many interesting tasks, such as fine-grained action classification, action quality assessment and action temporal segmentation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported in part by the National Key Research and Development Program of China (2017YFB1300200, 2017YFB1300203) and the Fundamental Research Funds for the Central Universities – No. DUT20RC(5)010.
Shenglan Liu received the Ph.D. degree from the School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, China, in 2015. Currently, he is an associate professor with the School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, Liaoning, China. His research interests include manifold learning and human perception computing. Dr. Liu is currently an editorial board member of Neurocomputing.
References (34)
- et al., Determining optical flow, Artificial Intelligence (1981)
- et al., Artificial convolution neural network for medical image pattern recognition, Neural Networks (1995)
- et al., Principal component analysis, Chemometrics and Intelligent Laboratory Systems (1987)
- S. Abu-El-Haija, N. Kothari, J. Lee, P. Natsev, G. Toderici, B. Varadarajan, S. Vijayanarasimhan, Youtube-8m: A...
- Handbook of Image and Video Processing (2010)
- Z. Cao, G. Hidalgo, T. Simon, S.E. Wei, Y. Sheikh, OpenPose: realtime multi-person 2D pose estimation using Part...
- Z. Cao, T. Simon, S.E. Wei, Y. Sheikh, Realtime multi-person 2d pose estimation using part affinity fields, in: CVPR,...
- et al., Quo vadis, action recognition? A new model and the kinetics dataset
- et al., Human action recognition based on integrating body pose, part shape, and motion, IEEE Access (2018)
- et al., Soccernet: A scalable dataset for action spotting in soccer videos
- Spatio-temporal analysis of team sports, ACM Computing Surveys (CSUR)
- Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Human action recognition without human
- Densely connected convolutional networks
- Large-scale video classification with convolutional neural networks
- Hmdb: a large video database for human motion recognition
Xiang Liu received the B.E. degree from the Dalian University of Technology, China, in 2017. He is currently working toward the M.E. degree in the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include visualization, crowd counting and machine learning.
Gao Huang received the Ph.D. degree from Tsinghua University, China, in 2015. Currently, he is an Assistant Professor in the Department of Automation, Tsinghua University. His research interests include machine learning and computer vision, in particular deep learning, resource-efficient learning and unsupervised learning. His work on DenseNet won the Best Paper Award of CVPR (2017).
Hong Qiao received the B.E. degree in hydraulics and control and the M.E. degree in robotics and automation from Xi’an Jiaotong University, Xi’an, China, and the Ph.D. degree in robotics control from De Montfort University, Leicester, U.K., in 1995. She was an Assistant Professor with the City University of Hong Kong, Hong Kong, and a Lecturer with the University of Manchester, Manchester, U.K., from 1997 to 2004. She is currently a Professor with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. Her current research interests include robotics, machine learning and pattern recognition.
Lianyu Hu received the B.S. degree in Electronics and Information Engineering from Dalian University of Technology, China, in 2018. Currently, he is an M.S. degree candidate in the Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology. His research interests include action recognition, graph convolution networks and skeleton-based video classification.
Dong Jiang received the B.S. degree in the School of Mechanical Engineering, Dalian University of Technology, China, in 2018. Currently, he is working toward the M.S. degree in the School of Computer Science and Technology, Dalian University of Technology, China. His research interests include graph convolution and video classification.
Aibin Zhang received the B.S. degree in the School of Microelectronics, Dalian University of Technology, China, in 2020. He is about to study for an M.S. degree in the School of Computer Science and Technology, Dalian University of Technology, China. His future research directions mainly include deep learning and computer vision.
Yang Liu received his B.S. degree and Ph.D. degree in the School of Computer Science and Technology from Dalian University of Technology, China, in 2013 and 2019 respectively. He is currently a lecturer in Dalian University of Technology, China. His research interests include video analysis, image retrieval and machine learning.
Ge Guo is currently an undergraduate majoring in CST at Dalian University of Technology, Dalian, China. Her research focuses on data visualization and human–computer interaction.