Learning group interaction for sports video understanding from a perspective of athlete

He, Rui; Fu, Zehua; Liu, Qingjie; Wang, Yunhong; Chen, Xunxun

doi:10.1007/s11704-023-2525-y

Research Article
Published: 18 December 2023

Volume 18, article number 184705, (2024)

Frontiers of Computer Science

Rui He^1,2,3, Zehua Fu^2, Qingjie Liu^1,2, Yunhong Wang^1,2 & Xunxun Chen^3

Abstract

Learning the activity interactions between small groups is a key step in understanding team sports videos. Recent research on team sports videos takes, strictly speaking, the perspective of the audience rather than that of the athlete. Team sports videos, such as volleyball and basketball videos, contain plenty of intra-team and inter-team relations. In this paper, a new task named Group Scene Graph Generation is introduced to better understand the intra-team and inter-team relations in sports videos. To tackle this problem, a novel Hierarchical Relation Network is proposed. After all players in a video are finely divided into two teams, the features of the two teams’ activities and interactions are enhanced by Graph Convolutional Networks and are finally recognized to generate the Group Scene Graph. For evaluation, a Volleyball+ dataset is proposed, built on the Volleyball dataset with 9660 additional team activity labels. A baseline is set for better comparison, and our experimental results demonstrate the effectiveness of our method. Moreover, the idea of our method can be directly utilized in another video-based task, Group Activity Recognition. Experiments show the superiority of our method and reveal the link between the two tasks. Finally, from the athlete’s view, we present a detailed interpretation that shows how to utilize the Group Scene Graph to analyze teams’ activities and provide professional gaming suggestions.
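
The team-wise pipeline summarized in the abstract (split players into two teams, then enhance each team's features with a graph convolution) can be illustrated with a minimal sketch. This is not the authors' Hierarchical Relation Network: the median split on horizontal position, the fully connected intra-team graph, the max pooling, and the names `split_teams`, `gcn_layer`, and `team_features` are all assumptions made for illustration. Only the propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W), follows the standard graph convolutional layer.

```python
import math


def split_teams(centers_x):
    """Hypothetical split: the half of the players with the smallest
    x coordinate is team 0, the rest is team 1 (an assumption)."""
    order = sorted(range(len(centers_x)), key=lambda i: centers_x[i])
    half = len(centers_x) // 2
    team = [1] * len(centers_x)
    for i in order[:half]:
        team[i] = 0
    return team


def gcn_layer(features, adjacency, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(features)
    # Add self-loops so every node keeps part of its own feature.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        # Aggregate symmetrically normalised neighbour features.
        agg = [0.0] * len(features[0])
        for j in range(n):
            c = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
            for k, v in enumerate(features[j]):
                agg[k] += c * v
        # Linear projection followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(len(agg))))
                    for m in range(len(weight[0]))])
    return out


def team_features(features, centers_x, weight):
    """Return per-player team labels and one pooled feature per team.
    Assumes at least two players, so neither team is empty."""
    team = split_teams(centers_x)
    pooled = []
    for t in (0, 1):
        x = [f for f, label in zip(features, team) if label == t]
        n = len(x)
        # Fully connected intra-team graph without self-loops (assumption).
        adj = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
        h = gcn_layer(x, adj, weight)
        # Max-pool the player features into a single team feature.
        pooled.append([max(col) for col in zip(*h)])
    return team, pooled


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    xs = [10, 300, 20, 310]
    w = [[1.0, 0.0], [0.0, 1.0]]
    print(team_features(feats, xs, w))
```

In a fuller model, the two pooled team features would feed classifiers for each team's activity and for the inter-team interaction, whose outputs form the nodes and edges of a group scene graph; those stages are omitted here.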




Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. U20B2069) and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

Intelligent Recognition and Image Processing (IRIP) Lab, School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Rui He, Qingjie Liu & Yunhong Wang
Hangzhou Innovation Institute, Beihang University, Hangzhou, 310051, China
Rui He, Zehua Fu, Qingjie Liu & Yunhong Wang
National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT or CNCERT/CC), Beijing, 100029, China
Rui He & Xunxun Chen

Authors

Rui He
Zehua Fu
Qingjie Liu
Yunhong Wang
Xunxun Chen

Corresponding author

Correspondence to Qingjie Liu.

Additional information

Rui He received the BS degree in computer science and technology from Henan University of Science and Technology, China in 2011 and the MS degree in computer science and technology from Beihang University, China in 2016. He is currently pursuing the PhD degree with the Laboratory of Intelligent Recognition and Image Processing, Beijing Key Laboratory of Digital Media, Beihang University, and is also with the National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT or CNCERT/CC). His research interests include image processing, video analysis, pattern recognition and digital machine learning.

Zehua Fu holds a Bachelor’s degree and a Master’s degree in software engineering from Southwest Jiaotong University, China, as well as a PhD in computer science from Ecole Centrale de Lyon in France. Currently, she serves as an Associate Researcher at Hangzhou Innovation Institute of Beihang University, China. Her current research interests include 3D image processing and computer vision.

Qingjie Liu received the BS degree in computer science from Hunan University, China, and the PhD degree in computer science from Beihang University, China. He is currently an Associate Professor with the School of Computer Science and Engineering, Beihang University, China, where he is with the Laboratory of Intelligent Recognition and Image Processing, Beijing Key Laboratory of Digital Media. He is also a Distinguished Research Fellow with the Zhongguancun Laboratory and Hangzhou Institute of Innovation, Beihang University, China. His current research interests include remote sensing image/video analysis, pattern recognition, and computer vision.

Yunhong Wang received the BS degree from Northwestern Polytechnical University, China in 1989, and the MS and PhD degrees from the Nanjing University of Science and Technology, China in 1995 and 1998, respectively, all in electronics engineering. She was with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China from 1998 to 2004. Since 2004, she has been a Professor with the School of Computer Science and Engineering, Beihang University, China, where she is currently the Director of the Laboratory of Intelligent Recognition and Image Processing, Beijing Key Laboratory of Digital Media. Her research results have been published in prestigious journals and at prominent conferences, such as TPAMI, TIP, TIFS, CVPR, ICCV, and ECCV. Her research interests include biometrics, pattern recognition, computer vision and image processing.

Xunxun Chen received the PhD degree from Harbin Institute of Technology, China in 2005. He is a Professor and PhD supervisor with the Institute of Information Engineering, Beihang University, China, and is also with the National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT or CNCERT/CC). His research interests include network security and data storage and management.
