Abstract
Although deep learning is widely used for video analytics tasks such as video classification and action detection, dense action detection of fast-moving subjects in sports videos remains challenging. In this work, we release yet another sports video benchmark, P2ANet, for
Index Terms
- P2ANet: A Large-Scale Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos