
P2ANet: A Large-Scale Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos

Published: 11 January 2024

Abstract

While deep learning has been widely used for video analytics such as video classification and action detection, dense action detection with fast-moving subjects in sports videos remains challenging. In this work, we release yet another sports video benchmark, P2ANet, for Ping Pong-Action detection. It consists of 2,721 video clips collected from broadcast videos of professional table tennis matches at the World Table Tennis Championships and the Olympic Games. Working with a crew of table tennis professionals and referees on a specially designed annotation toolbox, we obtained fine-grained action labels (in 14 classes) for every ping-pong action that appears in the dataset, and we formulate two action detection problems: action localization and action recognition. We evaluate a number of widely used action recognition models (e.g., TSM, TSN, Video SwinTransformer, and SlowFast) and action localization models (e.g., BSN, BSN++, BMN, and TCANet) on P2ANet for both problems under various settings. These models achieve only 48% area under the AR-AN curve for localization and 82% top-1 accuracy for recognition, since the ping-pong actions are dense, the subjects move fast, and the broadcast videos run at only 25 FPS. The results confirm that P2ANet remains challenging and can serve as a dedicated benchmark for dense action detection in videos. We invite readers to examine our dataset by visiting the following link: https://github.com/Fred1991/P2ANET.
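The two headline metrics above can be sketched in a few lines. The helper names (`top1_accuracy`, `auc_ar_an`) and all numeric values in the usage example are illustrative placeholders, not data or code from P2ANet; this is only a minimal reading of "top-1 accuracy" and "area under the AR-AN curve" (average recall plotted against average number of proposals, integrated by the trapezoidal rule and normalized to [0, 1]).

```python
def top1_accuracy(preds, labels):
    """Fraction of clips whose highest-scoring class matches the ground-truth label."""
    correct = sum(
        1
        for scores, y in zip(preds, labels)
        if max(range(len(scores)), key=scores.__getitem__) == y  # argmax over class scores
    )
    return correct / len(labels)


def auc_ar_an(an_points, ar_points):
    """Area under the AR-AN curve via the trapezoidal rule,
    normalized by the AN range so the result lies in [0, 1]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(
        zip(an_points, ar_points), zip(an_points[1:], ar_points[1:])
    ):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / (an_points[-1] - an_points[0])


# Illustrative usage with made-up numbers:
preds = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]  # per-clip class scores
labels = [1, 0]
acc = top1_accuracy(preds, labels)  # both argmaxes match -> 1.0

an = [1, 50, 100]          # average number of proposals
ar = [0.20, 0.45, 0.50]    # average recall at each AN point
auc = auc_ar_an(an, ar)
```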



Published in
ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 4 (April 2024), 676 pages.
ISSN: 1551-6857; EISSN: 1551-6865. DOI: 10.1145/3613617.
Editor: Abdulmotaleb El Saddik


Publisher
Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 27 March 2023
• Revised: 20 September 2023
• Accepted: 21 October 2023
• Online AM: 28 November 2023
• Published: 11 January 2024

Published in TOMM Volume 20, Issue 4

        Qualifiers

        • research-article
