Abstract
Many current deep learning approaches to action recognition focus on recognizing concrete (e.g., single actor) actions in trimmed videos from datasets such as UCF-101 and HMDB-51. However, high-level semantic analysis of sports videos often requires recognizing more abstract events or situations involving multiple players with longer time-scale context. This paper builds upon inflated 3D (I3D) ConvNets for video action recognition to detect and differentiate six abstract categories of events in untrimmed videos of soccer games from multiple fixed cameras: normal play, plus breaks in play due to kick-offs, free kicks, throw-ins, and goal and corner kicks. Raw video unit classifications by variants of the basic I3D network are post-processed by two novel and efficient grouping methods for localizing the boundaries of events. Our experiments show that the proposed methods can achieve 84.2% weighted precision for event categories at the level of video units, and boost event temporal localization mean average precision at 0.5 tIoU (mAP@0.5) to 62.0%.
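The evaluation terms in the abstract can be made concrete with a small sketch. The code below is an illustration only and is not the paper's grouping methods: it merges runs of identical per-unit labels into segments (a naive run-length grouping, assumed here for demonstration) and computes the temporal intersection over union (tIoU) used by the mAP@0.5 criterion. The function names `group_units` and `tiou`, the label strings, and the unit indices are all hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

# (start_unit, end_unit_exclusive, label); units are hypothetical fixed-length clips
Segment = Tuple[int, int, str]


def group_units(labels: List[str]) -> List[Segment]:
    """Merge runs of identical consecutive unit labels into segments."""
    segments: List[Segment] = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, label))
        start += length
    return segments


def tiou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Temporal intersection over union of two [start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical per-unit classifier output for a short clip
unit_labels = ["play", "play", "throw-in", "throw-in", "throw-in", "play"]
segments = group_units(unit_labels)

predicted = segments[1]   # the throw-in segment covering units 2..4
ground_truth = (3, 6)     # assumed annotated interval, in the same units
score = tiou(predicted[:2], ground_truth)
```

Under a 0.5 tIoU threshold, a predicted segment is typically counted as a correct detection only if its label matches the annotation and its `tiou` score reaches 0.5; average precision is then computed per category and averaged to give mAP@0.5.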
Notes
1. In particular, we study the soccer break event categories defined in the FIFA rule book [8]: (1) kick-offs (to start each half or after a goal), (2) free kicks (after a foul), (3) penalty kicks, (4) throw-ins (touch line out of bounds), (5) goal kicks (end line out of bounds caused by the offensive team), (6) corner kicks (end line out of bounds caused by the defensive team), and (7) dropped balls (all other situations). Detecting these break event segments in soccer game video is difficult because the events are sparse within a video and vary in duration.
References
Assfalg, J., Bertini, M., Colombo, C., Bimbo, A.D., Nunziati, W.: Semantic annotation of soccer videos: automatic highlights detection. Comput. Vis. Image Underst. 92(2), 285–305 (2003)
Bozorgpour, A., Fotouhi, M., Kasaei, S.: Robust homography optimization in soccer scenes. In: Iranian Conference on Electrical Engineering (2015)
Canales, F.: Automated semantic annotation of football games from TV broadcast. Ph.D. thesis, Department of Informatics, TUM Munich (2013)
Carreira, J., Zisserman, A.: Quo vadis, action recognition? a new model and the Kinetics dataset. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)
Chao, Y.W., Vijayanarasimhan, S., Seybold, B., Ross, D.A., Deng, J., Sukthankar, R.: Rethinking the faster R-CNN architecture for temporal action localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1130–1139 (2018)
DeepMind: Convolutional neural network model for video classification trained on the Kinetics dataset (2017). https://github.com/deepmind/kinetics-i3d
Fani, M., Yazdi, M., Clausi, D., Wong, A.: Soccer video structure analysis by parallel feature fusion network and hidden-to-observable transferring markov model. IEEE Access 5, 27322–27336 (2017)
Fédération Internationale de Football Association (FIFA): Laws of the game (2015). https://img.fifa.com/image/upload/datdz0pms85gbnqy4j3k.pdf
Gao, J., Chen, K., Nevatia, R.: Ctap: Complementary temporal action proposal generation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 68–83 (2018)
Gao, J., Yang, Z., Chen, K., Sun, C., Nevatia, R.: Turn tap: temporal unit regression network for temporal action proposals. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3628–3636 (2017)
Gerke, S., Muller, K., Schafer, R.: Soccer jersey number recognition using convolutional neural networks. In: IEEE International Conference on Computer Vision Workshop (2015)
Giancola, S., Amine, M., Dghaily, T., Ghanem, B.: Soccernet: a scalable dataset for action spotting in soccer videos. In: CVPR Workshop on Computer Vision in Sports (2018)
Grushin, A., Monner, D.D., Reggia, J.A., Mishra, A.: Robust human action recognition via long short-term memory. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2013)
Huda, N., Jensen, K., Gade, R., Moeslund, T.: Estimating the number of soccer players using simulation-based occlusion handling. In: CVPR Workshop on Computer Vision in Sports (2018)
Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
Kazemi, V., Sullivan, J.: Using richer models for articulated pose estimation of footballers. In: British Machine Vision Conference (2012)
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: IEEE International Conference on Computer Vision (2011)
Leo, M., Mosca, N., Spagnolo, P., Mazzeo, P., et al.: A semi-automatic system for ground truth generation of soccer video sequences. In: Advanced Video and Signal Based Surveillance (2009)
Liu, T., et al.: Soccer video event detection using 3D convolutional networks and shot boundary detection via deep feature distance. In: International Conference on Neural Information Processing (2017)
Lu, K., Chen, J., Little, J.J., He, H.: Light cascaded convolutional neural networks for accurate player detection. In: British Machine Vision Conference (2017)
Maksai, A., Wang, X., Fua, P.: What players do with the ball: A physically constrained interaction modeling. In: Proceedings of the IEEE conference on computer vision and pattern recognition (2016)
Neyshabur, B., Bhojanapalli, S., McAllester, D., Srebro, N.: Exploring generalization in deep learning. In: Advances in Neural Information Processing Systems, pp. 5947–5956 (2017)
Ni, B., Yang, X., Gao, S.: Progressively parsing interactional objects for fine grained action detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1020–1028 (2016)
Pettersen, S.A., et al.: Soccer video and player position dataset. In: ACM Multimedia Systems Conference (2014)
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in neural information processing systems, pp. 568–576 (2014)
Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. Technical report CRCV-TR-12-01, University of Central Florida (2012)
Sozykin, K., Khan, A.M., Protasov, S., Hussain, R.: Multi-label class-imbalanced action recognition in hockey videos via 3D convolutional neural networks. In: IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (2018)
Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9 (2015)
Tong, X., Lu, H., Liu, Q.: An effective and fast soccer ball detection and tracking method. In: International Conference on Pattern Recognition (2004)
Tsunoda, T., Komori, Y., Matsugu, M., Harada, T.: Football action recognition using hierarchical LSTM. In: CVPR Workshop on Computer Vision in Sports (2017)
Wagenaar, M., Okafor, E., Frencken, W., Wiering, M.: Using deep convolutional neural networks to predict goal-scoring opportunities in soccer. In: International Conference on Pattern Recognition Applications and Methods (2017)
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: Proceedings of the IEEE international conference on computer vision, pp. 3551–3558 (2013)
Wang, L., Li, W., Li, W., Van Gool, L.: Appearance-and-relation networks for video classification. arXiv preprint arXiv:1711.09125 (2017)
Wang, L., Xiong, Y., Lin, D., Van Gool, L.: Untrimmednets for weakly supervised action recognition and detection. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 4325–4334 (2017)
Wang, L., et al.: Temporal segment networks: towards good practices for deep action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 20–36. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_2
Wang, Y., Song, J., Wang, L., Van Gool, L., Hilliges, O.: Two-stream SR-CNNs for action recognition in videos. In: BMVC (2016)
Xie, L., Xu, P., Chang, S.F., Divakaran, A., Sun, H.: Structure analysis of soccer video with domain knowledge and hidden markov models. Pattern Recogn. Lett. 25(7), 767–775 (2004)
Yuan, J., Ni, B., Yang, X., Kassim, A.A.: Temporal action localization with pyramid of score distribution features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3093–3102 (2016)
Zach, C., Pock, T., Bischof, H.: A duality based approach for realtime TV-L1 optical flow. In: Hamprecht, F.A., Schnörr, C., Jähne, B. (eds.) DAGM 2007. LNCS, vol. 4713, pp. 214–223. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74936-3_22
Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530 (2016)
Zhao, Y., Xiong, Y., Wang, L., Wu, Z., Tang, X., Lin, D.: Temporal action detection with structured segment networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2914–2923 (2017)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, C., Rasmussen, C. (2019). Multi-camera Temporal Grouping for Play/Break Event Detection in Soccer Games. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2019. Lecture Notes in Computer Science, vol 11844. Springer, Cham. https://doi.org/10.1007/978-3-030-33720-9_18
DOI: https://doi.org/10.1007/978-3-030-33720-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33719-3
Online ISBN: 978-3-030-33720-9
eBook Packages: Computer Science, Computer Science (R0)