Abstract
Handcrafting sports video summaries from the highlights and important events of broadcast sports videos is a laborious and time-consuming task. Amateur content creators and professional bodies around the world spend hundreds of person-hours producing such highlights to keep audiences up to date with the latest happenings. In this paper, we present a deep learning-based method that automatically generates highlights from a broadcast sports video based on important events and user preferences. The proposed method classifies broadcast sports video scenes to generate a summary based on highlights or important events. Because different sports have different rules and playfield scenarios, and exhibit high inter-class similarity, devising a generalized method that handles multiple categories of sports is challenging. To overcome these problems and improve highlight generation performance, the proposed method internally segregates the sports category and then employs several convolutional neural network based feature-extraction branches to recognize important events. Additionally, a branch-selector mechanism is introduced to choose the relevant convolutional neural network branch, which predicts the important sports event or activity. We performed extensive experiments with different deep learning architectures, and the results demonstrate the superiority of our proposed method for important event recognition.
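The two-stage routing described above (first identify the sport category, then run only the matching event-recognition branch) can be sketched as follows. This is a minimal illustrative sketch, not the authors' ENet: the sport names, event labels, and the score-based stand-ins for the CNN branches are all hypothetical assumptions introduced here only to show the branch-selector control flow.

```python
from typing import Callable, Dict, Sequence

def make_event_branch(events: Sequence[str]) -> Callable[[Sequence[float]], str]:
    """Stand-in for a per-sport CNN branch: in the paper each branch is a
    convolutional feature extractor; here it simply picks the event whose
    score is highest."""
    def branch(event_scores: Sequence[float]) -> str:
        best = max(range(len(events)), key=event_scores.__getitem__)
        return events[best]
    return branch

# Hypothetical per-sport branches keyed by sport category.
BRANCHES: Dict[str, Callable[[Sequence[float]], str]] = {
    "cricket": make_event_branch(["boundary", "wicket", "no_event"]),
    "soccer":  make_event_branch(["goal", "card", "no_event"]),
}

def recognize_event(sport_scores: Dict[str, float],
                    event_scores: Sequence[float]) -> str:
    # Stage 1: the branch selector picks the sport category with the top score.
    sport = max(sport_scores, key=lambda s: sport_scores[s])
    # Stage 2: only the selected branch predicts the important event,
    # so branches for other sports are never evaluated for this clip.
    return BRANCHES[sport](event_scores)
```

For example, `recognize_event({"cricket": 0.9, "soccer": 0.1}, [0.2, 0.7, 0.1])` routes the clip to the cricket branch, which returns `"wicket"`. The design point is that the selector gates which specialized branch runs, rather than running every branch and merging outputs.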





Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61832001).
Khan, A.A., Rao, Y. & Shao, J. ENet: event based highlight generation network for broadcast sports videos. Multimedia Systems 28, 2453–2464 (2022). https://doi.org/10.1007/s00530-022-00978-8