Abstract
Handcrafting sports video summaries from the highlights and important events of broadcast sports videos is a laborious and time-consuming task. Amateur content creators and professional bodies around the world spend hundreds of person-hours producing such highlights to keep audiences up to date with the latest happenings. In this paper, we present a deep learning-based method that automatically generates highlights from a broadcast sports video based on important events and user preferences. The proposed method classifies broadcast sports video scenes to generate a summary based on highlights or important events. Because different sports have different rules and playfield scenarios, and exhibit high inter-class similarity, devising a generalized method that handles multiple categories of sports is challenging. To overcome these problems and improve highlight generation performance, the proposed method internally segregates the sports category and then employs several convolutional neural network based feature-extraction branches to recognize important events. Additionally, a branch-selector mechanism is introduced to choose the relevant convolutional neural network branch, which predicts the important sports event or activity. We performed extensive experiments with different deep learning architectures, and the results demonstrate the superiority of our proposed method for important event recognition.
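The two-stage routing described above (first identify the sport category, then run only the matching event-recognition branch) can be sketched as follows. This is a minimal illustrative sketch, not the authors' ENet: the sport names, event labels, and the score-based stand-ins for the CNN branches are all hypothetical assumptions introduced here only to show the branch-selector control flow.

```python
from typing import Callable, Dict, Sequence

def make_event_branch(events: Sequence[str]) -> Callable[[Sequence[float]], str]:
    """Stand-in for a per-sport CNN branch: in the paper each branch is a
    convolutional feature extractor; here it simply picks the event whose
    score is highest."""
    def branch(event_scores: Sequence[float]) -> str:
        best = max(range(len(events)), key=event_scores.__getitem__)
        return events[best]
    return branch

# Hypothetical per-sport branches keyed by sport category.
BRANCHES: Dict[str, Callable[[Sequence[float]], str]] = {
    "cricket": make_event_branch(["boundary", "wicket", "no_event"]),
    "soccer":  make_event_branch(["goal", "card", "no_event"]),
}

def recognize_event(sport_scores: Dict[str, float],
                    event_scores: Sequence[float]) -> str:
    # Stage 1: the branch selector picks the sport category with the top score.
    sport = max(sport_scores, key=lambda s: sport_scores[s])
    # Stage 2: only the selected branch predicts the important event,
    # so branches for other sports are never evaluated for this clip.
    return BRANCHES[sport](event_scores)
```

For example, `recognize_event({"cricket": 0.9, "soccer": 0.1}, [0.2, 0.7, 0.1])` routes the clip to the cricket branch, which returns `"wicket"`. The design point is that the selector gates which specialized branch runs, rather than running every branch and merging outputs.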





Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61832001).
Khan, A.A., Rao, Y. & Shao, J. ENet: event based highlight generation network for broadcast sports videos. Multimedia Systems 28, 2453–2464 (2022). https://doi.org/10.1007/s00530-022-00978-8