Abstract
Generating summaries or highlights from hours-long sports videos is an interesting and challenging problem. Viewers of sports are generally interested in a short summary of a video rather than the full footage. A number of methods addressing automatic sports video summarization have been published in the literature. This chapter presents a systematic review of existing video summarization techniques, with the algorithms and methods grouped under common ideas: shot boundary detection; classification and identification of player, crowd, and umpire shots; key event detection; detection based on replays, strokes, commercials, and play breaks; and event-, text-, and excitement-based summarization. The chapter aims to recapitulate decades of development in sports video summarization for the benefit of prospective researchers and to indicate future avenues for strengthening the outcome of video summarization techniques.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Vasudevan, V., Gounder, M.S. (2023). A Systematic Review on Machine Learning-Based Sports Video Summarization Techniques. In: Kumar, B.V., Sivakumar, P., Surendiran, B., Ding, J. (eds) Smart Computer Vision. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-031-20541-5_1
DOI: https://doi.org/10.1007/978-3-031-20541-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20540-8
Online ISBN: 978-3-031-20541-5
eBook Packages: Engineering, Engineering (R0)