ABSTRACT
The massive growth of sports video, especially in soccer, has created a need for automatic summary generation, where the objective is not only to show the most important actions of the match but also to elicit as much emotion as summaries produced by human editors. State-of-the-art video summarization methods rely mostly on video processing; however, this is not an optimal approach for long videos such as soccer matches. In this paper, we propose a multimodal approach that automatically generates summaries of soccer match videos by considering both event and audio features. The event features provide a shorter and better representation of the match, and the audio helps detect the excitement generated by the game. Our method consists of three consecutive stages: Proposals, Summarization, and Content Refinement. The Proposals stage generates summary proposals, using Multiple Instance Learning to deal with the similarity between the events inside the summary and those in the rest of the match. The Summarization stage feeds event and audio features into a hierarchical Recurrent Neural Network to decide which proposals should be in the summary. The Content Refinement stage takes advantage of the visual content to create the final summary. The results show that our approach outperforms by a large margin not only video processing methods but also methods that use event and audio features.
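The three stages described above can be pictured with a small toy sketch. This is not the authors' implementation: the fixed-size windows, the max-pooling bag score (the standard Multiple Instance Learning assumption that a bag is positive if at least one instance is), the equal-weight score combination standing in for the hierarchical Recurrent Neural Network, and the `snap` callback are all assumptions made only for illustration, as are the names `Proposal`, `propose`, `summarize`, and `refine`.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Proposal:
    start: int          # index of the first event in the window
    end: int            # index one past the last event in the window
    score: float = 0.0  # bag-level score assigned by the proposal stage


def propose(event_scores: Sequence[float], window: int,
            threshold: float) -> List[Proposal]:
    """Stage 1 (Proposals): treat each window of events as a MIL bag.

    Assumption: the bag score is the maximum instance score, so a window
    becomes a proposal if at least one of its events looks summary-worthy.
    """
    proposals = []
    for start in range(0, len(event_scores), window):
        bag = event_scores[start:start + window]
        bag_score = max(bag)
        if bag_score >= threshold:
            proposals.append(Proposal(start, start + len(bag), bag_score))
    return proposals


def summarize(proposals: List[Proposal], audio_excitement: Sequence[float],
              keep: int) -> List[Proposal]:
    """Stage 2 (Summarization): hypothetical stand-in for the hierarchical RNN.

    Combines the event-based bag score with the mean audio excitement over
    the same window, keeps the `keep` best proposals, and returns them in
    chronological order.
    """
    def combined(p: Proposal) -> float:
        audio = audio_excitement[p.start:p.end]
        return 0.5 * p.score + 0.5 * sum(audio) / len(audio)

    best = sorted(proposals, key=combined, reverse=True)[:keep]
    return sorted(best, key=lambda p: p.start)


def refine(selected: List[Proposal],
           snap: Callable[[int], int]) -> List[Tuple[int, int]]:
    """Stage 3 (Content Refinement): adjust clip boundaries.

    `snap` is a hypothetical function that would use visual content to move
    a boundary to a suitable position (for example, a shot change).
    """
    return [(snap(p.start), snap(p.end)) for p in selected]


# Toy data: one score per match event and one audio excitement value per event.
event_scores = [0.1, 0.2, 0.9, 0.1, 0.1, 0.2, 0.7, 0.8, 0.3]
audio = [0.2, 0.2, 0.2, 0.9, 0.9, 0.9, 1.0, 1.0, 1.0]

candidates = propose(event_scores, window=3, threshold=0.5)
selected = summarize(candidates, audio, keep=1)
clips = refine(selected, snap=lambda i: i)  # identity: no visual refinement
print(clips)
```

In this toy run, the audio signal changes which candidate survives: the first window has the higher event score, but the last window is kept because its audio excitement is much higher.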
Index Terms
- A Deep Architecture for Multimodal Summarization of Soccer Games