DOI: 10.1145/3347318.3355524
research-article

A Deep Architecture for Multimodal Summarization of Soccer Games

Published: 15 October 2019

ABSTRACT

The massive growth of sports videos, especially in soccer, has created a need for the automatic generation of summaries, where the objective is not only to show the most important actions of the match but also to elicit as much emotion as summaries produced by human editors. State-of-the-art video summarization methods rely mostly on video processing; however, this is not an optimal approach for long videos such as soccer matches. In this paper we propose a multimodal approach to automatically generate summaries of soccer match videos that considers both event and audio features. The event features provide a shorter and better representation of the match, and the audio helps detect the excitement generated by the game. Our method consists of three consecutive stages: Proposals, Summarization, and Content Refinement. The first stage generates summary proposals, using Multiple Instance Learning to deal with the similarity between the events inside the summary and the rest of the match. The Summarization stage uses event and audio features as input to a hierarchical Recurrent Neural Network to decide which proposals should indeed be in the summary. The last stage takes advantage of the visual content to create the final summary. The results show that our approach outperforms by a large margin not only the video processing methods but also methods that use event and audio features.
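
To make the Multiple Instance Learning idea in the Proposals stage more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard MIL formulation in which a bag (here, a candidate window of match events) is scored by its highest-scoring instance. The names `bag_score`, `select_proposals`, `toy_scorer`, the feature vectors, and the 0.5 threshold are all hypothetical; the abstract does not specify the pooling function, the instance scorer, or any threshold.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: one feature vector per match event.
Event = Sequence[float]


def bag_score(bag: List[Event], instance_scorer: Callable[[Event], float]) -> float:
    """Score a bag as the maximum of its instance scores.

    This is the standard MIL assumption (a bag is positive if at least one
    instance is positive); the paper may use a different pooling.
    """
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scorer(event) for event in bag)


def select_proposals(
    bags: List[List[Event]],
    instance_scorer: Callable[[Event], float],
    threshold: float = 0.5,  # assumed cut-off, for illustration only
) -> List[int]:
    """Return the indices of bags whose score reaches the threshold."""
    return [
        i for i, bag in enumerate(bags)
        if bag_score(bag, instance_scorer) >= threshold
    ]


def toy_scorer(event: Event) -> float:
    """Stand-in for a learned instance scorer: mean feature clipped to [0, 1]."""
    return min(1.0, max(0.0, sum(event) / len(event)))


bags = [
    [[0.1, 0.2], [0.9, 0.8]],  # contains one high-scoring event
    [[0.1, 0.1], [0.2, 0.3]],  # contains no high-scoring event
]
selected = select_proposals(bags, toy_scorer, threshold=0.5)
print(selected)
```

In a trained system the toy scorer would be replaced by a learned model, and only the selected proposals would be passed on to a later stage such as the Summarization stage described above.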

        • Published in

          MMSports '19: Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports
          October 2019
          120 pages
          ISBN:9781450369114
          DOI:10.1145/3347318

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 29 of 49 submissions, 59%
