research-article

A Deep Architecture for Multimodal Summarization of Soccer Games

Authors:
Melissa Sanabria, Université Côte d'Azur, CNRS, I3S, Sophia Antipolis, France
Sherly, Université Côte d'Azur, CNRS, I3S, Sophia Antipolis, France
Frédéric Precioso, Université Côte d'Azur, CNRS, I3S, Sophia Antipolis, France
Thomas Menguy, Wildmoka, Sophia Antipolis, France

MMSports '19: Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports, October 2019, Pages 16–24. https://doi.org/10.1145/3347318.3355524

Published: 15 October 2019


ABSTRACT

The massive growth of sports videos, especially soccer videos, has created a need for the automatic generation of summaries, where the objective is not only to show the most important actions of the match but also to elicit as much emotion as the summaries produced by human editors. State-of-the-art video summarization methods mostly rely on video processing; however, this is not an optimal approach for long videos such as soccer matches. In this paper we propose a multimodal approach that automatically generates summaries of soccer match videos from both event and audio features. The event features provide a shorter and better representation of the match, and the audio helps detect the excitement generated by the game. Our method consists of three consecutive stages: Proposals, Summarization and Content Refinement. The first stage generates summary proposals, using Multiple Instance Learning to deal with the similarity between the events inside the summary and the rest of the match. The Summarization stage feeds event and audio features into a hierarchical Recurrent Neural Network to decide which proposals should indeed be in the summary. The last stage takes advantage of the visual content to create the final summary. The results show that our approach outperforms by a large margin not only the video processing methods but also methods that use event and audio features.
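The abstract names the first two stages but not their internals, so the following is only a minimal, hypothetical Python sketch of how such a pipeline could be wired, not the authors' implementation. Assumptions: proposals are fixed-size sliding windows over a per-event feature stream (window and stride are made up); each window is a Multiple Instance Learning "bag" labeled positive if any of its events belongs to the reference summary (the standard MIL assumption); event and audio features are simply concatenated per event; and a plain tanh RNN with random, untrained weights stands in for the LSTM cells suggested by the author tags. The names `make_proposals`, `bag_label`, `bag_score`, `TanhRNN` and `score_proposals` are illustrative. The Content Refinement stage, which uses visual content, is not sketched.

```python
# Hypothetical sketch of the Proposals and Summarization stages; not the authors' code.
# Assumptions: fixed-size sliding windows as proposals, per-event feature lists,
# random untrained weights, and a plain tanh RNN standing in for LSTM cells.
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_proposals(n_events: int, window: int, stride: int) -> List[range]:
    """Sliding windows of event indices; each window is treated as one MIL bag."""
    return [range(s, s + window) for s in range(0, n_events - window + 1, stride)]


def bag_label(instance_labels: List[bool]) -> bool:
    """Standard MIL assumption: a bag is positive if at least one instance is positive."""
    return any(instance_labels)


def bag_score(instance_scores: List[float]) -> float:
    """Max-pooling MIL: the bag score is the highest instance score."""
    return max(instance_scores)


class TanhRNN:
    """Minimal Elman RNN: h_t = tanh(W x_t + U h_{t-1} + b)."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random) -> None:
        self.W = [[rng.gauss(0.0, 0.3) for _ in range(in_dim)] for _ in range(hid_dim)]
        self.U = [[rng.gauss(0.0, 0.3) for _ in range(hid_dim)] for _ in range(hid_dim)]
        self.b = [0.0] * hid_dim

    def run(self, seq: List[Vector]) -> List[Vector]:
        """Return the hidden state after every step of the input sequence."""
        h = [0.0] * len(self.b)
        states = []
        for x in seq:
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(w_row, x))
                    + sum(u * hj for u, hj in zip(u_row, h))
                    + b_i
                )
                for w_row, u_row, b_i in zip(self.W, self.U, self.b)
            ]
            states.append(h)
        return states


def score_proposals(event_feats: List[Vector], audio_feats: List[Vector],
                    proposals: List[range], hid: int = 8, seed: int = 0) -> List[float]:
    """Two-level RNN: events -> proposal vector, proposals -> inclusion probability."""
    rng = random.Random(seed)
    low = TanhRNN(len(event_feats[0]) + len(audio_feats[0]), hid, rng)
    high = TanhRNN(hid, hid, rng)
    v = [rng.gauss(0.0, 0.3) for _ in range(hid)]  # output-layer weights
    # Lower level: fuse event and audio features per event, keep the last state.
    prop_vecs = [low.run([event_feats[i] + audio_feats[i] for i in p])[-1] for p in proposals]
    # Upper level: run across proposals so each decision sees match context.
    return [sigmoid(sum(vi * hi for vi, hi in zip(v, h))) for h in high.run(prop_vecs)]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 20
    events = [[rng.random() for _ in range(4)] for _ in range(n)]  # hypothetical event features
    audio = [[rng.random() for _ in range(2)] for _ in range(n)]   # hypothetical audio features
    in_summary = [i in (5, 6, 14) for i in range(n)]               # hypothetical ground truth
    props = make_proposals(n, window=4, stride=2)
    targets = [bag_label([in_summary[i] for i in p]) for p in props]  # MIL training targets
    probs = score_proposals(events, audio, props)
    print(targets)
    print([round(p, 3) for p in probs])
```

Because the weights are random, the printed probabilities only illustrate the data flow; in a real system the two RNN levels and the output layer would be trained against the bag-level targets, and a threshold on the probabilities would select the proposals passed on to Content Refinement.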

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks

Published in
MMSports '19: Proceedings of the 2nd International Workshop on Multimedia Content Analysis in Sports
October 2019
120 pages
ISBN: 9781450369114
DOI: 10.1145/3347318
General Chairs:
Rainer Lienhart, University of Augsburg, Germany
Thomas B. Moeslund, Aalborg University, Denmark
Hideo Saito, Keio University, Japan
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
long short-term memory
multimodal analysis
multiple instance learning
sports summarization
video summarization

Acceptance Rates
Overall acceptance rate: 29 of 49 submissions, 59%

Article Metrics
- Total citations: 20
- Total downloads: 18
- Downloads (last 12 months): 0
- Downloads (last 6 weeks): 0