A Deep Learning Model for Extracting Live Streaming Video Highlights using Audience Messages

Authors:
Hung-Kuang Han, National Taiwan University, Taipei, Taiwan (R.O.C.)
Yu-Chen Huang, National Taiwan University, Taipei, Taiwan (R.O.C.)
Chien Chin Chen, National Taiwan University, Taipei, Taiwan (R.O.C.)

AICCC '19: Proceedings of the 2019 2nd Artificial Intelligence and Cloud Computing Conference, December 2019, Pages 75–81. https://doi.org/10.1145/3375959.3375965

Published: 16 February 2020

ABSTRACT

Live streaming has become a ubiquitous channel for people to learn about new happenings. Although live streaming videos generally attract large audiences, they are long and contain relatively unexciting stretches of knowledge transmission. This observation has prompted artificial intelligence researchers to develop models that automatically extract highlights from live streaming videos. Most research on streaming highlight extraction has relied on visual analysis of video frames, and few studies have considered the messages posted by the audience. In this paper, we propose a deep learning model that examines the messages posted by streaming audiences. The video segments whose messages reveal audience excitement are extracted to compose the highlights of a streaming video. We evaluate our model on multiple Twitch streaming channels. Our highlight extraction model achieves a precision of 51.3%, which is superior to several baseline methods.
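
As a rough illustration of the evaluation idea only, and not the deep learning model proposed in the paper, the following Python sketch scores fixed-length video segments by chat message volume, selects the top-k segments as candidate highlights, and computes precision against a set of labeled highlight segments. The 30-second window, the message-count heuristic, the toy data, and all function names are assumptions made for this example.

```python
from collections import defaultdict


def segment_scores(messages, window=30.0):
    """Count chat messages per fixed-length segment.

    messages: iterable of (timestamp_in_seconds, text) pairs (hypothetical format).
    Returns a dict mapping segment index -> number of messages in that segment.
    """
    counts = defaultdict(int)
    for timestamp, _text in messages:
        counts[int(timestamp // window)] += 1
    return dict(counts)


def top_k_segments(scores, k):
    """Return the k segment indices with the highest scores.

    Ties are broken by the lower segment index so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [index for index, _score in ranked[:k]]


def precision(predicted, relevant):
    """Fraction of predicted segments that appear in the labeled highlight set."""
    if not predicted:
        return 0.0
    hits = sum(1 for segment in predicted if segment in relevant)
    return hits / len(predicted)


if __name__ == "__main__":
    # Toy chat log: (seconds since stream start, message text).
    chat = [(1, "hi"), (5, "lol"), (31, "ok"), (65, "WOW"), (66, "!!!"), (70, "GG")]
    scores = segment_scores(chat)          # segments 0, 1, 2 get 2, 1, 3 messages
    predicted = top_k_segments(scores, 2)  # the two busiest segments
    print(predicted)                       # prints [2, 0]
    print(precision(predicted, {2}))       # prints 0.5
```

A message-count heuristic like this is closer to a simple baseline than to a learned model. A model of the kind described in the abstract would replace `segment_scores` with a classifier over the message text of each segment.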

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing

Published in

AICCC '19: Proceedings of the 2019 2nd Artificial Intelligence and Cloud Computing Conference
December 2019
216 pages
ISBN:9781450372633
DOI:10.1145/3375959

Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Bidirectional GRU
Crowdsourcing
Highlight Extraction
Qualifiers
- research-article
- Research
- Refereed limited