User preference-aware video highlight detection via deep reinforcement learning

Wang, Han; Wang, Kexin; Wu, Yuqing; Wang, Zhongzhi; Zou, Ling

doi:10.1007/s11042-020-08668-1

Published: 20 February 2020

Multimedia Tools and Applications, Volume 79, pages 15015–15024 (2020)

Han Wang¹, Kexin Wang¹, Yuqing Wu¹, Zhongzhi Wang¹ & Ling Zou²


Abstract

Video highlight detection is a technique for retrieving short video clips that capture a user’s primary attention or interest within an unedited video. There is substantial interest in automating highlight detection to facilitate efficient video browsing. Recent research often focuses on objectively finding frames that are both visually representative and diverse to form highlights. However, user preferences are subjective and vary from person to person, so it is not trivial to find different highlights in the same video for different users. This paper describes a reinforcement learning-based framework that detects different highlights according to different users’ preferences. Under this framework, a novel reward function that accounts for the relevance of candidate highlights to user preferences is introduced. During training, the framework strives to earn higher rewards by learning to detect more diverse and more preference-aware highlights. The effectiveness of the proposed method is illustrated by applying it to different types of real-world movies, where it is shown to achieve state-of-the-art results.
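
The abstract does not give the reward function itself, so the following is only an illustrative sketch of how a reward could combine diversity with preference relevance. It assumes each frame is a feature vector, the user preference is a vector in the same space, diversity is the mean pairwise cosine dissimilarity of the selected frames, and relevance is their mean cosine similarity to the preference vector. The function names, the `weight` parameter, and the linear combination are all hypothetical and are not taken from the paper.

```python
from typing import Sequence

import numpy as np


def _unit_rows(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero vectors)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def diversity_reward(features: np.ndarray, selected: Sequence[int]) -> float:
    """Assumed diversity term: mean pairwise cosine dissimilarity."""
    idx = list(selected)
    n = len(idx)
    if n < 2:
        return 0.0
    x = _unit_rows(features[idx])
    dissim = 1.0 - x @ x.T
    # Exclude the diagonal (each frame compared with itself).
    off_diagonal = dissim.sum() - np.trace(dissim)
    return float(off_diagonal / (n * (n - 1)))


def preference_reward(
    features: np.ndarray, selected: Sequence[int], preference: np.ndarray
) -> float:
    """Assumed relevance term: mean cosine similarity to the preference."""
    idx = list(selected)
    if not idx:
        return 0.0
    x = _unit_rows(features[idx])
    p = preference / max(float(np.linalg.norm(preference)), 1e-12)
    return float((x @ p).mean())


def total_reward(
    features: np.ndarray,
    selected: Sequence[int],
    preference: np.ndarray,
    weight: float = 0.5,
) -> float:
    """Hypothetical weighted sum of the two terms; weight is illustrative."""
    return (1.0 - weight) * diversity_reward(features, selected) + (
        weight * preference_reward(features, selected, preference)
    )


if __name__ == "__main__":
    frames = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    user_pref = np.array([1.0, 0.0])
    print(total_reward(frames, [0, 1], user_pref))  # diverse selection
    print(total_reward(frames, [0, 2], user_pref))  # redundant but on-preference
```

In a policy-gradient setup, a scalar of this kind would be computed for each sampled frame selection and used as the return; the trade-off between the two terms is a design choice that the abstract leaves unspecified.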


Acknowledgements

This work was supported in part by the Fundamental Research Funds for the Central Universities (2018ZY03, 2015ZCQ-XX), the Natural Science Foundation of China (NSFC) (61703046), and the Scientific Research Program of Beijing Education Commission (KM201910050001).

Author information

Authors and Affiliations

School of Information Science and Technology, Beijing Forestry University, Beijing, China
Han Wang, Kexin Wang, Yuqing Wu & Zhongzhi Wang
Digital Media School, Beijing Film Academy, Beijing, China
Ling Zou

Corresponding author

Correspondence to Zhongzhi Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, H., Wang, K., Wu, Y. et al. User preference-aware video highlight detection via deep reinforcement learning. Multimed Tools Appl 79, 15015–15024 (2020). https://doi.org/10.1007/s11042-020-08668-1

Received: 05 June 2019
Revised: 30 November 2019
Accepted: 09 January 2020
Published: 20 February 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s11042-020-08668-1
