User preference-aware video highlight detection via deep reinforcement learning

Multimedia Tools and Applications

Abstract

Video highlight detection is a technique for retrieving short video clips that capture a user’s primary attention or interest within an unedited video. There is substantial interest in automating highlight detection to facilitate efficient video browsing. Recent research often focuses on objectively finding frames that are both visually representative and diverse to form highlights. However, user preferences are subjective and vary from person to person, so it is not trivial to find different highlights in the same video for different users. This paper describes a reinforcement learning-based framework that detects different highlights according to different users’ preferences. Under this framework, a novel reward function that accounts for the relevance of candidate highlights to user preferences is introduced. During training, the framework strives to earn higher rewards by learning to detect more diverse and more preference-aware highlights. The effectiveness of the proposed method is illustrated by applying it to different types of real-world movies, where it achieves state-of-the-art results.
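The abstract does not give the form of the reward, so the following is only a minimal sketch of how a reward combining diversity and preference relevance might look. It assumes that each selected frame is represented by a feature vector, that a user preference is a vector in the same space, that diversity is measured as mean pairwise cosine dissimilarity, and that relevance is measured as mean cosine similarity to the preference vector. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def _unit_rows(x):
    """L2-normalise along the last axis, guarding against zero vectors."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def diversity_reward(features):
    """Mean pairwise cosine dissimilarity among selected frames (assumed form)."""
    n = len(features)
    if n < 2:
        return 0.0
    f = _unit_rows(features)
    sim = f @ f.T
    # Exclude each frame's similarity with itself.
    off_diag = sim.sum() - np.trace(sim)
    return float(1.0 - off_diag / (n * (n - 1)))


def preference_reward(features, preference):
    """Mean cosine similarity between selected frames and a preference vector (assumed form)."""
    if len(features) == 0:
        return 0.0
    f = _unit_rows(features)
    p = _unit_rows(preference)
    return float(np.mean(f @ p))


def highlight_reward(features, preference, weight=0.5):
    """Weighted sum of the two terms; `weight` is a hypothetical trade-off knob."""
    return ((1.0 - weight) * diversity_reward(features)
            + weight * preference_reward(features, preference))
```

Under these assumptions, two users with different preference vectors assign different rewards to the same set of selected frames, which is what would push a policy toward user-specific highlights.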

Acknowledgements

This work was supported in part by the Fundamental Research Funds for the Central Universities (2018ZY03, 2015ZCQ-XX), the National Natural Science Foundation of China (NSFC) (61703046) and the Scientific Research Program of Beijing Education Commission (KM201910050001).

Author information

Corresponding author

Correspondence to Zhongzhi Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, H., Wang, K., Wu, Y. et al. User preference-aware video highlight detection via deep reinforcement learning. Multimed Tools Appl 79, 15015–15024 (2020). https://doi.org/10.1007/s11042-020-08668-1

  • DOI: https://doi.org/10.1007/s11042-020-08668-1
