Abstract
In this paper, we focus on detecting highlights in online videos. Given the explosive growth of online videos, it is becoming increasingly important to single out highlights for audiences instead of requiring them to browse every tedious part of a video. Ideally, the content of the extracted highlights should be consistent with both the topic of the video and the preferences of the individual viewer. To this end, this paper introduces a novel content-aware approach that formulates highlight detection in a transfer learning framework. Experimental results on three different types of videos show that our content-aware highlight extraction method is particularly useful for fetching online video content, e.g., showing a summary of the entire video while focusing playback on the parts that match the user's queries.
Acknowledgments
The research was supported in part by the Natural Science Foundation of China (NSFC) under Grant No. 61703046.
Cite this article
Niu, K., Wang, H. Video highlight extraction via content-aware deep transfer. Multimed Tools Appl 78, 21133–21144 (2019). https://doi.org/10.1007/s11042-019-7442-6
DOI: https://doi.org/10.1007/s11042-019-7442-6