Abstract
Video summarization is an ideal tool for skimming videos. Previous computational models extract explicit information from the input video, such as visual appearance, motion, or audio, to generate informative summaries. Eye gaze, an implicit cue, has proved useful for indicating important content and the viewer's interest. In this paper, we propose a novel gaze-aware deep learning model for video summarization. In our model, the position and velocity of the observers' raw eye movements are processed by a deep neural network to indicate the users' preferences. Experiments on two widely used video summarization datasets show that our model outperforms state-of-the-art methods in summarizing videos according to both general and personal preferences. The results demonstrate an improved algorithm for using gaze information in video summarization.
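The abstract does not say how gaze position and velocity are computed before they reach the network. As a rough illustration only, the sketch below derives per-sample velocity from raw gaze positions with a backward difference. The function name `gaze_features`, the `(x, y, vx, vy)` layout, and the fixed sampling rate are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple


def gaze_features(
    samples: List[Tuple[float, float]], sampling_rate_hz: float
) -> List[Tuple[float, float, float, float]]:
    """Return one (x, y, vx, vy) tuple per raw gaze sample.

    Hypothetical helper: velocity is a backward difference scaled by the
    sampling rate, so its unit is position units per second. The first
    sample has no predecessor and is given zero velocity.
    """
    if sampling_rate_hz <= 0:
        raise ValueError("sampling_rate_hz must be positive")

    features = []
    # Seed the "previous" position with the first sample (if any),
    # which makes the first velocity exactly zero.
    prev_x, prev_y = samples[0] if samples else (0.0, 0.0)
    for x, y in samples:
        vx = (x - prev_x) * sampling_rate_hz
        vy = (y - prev_y) * sampling_rate_hz
        features.append((x, y, vx, vy))
        prev_x, prev_y = x, y
    return features


if __name__ == "__main__":
    # Three gaze samples recorded at an assumed 10 Hz.
    for row in gaze_features([(0.0, 0.0), (1.0, 2.0), (1.0, 2.0)], 10.0):
        print(row)
```

A feature sequence of this shape could be fed to a network alongside frame features, but how the paper actually combines the two streams is not stated in the abstract.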
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61502311, No. 61620106008), the Natural Science Foundation of Guangdong Province (No. 2016A030310053, 2016A030310039, 2017A030310521), the Science and Technology Innovation Commission of Shenzhen under Grant (No. JCYJ20160422151736824), Shenzhen Emerging Industries of the Strategic Basic Research Project under Grant (No. JCYJ20160226191842793), the Shenzhen high-level overseas talents program, the Tencent "Rhinoceros Birds" Scientific Research Foundation for Young Teachers of Shenzhen University (2016), the National Institutes of Health Grant (5T32EY025201-03), and the Smith-Kettlewell Eye Research Institute Grant.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, J., Zhong, Sh., Ma, Z., Heinen, S.J., Jiang, J. (2018). Gaze Aware Deep Learning Model for Video Summarization. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11165. Springer, Cham. https://doi.org/10.1007/978-3-030-00767-6_27
DOI: https://doi.org/10.1007/978-3-030-00767-6_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00766-9
Online ISBN: 978-3-030-00767-6
eBook Packages: Computer Science, Computer Science (R0)