Gaze Aware Deep Learning Model for Video Summarization

Wu, Jiaxin; Zhong, Sheng-hua; Ma, Zheng; Heinen, Stephen J.; Jiang, Jianmin

doi:10.1007/978-3-030-00767-6_27

Conference paper
First Online: 19 September 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11165)

Abstract

Video summarization is an ideal tool for skimming videos. Previous computational models extract explicit information from the input video, such as visual appearance, motion, or audio, to generate informative summaries. Eye gaze, an implicit cue, has proved useful for indicating important content and the viewer’s interest. In this paper, we propose a novel gaze-aware deep learning model for video summarization. In our model, the position and velocity of observers’ raw eye movements are processed by a deep neural network to indicate the users’ preferences. Experiments on two widely used video summarization datasets show that our model is more proficient than state-of-the-art methods at summarizing videos, both for characterizing general preferences and for personal preferences. The results provide an innovative and improved algorithm for using gaze information in video summarization.
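The abstract names gaze position and velocity as the model's inputs but does not say how velocity is derived from the raw eye movements. As a minimal sketch, assuming velocity is approximated by a backward finite difference between consecutive timestamped samples, the per-sample features could be computed as follows. The function name `gaze_features`, the `GazeSample` layout, and the units are illustrative assumptions, not the authors' implementation.

```python
from math import hypot
from typing import List, Sequence, Tuple

# Hypothetical sample layout: (time in seconds, x in pixels, y in pixels).
GazeSample = Tuple[float, float, float]
# Hypothetical feature layout: (x, y, vx, vy, speed).
GazeFeature = Tuple[float, float, float, float, float]


def gaze_features(samples: Sequence[GazeSample]) -> List[GazeFeature]:
    """Return position and velocity features for each gaze sample.

    Assumption: velocity is a backward difference between consecutive
    samples; the first sample has no predecessor and gets zero velocity.
    """
    features: List[GazeFeature] = []
    prev = None
    for t, x, y in samples:
        if prev is None:
            vx = vy = 0.0
        else:
            dt = t - prev[0]
            if dt <= 0:
                raise ValueError("timestamps must be strictly increasing")
            vx = (x - prev[1]) / dt
            vy = (y - prev[2]) / dt
        # Speed is the magnitude of the velocity vector.
        features.append((x, y, vx, vy, hypot(vx, vy)))
        prev = (t, x, y)
    return features


# Two samples half a second apart: the gaze moves 3 px right and 4 px down.
feats = gaze_features([(0.0, 100.0, 100.0), (0.5, 103.0, 104.0)])
```

Feature tuples of this kind could then be fed to a network alongside frame-level visual features. How the paper actually combines them is not stated in the abstract.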

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61502311, No. 61620106008), the Natural Science Foundation of Guangdong Province (No. 2016A030310053, 2016A030310039, 2017A030310521), the Science and Technology Innovation Commission of Shenzhen under Grant (No. JCYJ20160422151736824), Shenzhen Emerging Industries of the Strategic Basic Research Project under Grant (No. JCYJ20160226191842793), the Shenzhen high-level overseas talents program, the Tencent “Rhinoceros Birds” - Scientific Research Foundation for Young Teachers of Shenzhen University (2016), the National Institutes of Health Grant (5T32EY025201-03), and the Smith-Kettlewell Eye Research Institute Grant.

Author information

Authors and Affiliations

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Jiaxin Wu, Sheng-hua Zhong & Jianmin Jiang
The Smith-Kettlewell Eye Research Institute, San Francisco, CA, USA
Zheng Ma & Stephen J. Heinen

Corresponding author

Correspondence to Zheng Ma.

Editor information

Editors and Affiliations

Hefei University of Technology, Hefei, China
Richang Hong
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
University of Tokyo, Tokyo, Japan
Toshihiko Yamasaki
Hefei University of Technology, Hefei, China
Meng Wang
City University of Hong Kong, Hong Kong, Hong Kong
Chong-Wah Ngo

Cite this paper

Wu, J., Zhong, Sh., Ma, Z., Heinen, S.J., Jiang, J. (2018). Gaze Aware Deep Learning Model for Video Summarization. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11165. Springer, Cham. https://doi.org/10.1007/978-3-030-00767-6_27

DOI: https://doi.org/10.1007/978-3-030-00767-6_27
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00766-9
Online ISBN: 978-3-030-00767-6
eBook Packages: Computer Science, Computer Science (R0)
