Video attention prediction using gaze saliency

Chen, Yanxiang; Tao, Gang; Xie, Qiangqiang; Song, Minglong

doi:10.1007/s11042-016-4294-1

Published: 03 January 2017

Volume 78, pages 26867–26884 (2019)

Multimedia Tools and Applications

Yanxiang Chen (ORCID: orcid.org/0000-0002-6923-306X)¹, Gang Tao², Qiangqiang Xie¹ & Minglong Song¹

491 Accesses
4 Citations

Abstract

In recent years, significant progress has been achieved in the field of visual saliency modeling. Our research focuses on video saliency, which differs substantially from image saliency and can be detected more reliably by adding gaze information obtained from viewers' eye movements while they watch the video. In this paper we propose a novel gaze saliency method to predict video attention, inspired by the widespread use of mobile smart devices equipped with cameras. It is a non-contact method for predicting visual attention, and it imposes no additional burden on the hardware. Our method first extracts bottom-up saliency maps from the video frames, and then constructs a mapping from eye images, captured by the camera in synchronization with the video frames, to the screen region. Finally, the top-down gaze information and the bottom-up saliency maps are combined by point-wise multiplication to predict video attention. The proposed approach is validated on two datasets, the public MIT dataset and a dataset we collected, against four other commonly used methods, and the experimental results show that our method achieves state-of-the-art performance.
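The final fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian shape of the gaze map, the `sigma` parameter, the peak normalization, and the function names `gaussian_gaze_map` and `fuse` are all assumptions made for illustration, and the gaze point is supplied directly rather than estimated from eye images.

```python
import math


def gaussian_gaze_map(height, width, gaze_row, gaze_col, sigma):
    """Hypothetical top-down gaze map: an isotropic Gaussian centred on the gaze point."""
    return [
        [
            math.exp(-((r - gaze_row) ** 2 + (c - gaze_col) ** 2) / (2.0 * sigma ** 2))
            for c in range(width)
        ]
        for r in range(height)
    ]


def fuse(saliency, gaze):
    """Point-wise multiplication of two equally sized maps, rescaled so the peak is 1."""
    if len(saliency) != len(gaze) or any(
        len(s_row) != len(g_row) for s_row, g_row in zip(saliency, gaze)
    ):
        raise ValueError("maps must have the same shape")
    product = [
        [s * g for s, g in zip(s_row, g_row)]
        for s_row, g_row in zip(saliency, gaze)
    ]
    peak = max(max(row) for row in product)
    if peak <= 0:
        return product  # nothing salient under the gaze; leave the map unscaled
    return [[value / peak for value in row] for row in product]


# Toy 3x4 bottom-up saliency map with two equally salient regions.
saliency = [
    [0.1, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.9],
]

# Assumed gaze estimate: the viewer is looking at the bottom-right corner.
gaze = gaussian_gaze_map(3, 4, gaze_row=2, gaze_col=3, sigma=1.0)
attention = fuse(saliency, gaze)
```

In this toy example the two salient regions are indistinguishable from bottom-up cues alone; multiplying by the gaze map keeps the region near the gaze point and suppresses the distant one, which is the intuition behind combining the two sources.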

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (61672201), the Anhui Province Natural Science Foundation of China (1408085MKL76), and the Anhui Province Science and Technology Major Project of China (15czz02074).

Author information

Authors and Affiliations

School of Computer and Information, Hefei University of Technology, Hefei, 230009, China
Yanxiang Chen, Qiangqiang Xie & Minglong Song
Anhui Keli Information Industry Co. Ltd., Hefei, 230088, China
Gang Tao

Corresponding author

Correspondence to Yanxiang Chen.

Additional information

The first two authors contributed equally to this study.

About this article

Cite this article

Chen, Y., Tao, G., Xie, Q. et al. Video attention prediction using gaze saliency. Multimed Tools Appl 78, 26867–26884 (2019). https://doi.org/10.1007/s11042-016-4294-1

Received: 14 July 2016
Revised: 26 November 2016
Accepted: 21 December 2016
Published: 03 January 2017
Issue Date: 15 October 2019
DOI: https://doi.org/10.1007/s11042-016-4294-1
