DOI: 10.1145/3469877.3490597

Video Saliency Prediction via Deep Eye Movement Learning

Published: 10 January 2022

Abstract

Existing methods typically exploit temporal motion information and spatial layout information in video to predict video saliency. However, fixations do not always coincide with the moving object of interest, because human eye fixations are determined not only by spatio-temporal information but also by the velocity of eye movement. To address this issue, this paper proposes a new saliency prediction method based on deep eye movement learning (EML). Unlike previous methods that use only human fixations as ground truth, our method additionally uses the optical flow of fixations between successive frames as ground truth for eye movement learning. Experimental results on the DHF1K, Hollywood2, and UCF-sports datasets show that the proposed EML model achieves promising results across a wide range of metrics.
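
The abstract only sketches how the extra eye-movement supervision is used. Below is a minimal PyTorch-style sketch of one way such a signal could be combined with standard saliency supervision. The function names (movement_target, eml_loss), the crude frame-difference stand-in for the optical flow of fixations, and the KL-divergence plus L1 loss combination are all illustrative assumptions, not the authors' actual implementation.

    # Minimal sketch: supervise predicted saliency maps with fixation ground truth,
    # and add an auxiliary term that ties the predicted frame-to-frame change to a
    # movement target derived from fixations of successive frames.
    import torch
    import torch.nn.functional as F


    def movement_target(fix_prev: torch.Tensor, fix_next: torch.Tensor) -> torch.Tensor:
        # Hypothetical eye-movement ground truth: the per-pixel change of the
        # fixation density between two successive frames (a simple stand-in for
        # the optical flow of fixations described in the abstract).
        return fix_next - fix_prev


    def eml_loss(sal_prev, sal_next, fix_prev, fix_next, lam: float = 0.5):
        # Saliency term: KL divergence between predicted and ground-truth
        # saliency distributions for each frame.
        def kl(pred, gt):
            pred = pred.flatten(1).softmax(dim=1)
            gt = gt.flatten(1)
            gt = gt / (gt.sum(dim=1, keepdim=True) + 1e-8)
            return F.kl_div(pred.log(), gt, reduction="batchmean")

        saliency_term = kl(sal_prev, fix_prev) + kl(sal_next, fix_next)
        # Movement term: L1 distance between the predicted change and the
        # fixation-derived movement target.
        movement_term = F.l1_loss(sal_next - sal_prev,
                                  movement_target(fix_prev, fix_next))
        return saliency_term + lam * movement_term


    if __name__ == "__main__":
        # Toy usage with random tensors standing in for model outputs and fixation maps.
        b, h, w = 2, 64, 64
        sal_prev, sal_next = torch.rand(b, h, w), torch.rand(b, h, w)
        fix_prev, fix_next = torch.rand(b, h, w), torch.rand(b, h, w)
        print(eml_loss(sal_prev, sal_next, fix_prev, fix_next))

In practice the movement target would presumably be computed once from each dataset's eye-tracking data and cached alongside the fixation maps, with the weighting factor lam tuned on a validation split.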


Cited By

  • Transformer-Based Multi-Scale Feature Integration Network for Video Saliency Prediction. IEEE Transactions on Circuits and Systems for Video Technology 33, 12 (2023), 7696-7707. DOI: 10.1109/TCSVT.2023.3278410. Online publication date: 22 May 2023.
  • GFNet: gated fusion network for video saliency prediction. Applied Intelligence 53, 22 (2023), 27865-27875. DOI: 10.1007/s10489-023-04861-5. Online publication date: 19 September 2023.



        Published In

        MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia
        December 2021
        508 pages
        ISBN:9781450386074
        DOI:10.1145/3469877
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. deep learning
        2. eye fixation
        3. eye movement
        4. video saliency

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • National Key Research and Development Program of China
        • Research Programme on Applied Fundamentals and Frontier Technologies of Wuhan
        • Natural Science Foundation of China
        • Beijing Nova Program

        Conference

        MMAsia '21: ACM Multimedia Asia
        December 1 - 3, 2021
        Gold Coast, Australia

        Acceptance Rates

        Overall Acceptance Rate 59 of 204 submissions, 29%

