
Learning User Embeddings from Human Gaze for Personalised Saliency Prediction

Published: 28 May 2024

Abstract

Reusable embeddings of user behaviour have shown significant performance improvements for the personalised saliency prediction task. However, prior works require explicit user characteristics and preferences as input, which are often difficult to obtain. We present a novel method to extract user embeddings from pairs of natural images and corresponding saliency maps generated from a small amount of user-specific eye tracking data. At the core of our method is a Siamese convolutional neural encoder that learns the user embeddings by contrasting the image and personal saliency map pairs of different users. Evaluations on two public saliency datasets show that the generated embeddings have high discriminative power, are effective at refining universal saliency maps for individual users, and generalise well across users and images. Finally, based on our model's ability to encode individual user characteristics, our work points towards other applications that can benefit from reusable embeddings of gaze behaviour.
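The abstract outlines the core idea: a Siamese encoder maps (image, personal saliency map) pairs of the same user close together in embedding space and pairs from different users far apart. Below is a minimal NumPy sketch of that contrastive objective using a standard triplet margin loss, with a toy linear map standing in for the convolutional encoder. All names, dimensions, and the margin value are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def encode(pair, W):
    """Toy stand-in for the Siamese convolutional encoder: a single
    linear map from a flattened (image, saliency map) pair to a
    user embedding, followed by L2 normalisation."""
    z = W @ pair.ravel()
    return z / np.linalg.norm(z)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: pull embeddings of the same user
    together, push different users apart by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

rng = np.random.default_rng(0)
# Embedding dimension 16; inputs are 2 channels (image + saliency map) of 8x8.
W = rng.standard_normal((16, 2 * 8 * 8))

# Two (image, saliency map) pairs from user A and one from user B.
pair_a1 = rng.standard_normal((2, 8, 8))
pair_a2 = pair_a1 + 0.01 * rng.standard_normal((2, 8, 8))  # same user: similar
pair_b = rng.standard_normal((2, 8, 8))                    # different user

loss = triplet_loss(encode(pair_a1, W), encode(pair_a2, W), encode(pair_b, W))
```

In the paper's setting, the anchor and positive would be drawn from the same user's eye-tracking data on different images, so the encoder is pushed to capture user-specific rather than image-specific structure; this sketch only illustrates the loss geometry.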



Published In

cover image Proceedings of the ACM on Human-Computer Interaction
Proceedings of the ACM on Human-Computer Interaction  Volume 8, Issue ETRA
ETRA
May 2024
351 pages
EISSN:2573-0142
DOI:10.1145/3669943

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. deep learning
  2. eye-tracking
  3. gaze
  4. personal saliency
  5. saliency
  6. user embeddings
  7. user model

Qualifiers

  • Research-article
