
Learning User Embeddings from Human Gaze for Personalised Saliency Prediction

Published: 28 May 2024

Abstract

Reusable embeddings of user behaviour have shown significant performance improvements for the personalised saliency prediction task. However, prior works require explicit user characteristics and preferences as input, which are often difficult to obtain. We present a novel method to extract user embeddings from pairs of natural images and corresponding saliency maps generated from a small amount of user-specific eye tracking data. At the core of our method is a Siamese convolutional neural encoder that learns the user embeddings by contrasting the image and personal saliency map pairs of different users. Evaluations on two public saliency datasets show that the generated embeddings have high discriminative power, are effective at refining universal saliency maps for individual users, and generalise well across users and images. Finally, based on our model's ability to encode individual user characteristics, our work points towards other applications that can benefit from reusable embeddings of gaze behaviour.
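The abstract outlines the core idea: a Siamese encoder maps (image, personal saliency map) pairs of the same user close together in embedding space and pairs from different users far apart. Below is a minimal NumPy sketch of that contrastive objective using a standard triplet margin loss, with a toy linear map standing in for the convolutional encoder. All names, dimensions, and the margin value are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def encode(pair, W):
    """Toy stand-in for the Siamese convolutional encoder: a single
    linear map from a flattened (image, saliency map) pair to a
    user embedding, followed by L2 normalisation."""
    z = W @ pair.ravel()
    return z / np.linalg.norm(z)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: pull embeddings of the same user
    together, push different users apart by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

rng = np.random.default_rng(0)
# Embedding dimension 16; inputs are 2 channels (image + saliency map) of 8x8.
W = rng.standard_normal((16, 2 * 8 * 8))

# Two (image, saliency map) pairs from user A and one from user B.
pair_a1 = rng.standard_normal((2, 8, 8))
pair_a2 = pair_a1 + 0.01 * rng.standard_normal((2, 8, 8))  # same user: similar
pair_b = rng.standard_normal((2, 8, 8))                    # different user

loss = triplet_loss(encode(pair_a1, W), encode(pair_a2, W), encode(pair_b, W))
```

In the paper's setting, the anchor and positive would be drawn from the same user's eye-tracking data on different images, so the encoder is pushed to capture user-specific rather than image-specific structure; this sketch only illustrates the loss geometry.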



Published In

cover image Proceedings of the ACM on Human-Computer Interaction
Proceedings of the ACM on Human-Computer Interaction  Volume 8, Issue ETRA
ETRA
May 2024
351 pages
EISSN:2573-0142
DOI:10.1145/3669943

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. deep learning
  2. eye-tracking
  3. gaze
  4. personal saliency
  5. saliency
  6. user embeddings
  7. user model

Qualifiers

  • Research-article
