Abstract
Recent studies show that users' visual attention during virtual reality museum navigation can be estimated effectively with deep learning models. However, these models rely on large-scale datasets that are usually structurally complex and context-specific, which makes them challenging for non-specialist researchers and designers. We therefore present a deep learning model, ALRF, that generalises real-time prediction of user visual attention in virtual reality contexts. The model combines two parallel deep learning streams to process a compact dataset of temporal–spatial salient features of the user's eye movements and virtual object coordinates. Its prediction accuracy of 91.03% outperformed state-of-the-art deep learning models. Importantly, with quick parametric tuning, the model proved flexibly applicable across different environments of the virtual reality museum and outdoor scenes. We discuss how the proposed model may be implemented as a generalisable tool for adaptive virtual reality application design and evaluation.
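The two-stream idea described in the abstract — one stream for eye-movement features, one for virtual object coordinates, fused before a prediction over candidate objects — can be sketched as a minimal forward pass. This is an illustrative NumPy sketch under stated assumptions, not the paper's ALRF implementation: the mean-pooling stream, the feature dimensions (4 gaze features, 5 objects × 3 coordinates, 30 timesteps), and the random weights are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stream_forward(x, w):
    # Mean-pool the sequence over time, then apply a linear projection
    # with a tanh nonlinearity. A placeholder for a real recurrent or
    # convolutional stream such as those used in the paper.
    return np.tanh(x.mean(axis=0) @ w)

# Hypothetical inputs: 30 timesteps of 4 gaze features, and 30 timesteps
# of 3-D coordinates for each of 5 candidate virtual objects (flattened).
gaze_seq = rng.standard_normal((30, 4))
obj_seq = rng.standard_normal((30, 5 * 3))

# Placeholder weights: each stream projects to 8 hidden units; the fused
# 16-unit vector is mapped to a score per candidate object.
w_gaze = rng.standard_normal((4, 8))
w_obj = rng.standard_normal((15, 8))
w_out = rng.standard_normal((16, 5))

fused = np.concatenate([stream_forward(gaze_seq, w_gaze),
                        stream_forward(obj_seq, w_obj)])
logits = fused @ w_out
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over 5 objects

print("predicted object:", probs.argmax())
```

The softmax output is a distribution over the candidate objects; the attended object is taken to be the argmax. In a trained model the two streams and the fusion weights would be learned from the eye-tracking dataset rather than drawn at random.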
Availability of data and material
Derived data supporting the findings of this study are available from the corresponding author on request.
Code availability
Software applications used in the study are based on public open sources, and the code being used in this study is available from the corresponding author on request.
Change history
28 April 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10055-021-00527-0
Acknowledgements
This work was supported by the Natural Science Foundation of China (61802341) and the ZJU-SUTD IDEA programme (IDEA006).
Ethics declarations
Conflict of interest
The authors report no conflicts of interest.
Ethical approval
This study was approved by the university human research ethics committee, and all procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original article has been corrected: the fifth author's name was corrected to Praben Hansen.
Cite this article
Li, X., Shan, Y., Chen, W. et al. Predicting user visual attention in virtual reality with a deep learning model. Virtual Reality 25, 1123–1136 (2021). https://doi.org/10.1007/s10055-021-00512-7