
OpenNEEDS: A Dataset of Gaze, Head, Hand, and Scene Signals During Exploration in Open-Ended VR Environments

Published: 25 May 2021

Abstract

We present OpenNEEDS, the first large-scale, high-frame-rate, comprehensive, and open-source dataset of Non-Eye (head, hand, and scene) and Eye (3D gaze vector) data, captured from 44 participants as they freely explored two virtual environments offering many potential tasks (e.g., reading, drawing, shooting, and object manipulation). With this dataset, we aim to enable research on the relationship between head, hand, scene, and gaze spatiotemporal statistics and its application to gaze estimation. To demonstrate the power of OpenNEEDS, we show that gaze estimation models using individual non-eye sensors, as well as an early fusion model combining all non-eye sensors, outperform all baseline gaze estimation models considered, suggesting that non-eye sensors merit consideration in the design of robust eye trackers. We anticipate that this dataset will support research progress in many areas and applications, such as gaze estimation and prediction, sensor fusion, human-computer interaction, intent prediction, perceptuo-motor control, and machine learning.
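As a rough illustration of the early-fusion idea described in the abstract, the sketch below concatenates head- and hand-pose features into a single feature vector per frame, fits a linear regressor to 3D gaze vectors, and reports mean angular error. This is not the authors' pipeline: the field layout, the synthetic stand-in data, and the linear least-squares model are illustrative assumptions, and the actual OpenNEEDS schema and the paper's models differ.

```python
# Hedged sketch of an "early fusion" gaze-direction baseline.
# All signal layouts below are assumptions, not the OpenNEEDS schema.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-frame signals; replace with real dataset loading.
n_frames = 5000
head_pose = rng.normal(size=(n_frames, 7))       # x, y, z + quaternion (assumed layout)
left_hand = rng.normal(size=(n_frames, 3))
right_hand = rng.normal(size=(n_frames, 3))
gaze_dir = rng.normal(size=(n_frames, 3))
gaze_dir /= np.linalg.norm(gaze_dir, axis=1, keepdims=True)  # unit gaze vectors

# Early fusion: concatenate all non-eye signals into one feature vector per frame.
X = np.hstack([head_pose, left_hand, right_hand])
X = np.hstack([X, np.ones((n_frames, 1))])       # bias term

# Fit a linear least-squares map from fused features to gaze direction.
train, test = slice(0, 4000), slice(4000, None)
W, *_ = np.linalg.lstsq(X[train], gaze_dir[train], rcond=None)

# Predict, re-normalize to unit vectors, and report mean angular error in degrees.
# (With random stand-in data the printed number is meaningless; it only shows the metric.)
pred = X[test] @ W
pred /= np.linalg.norm(pred, axis=1, keepdims=True)
cos_sim = np.clip(np.sum(pred * gaze_dir[test], axis=1), -1.0, 1.0)
print("mean angular error (deg):", np.degrees(np.arccos(cos_sim)).mean())
```

In practice the fused features would come from the dataset's recorded head, hand, and scene streams, and a stronger nonlinear regressor could replace the least-squares fit.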

        Published In

        ETRA '21 Short Papers: ACM Symposium on Eye Tracking Research and Applications
        May 2021
        232 pages
        ISBN:9781450383455
        DOI:10.1145/3448018

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Badges

        • Best Short Paper

        Author Tags

        1. datasets
        2. eye tracking
        3. gaze estimation
        4. virtual reality

        Qualifiers

        • Short-paper
        • Research
        • Refereed limited

        Conference

        ETRA '21

        Acceptance Rates

Overall acceptance rate: 69 of 137 submissions (50%)

        Cited By

• (2024) Creative Insights into Motion: Enhancing Human Activity Understanding with 3D Data Visualization and Annotation. Proceedings of the 16th Conference on Creativity & Cognition, 482–487. https://doi.org/10.1145/3635636.3664256. Online publication date: 23 June 2024.
• (2024) GEARS: Generalizable Multi-Purpose Embeddings for Gaze and Hand Data in VR Interactions. Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization, 279–289. https://doi.org/10.1145/3627043.3659551. Online publication date: 22 June 2024.
• (2024) HOIMotion: Forecasting Human Motion During Human-Object Interactions Using Egocentric 3D Object Bounding Boxes. IEEE Transactions on Visualization and Computer Graphics 30(11), 7375–7385. https://doi.org/10.1109/TVCG.2024.3456161. Online publication date: November 2024.
• (2024) Real-Time Gaze Tracking via Head-Eye Cues on Head Mounted Devices. IEEE Transactions on Mobile Computing 23(12), 13292–13309. https://doi.org/10.1109/TMC.2024.3425928. Online publication date: December 2024.
• (2024) GazeMotion: Gaze-guided Human Motion Forecasting. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 13017–13022. https://doi.org/10.1109/IROS58592.2024.10802548. Online publication date: 14 October 2024.
• (2024) Eye-tracking on virtual reality: a survey. Virtual Reality 28(1). https://doi.org/10.1007/s10055-023-00903-y. Online publication date: 5 February 2024.
• (2024) MoViAn: Advancing Human Motion Analysis with 3D Visualization and Annotation. Proceedings of the International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI 2024), 15–26. https://doi.org/10.1007/978-3-031-77571-0_2. Online publication date: 21 December 2024.
• (2023) Intention Estimation with Recurrent Neural Networks for Mixed Reality Environments. 2023 26th International Conference on Information Fusion (FUSION), 1–8. https://doi.org/10.23919/FUSION52260.2023.10224151. Online publication date: 28 June 2023.
• (2023) Automatic Gaze Analysis: A Survey of Deep Learning Based Approaches. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(1), 61–84. https://doi.org/10.1109/TPAMI.2023.3321337. Online publication date: 15 November 2023.
• (2023) Appearance-based gaze estimation with feature fusion of multi-level information elements. Journal of Computational Design and Engineering 10(3), 1080–1109. https://doi.org/10.1093/jcde/qwad038. Online publication date: 25 April 2023.
