
Automatic Recognition and Augmentation of Attended Objects in Real-time using Eye Tracking and a Head-mounted Display

Published: 25 May 2021

Abstract

Scanning and processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. Adding the ability to observe this scanning behavior and scene processing to intelligent mobile user interfaces can enable a new class of cognition-aware user interfaces. As a first step in this direction, we implement an augmented reality (AR) system that classifies objects at the user’s point of regard, detects visual attention to them, and augments the real objects with virtual labels that stick to the objects in real time. We implement our prototype on a head-mounted AR device (Microsoft HoloLens 2) with integrated eye tracking capabilities and a front-facing camera.
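The pipeline the abstract describes — mapping the gaze ray to a scene object, detecting sustained visual attention to it, and only then attaching a virtual label — can be illustrated with a minimal dwell-time attention detector. This is a hypothetical sketch, not the authors' implementation: the `GazeSample` structure, the `DwellAttentionDetector` class, and the 0.5 s dwell threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GazeSample:
    """One eye-tracker sample: a timestamp in seconds and the id of the
    scene object currently hit by the gaze ray (None if no object is hit)."""
    timestamp: float
    object_id: Optional[str]


class DwellAttentionDetector:
    """Reports an object as attended once gaze has rested on it for at
    least `dwell_threshold` seconds without switching to another object."""

    def __init__(self, dwell_threshold: float = 0.5):
        self.dwell_threshold = dwell_threshold
        self._current: Optional[str] = None  # object gaze is resting on
        self._since: float = 0.0             # when that rest began

    def update(self, sample: GazeSample) -> Optional[str]:
        # Gaze moved to a different object (or to none): restart the timer.
        if sample.object_id != self._current:
            self._current = sample.object_id
            self._since = sample.timestamp
        # Dwelled long enough on a real object -> attention detected.
        if (self._current is not None
                and sample.timestamp - self._since >= self.dwell_threshold):
            return self._current
        return None


detector = DwellAttentionDetector(dwell_threshold=0.5)
print(detector.update(GazeSample(0.0, "mug")))  # dwell just started
print(detector.update(GazeSample(0.3, "mug")))  # below threshold
print(detector.update(GazeSample(0.6, "mug")))  # attention detected
```

In the full system, the object id would come from classifying a camera crop around the point of regard, and a positive result from the detector would trigger anchoring the classification label to that object in the AR view.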




Published In

ETRA '21 Adjunct: ACM Symposium on Eye Tracking Research and Applications
May 2021, 78 pages
ISBN: 9781450383578
DOI: 10.1145/3450341

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. augmented reality
  2. cognition-aware computing
  3. computer vision
  4. eye tracking
  5. visual attention

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

ETRA '21

Acceptance Rates

Overall Acceptance Rate 69 of 137 submissions, 50%


Cited By

  • (2024) A review of machine learning in scanpath analysis for passive gaze-based interaction. Frontiers in Artificial Intelligence 7. https://doi.org/10.3389/frai.2024.1391745. Online publication date: 5-Jun-2024.
  • (2024) HumanEYEze 2024: Workshop on Eye Tracking for Multimodal Human-Centric Computing. In Proceedings of the 26th International Conference on Multimodal Interaction, 696–697. https://doi.org/10.1145/3678957.3688384. Online publication date: 4-Nov-2024.
  • (2024) NeighboAR: Efficient Object Retrieval using Proximity- and Gaze-based Object Grouping with an AR System. Proceedings of the ACM on Human-Computer Interaction 8, ETRA, 1–19. https://doi.org/10.1145/3655599. Online publication date: 28-May-2024.
  • (2024) Leveraging Digital Trace Data to Investigate and Support Human-Centered Work Processes. In Evaluation of Novel Approaches to Software Engineering, 1–23. https://doi.org/10.1007/978-3-031-64182-4_1. Online publication date: 10-Jul-2024.
  • (2023) Evaluating the Usability of a Gaze-Adaptive Approach for Identifying and Comparing Raster Values between Multilayers. ISPRS International Journal of Geo-Information 12, 10 (412). https://doi.org/10.3390/ijgi12100412. Online publication date: 8-Oct-2023.
  • (2023) MR Object Identification and Interaction. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 3, 1–26. https://doi.org/10.1145/3610879. Online publication date: 27-Sep-2023.
  • (2022) In-Depth Review of Augmented Reality: Tracking Technologies, Development Tools, AR Displays, Collaborative AR, and Security Concerns. Sensors 23, 1 (146). https://doi.org/10.3390/s23010146. Online publication date: 23-Dec-2022.
  • (2022) Where and What. Proceedings of the ACM on Human-Computer Interaction 6, ETRA, 1–22. https://doi.org/10.1145/3530887. Online publication date: 13-May-2022.
  • (2021) Mobile Eye-Tracking Data Analysis Using Object Detection via YOLO v4. Sensors 21, 22 (7668). https://doi.org/10.3390/s21227668. Online publication date: 18-Nov-2021.
  • (2021) Investigating the Usability of a Head-Mounted Display Augmented Reality Device in Elementary School Children. Sensors 21, 19 (6623). https://doi.org/10.3390/s21196623. Online publication date: 5-Oct-2021.
