Toward Facilitating Search in VR With the Assistance of Vision Large Language Models

Authors:

Chi San (Clarence) Cheung,

Zhongyue Zhang,

Mingming Fan

VRST '24: Proceedings of the 30th ACM Symposium on Virtual Reality Software and Technology

Article No.: 35, Pages 1 - 14

https://doi.org/10.1145/3641825.3687742

Published: 09 October 2024

Abstract

While search is a common need in Virtual Reality (VR) applications, current approaches are cumbersome, often requiring users to type on a mid-air keyboard using controllers in VR or remove VR equipment to search on a computer. We first conducted a literature review and a formative study, identifying six common search needs: knowing about one object, knowing about the object’s partial details, knowing objects with environmental context, knowing about interactions with objects, and finding objects within field of view (FOV) and out of FOV in the VR scene. Informed by these needs, we designed technology probes that leveraged recent advances in Vision Large Language Models and conducted a probe-based study with users to elicit feedback. Based on the findings, we derived design principles for VR designers and developers to consider when designing a user-friendly search interface in VR. While prior work about VR search tended to address specific aspects of search, our work contributes design considerations aimed at enhancing the ease of search in VR and potential future directions.

Index Terms

Toward Facilitating Search in VR With the Assistance of Vision Large Language Models
1. Human-centered computing
  1. Interaction design
    1. Interaction design process and methods
      1. Participatory design

Published In

VRST '24: Proceedings of the 30th ACM Symposium on Virtual Reality Software and Technology

October 2024

633 pages

ISBN:9798400705359

DOI:10.1145/3641825

Editors:
Benjamin Weyers, Trier University, Germany
Daniel Zielasko, Trier University, Germany
Rob Lindeman, University of Canterbury, New Zealand
Stefania Serafin, Aalborg University, Denmark
Eike Langbehn, HAW Hamburg, Germany
Victoria Interrante, University of Minnesota, USA
Gerd Bruder, University of Central Florida, USA
J. Edward Swan II, Mississippi State University, USA
Christoph Borst, University of Louisiana at Lafayette, USA
Carolin Wienrich, Julius-Maximilians Universität Würzburg, Germany
Rebecca Fribourg, Ecole Centrale de Nantes, France

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Guangzhou-HKUST(GZ) Joint Funding Project
Guangzhou Science and Technology Program City-University Joint Funding Project
HKUST Practice Research with Project title RBM talent cultivation Exploration

Conference

VRST '24

VRST '24: 30th ACM Symposium on Virtual Reality Software and Technology

October 9 - 11, 2024

Trier, Germany

Acceptance Rates

Overall Acceptance Rate 66 of 254 submissions, 26%
