DOI: 10.1145/3641825.3687742
research-article

Toward Facilitating Search in VR With the Assistance of Vision Large Language Models

Published: 09 October 2024

Abstract

While search is a common need in Virtual Reality (VR) applications, current approaches are cumbersome, often requiring users to type on a mid-air keyboard using controllers in VR or remove VR equipment to search on a computer. We first conducted a literature review and a formative study, identifying six common search needs: knowing about one object, knowing about the object’s partial details, knowing objects with environmental context, knowing about interactions with objects, and finding objects within field of view (FOV) and out of FOV in the VR scene. Informed by these needs, we designed technology probes that leveraged recent advances in Vision Large Language Models and conducted a probe-based study with users to elicit feedback. Based on the findings, we derived design principles for VR designers and developers to consider when designing a user-friendly search interface in VR. While prior work about VR search tended to address specific aspects of search, our work contributes design considerations aimed at enhancing the ease of search in VR and potential future directions.


    Published In

    VRST '24: Proceedings of the 30th ACM Symposium on Virtual Reality Software and Technology
    October 2024
    633 pages
    ISBN:9798400705359
    DOI:10.1145/3641825

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. VR search
    2. Virtual reality
    3. participatory design
    4. vision large language model

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Guangzhou-HKUST(GZ) Joint Funding Project
    • Guangzhou Science and Technology Program City-University Joint Funding Project
    • HKUST Practice Research with Project title RBM talent cultivation Exploration

    Conference

    VRST '24

    Acceptance Rates

    Overall Acceptance Rate 66 of 254 submissions, 26%
