Short Paper
DOI: 10.1145/3691573.3691618

Attention Guidance through Video Script: A Case Study of Object Focusing on 360º VR Video Tours

Published: 30 September 2024

Abstract

Within the expansive domain of virtual reality (VR), 360º VR videos immerse viewers in a spherical environment, allowing them to explore and interact with the virtual world from all angles. While this video representation offers unparalleled levels of immersion, it often lacks effective methods for guiding viewers’ attention toward specific elements within the virtual environment. This paper combines the Grounding DINO and Segment Anything (SAM) models to guide attention through object focusing driven by the video script. As a case study, this work conducts experiments on a 360º video tour of the University of Reading. The results show that video scripts can improve the user experience in 360º VR video tours by helping to direct the viewer’s attention.
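To make the pipeline described in the abstract concrete, the following is a minimal sketch of one way the two models could be chained: a phrase taken from the video script serves as the text prompt for Grounding DINO, the highest-scoring detection box then prompts SAM for a pixel mask, and a focusing effect is applied around that mask. The checkpoint and config paths, the thresholds, the choice to operate on individual equirectangular frames, and the background-blur effect are illustrative assumptions, not details taken from the paper.

# Hedged sketch of a script-driven object-focusing pipeline.
# Assumes the official GroundingDINO and segment-anything packages plus OpenCV;
# file paths, thresholds, and the blur effect are placeholders, not paper details.
import cv2
import numpy as np
import torch
from groundingdino.util.inference import load_model, load_image, predict
from groundingdino.util import box_ops
from segment_anything import sam_model_registry, SamPredictor

dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

def focus_object(frame_path: str, script_phrase: str) -> np.ndarray:
    """Blur everything except the object named by a phrase from the video script."""
    image_source, image = load_image(frame_path)  # RGB ndarray + normalized tensor
    h, w, _ = image_source.shape

    # 1. Open-set detection: the script phrase is the text prompt.
    boxes, logits, phrases = predict(
        model=dino, image=image, caption=script_phrase,
        box_threshold=0.35, text_threshold=0.25,
    )
    if len(boxes) == 0:
        return image_source  # phrase not found in this frame; leave it untouched

    # Convert the highest-scoring box from normalized cxcywh to pixel xyxy.
    best = boxes[logits.argmax()].unsqueeze(0)
    box_xyxy = box_ops.box_cxcywh_to_xyxy(best) * torch.tensor([w, h, w, h])

    # 2. Promptable segmentation: the detection box prompts SAM for a mask.
    predictor.set_image(image_source)
    masks, _, _ = predictor.predict(box=box_xyxy[0].numpy(), multimask_output=False)
    mask = masks[0]  # boolean (h, w) mask of the referenced object

    # 3. Focusing effect (an assumption): blur the background, keep the object sharp.
    blurred = cv2.GaussianBlur(image_source, (51, 51), 0)
    return np.where(mask[..., None], image_source, blurred)

Running focus_object on each frame with the phrase currently being narrated would yield a video in which the scripted object stays in focus while its surroundings are softened, which is one plausible realization of the attention-guidance effect the paper studies.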



Published In

SVR '24: Proceedings of the 26th Symposium on Virtual and Augmented Reality
September 2024
346 pages
ISBN: 9798400709791
DOI: 10.1145/3691573
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. 360º Videos
  2. Attention Guidance
  3. Deep Learning

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

SVR 2024
SVR 2024: Symposium on Virtual and Augmented Reality
September 30 - October 3, 2024
Manaus, Brazil
