Short Paper
DOI: 10.1145/3691573.3691618

Attention Guidance through Video Script: A Case Study of Object Focusing on 360º VR Video Tours

Published: 30 September 2024

Abstract

Within the expansive domain of virtual reality (VR), 360º VR videos immerse viewers in a spherical environment, allowing them to explore and interact with the virtual world from all angles. While this video representation offers unparalleled levels of immersion, it often lacks effective methods for guiding viewers’ attention toward specific elements within the virtual environment. This paper combines the Grounding DINO and Segment Anything (SAM) models to guide attention through object focusing driven by the video script. As a case study, this work conducts experiments on a 360º video tour of the University of Reading. The results show that video scripts can improve the user experience in 360º VR video tours by helping to direct the viewer’s attention.
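To make the pipeline described in the abstract concrete, the following is a minimal sketch of one way the two models could be chained: a phrase taken from the video script serves as the text prompt for Grounding DINO, the highest-scoring detection box then prompts SAM for a pixel mask, and a focusing effect is applied around that mask. The checkpoint and config paths, the thresholds, the choice to operate on individual equirectangular frames, and the background-blur effect are illustrative assumptions, not details taken from the paper.

# Hedged sketch of a script-driven object-focusing pipeline.
# Assumes the official GroundingDINO and segment-anything packages plus OpenCV;
# file paths, thresholds, and the blur effect are placeholders, not paper details.
import cv2
import numpy as np
import torch
from groundingdino.util.inference import load_model, load_image, predict
from groundingdino.util import box_ops
from segment_anything import sam_model_registry, SamPredictor

dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

def focus_object(frame_path: str, script_phrase: str) -> np.ndarray:
    """Blur everything except the object named by a phrase from the video script."""
    image_source, image = load_image(frame_path)  # RGB ndarray + normalized tensor
    h, w, _ = image_source.shape

    # 1. Open-set detection: the script phrase is the text prompt.
    boxes, logits, phrases = predict(
        model=dino, image=image, caption=script_phrase,
        box_threshold=0.35, text_threshold=0.25,
    )
    if len(boxes) == 0:
        return image_source  # phrase not found in this frame; leave it untouched

    # Convert the highest-scoring box from normalized cxcywh to pixel xyxy.
    best = boxes[logits.argmax()].unsqueeze(0)
    box_xyxy = box_ops.box_cxcywh_to_xyxy(best) * torch.tensor([w, h, w, h])

    # 2. Promptable segmentation: the detection box prompts SAM for a mask.
    predictor.set_image(image_source)
    masks, _, _ = predictor.predict(box=box_xyxy[0].numpy(), multimask_output=False)
    mask = masks[0]  # boolean (h, w) mask of the referenced object

    # 3. Focusing effect (an assumption): blur the background, keep the object sharp.
    blurred = cv2.GaussianBlur(image_source, (51, 51), 0)
    return np.where(mask[..., None], image_source, blurred)

Running focus_object on each frame with the phrase currently being narrated would yield a video in which the scripted object stays in focus while its surroundings are softened, which is one plausible realization of the attention-guidance effect the paper studies.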



Published In

SVR '24: Proceedings of the 26th Symposium on Virtual and Augmented Reality
September 2024
346 pages
ISBN: 9798400709791
DOI: 10.1145/3691573
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. 360º Videos
  2. Attention Guidance
  3. Deep Learning

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

SVR 2024
SVR 2024: Symposium on Virtual and Augmented Reality
September 30 - October 3, 2024
Manaus, Brazil
