DOI: 10.1145/3526113.3545613

OmniScribe: Authoring Immersive Audio Descriptions for 360° Videos

Published: 28 October 2022


Blind people typically access videos via audio descriptions (AD) crafted by sighted describers who comprehend, select, and describe crucial visual content in the videos. 360° video is an emerging storytelling medium that enables immersive experiences that people may not otherwise be able to have in everyday life. However, the omnidirectional nature of 360° videos makes it challenging for describers to perceive the holistic visual content and interpret the spatial information that is essential for creating immersive ADs for blind people. Through a formative study with a professional describer, we identified key challenges in describing 360° videos and iteratively designed OmniScribe, a system that supports the authoring of immersive ADs for 360° videos. OmniScribe uses AI-generated content-awareness overlays to help describers better grasp 360° video content. Furthermore, OmniScribe enables describers to author spatial AD and immersive labels so that blind users can consume the videos immersively with our mobile prototype. In a study with 11 professional and novice describers, we demonstrated the value of OmniScribe in the authoring workflow, and a study with 8 blind participants revealed the promise of immersive AD over standard AD for 360° videos. Finally, we discuss the implications of promoting 360° video accessibility.




      Published In

      UIST '22: Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology
      October 2022
      1363 pages
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].



      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. 360° video
      2. Blind
      3. accessibility
      4. audio description
      5. computer vision
      6. mobile
      7. multimedia
      8. sonification
      9. virtual reality
      10. visual impairment


      • Research-article
      • Research
      • Refereed limited


      UIST '22

      Acceptance Rates

      Overall Acceptance Rate 561 of 2,567 submissions, 22%


      Bibliometrics & Citations


      Article Metrics

      • Downloads (Last 12 months): 176
      • Downloads (Last 6 weeks): 22
      Reflects downloads up to 12 Feb 2025


      Cited By

      • (2024) Musical Performances in Virtual Reality with Spatial and View-Dependent Audio Descriptions for Blind and Low-Vision Users. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1–5. DOI: 10.1145/3663548.3688492. Online publication date: 27-Oct-2024.
      • (2024) Towards Accessible Musical Performances in Virtual Reality: Designing a Conceptual Framework for Omnidirectional Audio Descriptions. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1–17. DOI: 10.1145/3663548.3675618. Online publication date: 27-Oct-2024.
      • (2024) Audio Description Customization. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1–19. DOI: 10.1145/3663548.3675617. Online publication date: 27-Oct-2024.
      • (2024) Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1–14. DOI: 10.1145/3654777.3676424. Online publication date: 13-Oct-2024.
      • (2024) WorldScribe: Towards Context-Aware Live Visual Descriptions. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 1–18. DOI: 10.1145/3654777.3676375. Online publication date: 13-Oct-2024.
      • (2024) SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 116–132. DOI: 10.1145/3643834.3661556. Online publication date: 1-Jul-2024.
      • (2024) Making Short-Form Videos Accessible with Hierarchical Video Summaries. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–17. DOI: 10.1145/3613904.3642839. Online publication date: 11-May-2024.
      • (2024) “It’s Kind of Context Dependent”: Understanding Blind and Low Vision People’s Video Accessibility Preferences Across Viewing Scenarios. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–20. DOI: 10.1145/3613904.3642238. Online publication date: 11-May-2024.
      • (2024) Review of AI Technologies for Enhancing the Lives of Visually Impaired Individuals: Applications, Outcomes, and Future Directions. 2024 International Conference on Information and Communication Technology for Development for Africa (ICT4DA), 241–246. DOI: 10.1109/ICT4DA62874.2024.10777268. Online publication date: 18-Nov-2024.
      • (2023) Opportunities for Accessible Virtual Reality Design for Immersive Musical Performances for Blind and Low-Vision People. Proceedings of the 2023 ACM Symposium on Spatial User Interaction, 1–21. DOI: 10.1145/3607822.3614540. Online publication date: 13-Oct-2023.
