DOI: 10.1145/3571884.3604302
Work in Progress

Exploring Audio Icons for Content-Based Navigation in Voice User Interfaces

Published: 19 July 2023

Abstract

Voice interaction is an increasingly popular technology that allows users to control devices and applications without physical interaction or visual attention. Augmented voice playback controls, such as audio icons, have the potential to significantly improve voice navigation of instructional videos. This study evaluates audio icons for how-to video navigation in a Wizard-of-Oz-controlled setup in which 24 participants assembled a wooden robot using a voice-controlled laptop. Results showed that audio icons helped participants complete the task faster, issue fewer voice commands, and report higher satisfaction, although some usability challenges were observed. Significant differences in perceived usability were found between audio icons placed at visual points-of-action and the baseline, but not between the baseline and audio icons placed at 30-second intervals. These findings provide valuable insights for voice user interface (VUI) researchers and designers seeking to advance the use of audio icons for voice-based navigation.
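To make the two placement schemes mentioned in the abstract concrete, the sketch below shows, in Python, how audio-icon timestamps might be generated for a how-to video timeline under either scheme (fixed 30-second intervals versus annotated visual points-of-action) and how a "go back to the last marker" voice command could be resolved against them. This is a hypothetical illustration, not the authors' implementation; all function names and the example annotation times are assumptions.

```python
# Hypothetical sketch (not the paper's system): two ways of placing audio icons
# on a how-to video timeline, mirroring the study's two placement conditions.

from bisect import bisect_right


def interval_icons(video_length_s: float, interval_s: float = 30.0) -> list[float]:
    """Interval placement: an audio icon every `interval_s` seconds."""
    marks, t = [], interval_s
    while t < video_length_s:
        marks.append(t)
        t += interval_s
    return marks


def point_of_action_icons(annotations: list[float]) -> list[float]:
    """Content-based placement: icons at annotated visual points-of-action
    (e.g., the moment a new assembly step begins)."""
    return sorted(annotations)


def previous_icon(marks: list[float], playhead_s: float) -> float:
    """Resolve a command like 'go back to the last marker' by jumping to the
    nearest icon timestamp at or before the current playhead position."""
    i = bisect_right(marks, playhead_s) - 1
    return marks[i] if i >= 0 else 0.0


if __name__ == "__main__":
    steps = point_of_action_icons([12.0, 95.5, 180.0, 260.0])  # assumed annotations
    print(previous_icon(steps, 200.0))   # -> 180.0
    print(interval_icons(120.0))         # -> [30.0, 60.0, 90.0]
```

In a Wizard-of-Oz setup like the one described, a human wizard rather than a recognizer would trigger such jumps, but the timestamp logic illustrates the difference between the two conditions.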



    Published In

    CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces
    July 2023
    504 pages
    ISBN:9798400700149
    DOI:10.1145/3571884
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 July 2023


    Author Tags

    1. Conversational Interaction
    2. How-to Videos
    3. Non-Linear Instructional Video
    4. Video Navigation
    5. Voice-Based Navigation
    6. Wizard-of-Oz

    Qualifiers

    • Work in progress
    • Research
    • Refereed limited

    Conference

    CUI '23
    CUI '23: ACM conference on Conversational User Interfaces
    July 19 - 21, 2023
    Eindhoven, Netherlands

    Acceptance Rates

    Overall Acceptance Rate 34 of 100 submissions, 34%


