DOI: 10.1145/3571884.3604302
Work in Progress

Exploring Audio Icons for Content-Based Navigation in Voice User Interfaces

Published: 19 July 2023

Abstract

Voice interaction is an increasingly popular technology that allows users to control devices and applications without physical interaction or visual attention. Augmented voice playback controls, such as audio icons, have the potential to significantly improve voice navigation of instructional videos. This study evaluates audio icons for how-to video navigation in a Wizard-of-Oz-controlled setup in which 24 participants assembled a wooden robot using a voice-controlled laptop. Results showed that audio icons helped participants complete the task faster, issue fewer voice commands, and report higher satisfaction, although some usability challenges were observed. Significant differences in perceived usability were found between audio icons placed at visual points-of-action and the baseline, but not between the baseline and audio icons placed at 30-second intervals. These findings provide valuable insights for voice user interface (VUI) researchers and designers seeking to advance the use of audio icons for voice-based navigation.
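To make the two placement schemes mentioned in the abstract concrete, the sketch below shows, in Python, how audio-icon timestamps might be generated for a how-to video timeline under either scheme (fixed 30-second intervals versus annotated visual points-of-action) and how a "go back to the last marker" voice command could be resolved against them. This is a hypothetical illustration, not the authors' implementation; all function names and the example annotation times are assumptions.

```python
# Hypothetical sketch (not the paper's system): two ways of placing audio icons
# on a how-to video timeline, mirroring the study's two placement conditions.

from bisect import bisect_right


def interval_icons(video_length_s: float, interval_s: float = 30.0) -> list[float]:
    """Interval placement: an audio icon every `interval_s` seconds."""
    marks, t = [], interval_s
    while t < video_length_s:
        marks.append(t)
        t += interval_s
    return marks


def point_of_action_icons(annotations: list[float]) -> list[float]:
    """Content-based placement: icons at annotated visual points-of-action
    (e.g., the moment a new assembly step begins)."""
    return sorted(annotations)


def previous_icon(marks: list[float], playhead_s: float) -> float:
    """Resolve a command like 'go back to the last marker' by jumping to the
    nearest icon timestamp at or before the current playhead position."""
    i = bisect_right(marks, playhead_s) - 1
    return marks[i] if i >= 0 else 0.0


if __name__ == "__main__":
    steps = point_of_action_icons([12.0, 95.5, 180.0, 260.0])  # assumed annotations
    print(previous_icon(steps, 200.0))   # -> 180.0
    print(interval_icons(120.0))         # -> [30.0, 60.0, 90.0]
```

In a Wizard-of-Oz setup like the one described, a human wizard rather than a recognizer would trigger such jumps, but the timestamp logic illustrates the difference between the two conditions.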



    Published In

    CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces
    July 2023
    504 pages
    ISBN:9798400700149
    DOI:10.1145/3571884
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 July 2023


    Author Tags

    1. Conversational Interaction
    2. How-to Videos
    3. Non-Linear Instructional Video
    4. Video Navigation
    5. Voice-Based Navigation
    6. Wizard-of-Oz

    Qualifiers

    • Work in progress
    • Research
    • Refereed limited

    Conference

    CUI '23
    CUI '23: ACM conference on Conversational User Interfaces
    July 19 - 21, 2023
    Eindhoven, Netherlands

    Acceptance Rates

    Overall Acceptance Rate 34 of 100 submissions, 34%


