DOI: 10.1145/3532106.3533455

Losing Its Touch: Understanding User Perception of Multimodal Interaction and Smart Assistance

Published: 13 June 2022

Abstract

Intelligent Personal Assistants (IPAs) are advertised as reliable companions in everyday life that simplify household tasks. Due to speech-based usability issues, users struggle to engage deeply with current systems. Newer generations of standalone devices extend these capabilities with a display, in part to address weaknesses such as the difficulty of memorizing auditory information. So far, it is unclear how designers realize the potential of a multimodal experience and how users appropriate it. We therefore observed 20 participants in a controlled setting as they planned a dinner with the help of an audio-visual IPA, namely Alexa Echo Show. Our study reveals ambiguous mental models of perceived and experienced device capabilities, leading to confusion. Meanwhile, the additional visual output channel could not counterbalance the weaknesses of voice interaction. Finally, we aim to illustrate users’ conceptual understandings of IPAs and provide implications to rethink audiovisual output for voice-first standalone devices.


Published In

DIS '22: Proceedings of the 2022 ACM Designing Interactive Systems Conference
June 2022
1947 pages
ISBN: 9781450393584
DOI: 10.1145/3532106
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Alexa Skills
  2. Household Companion
  3. Intelligent Personal Assistant
  4. Smart Display
  5. Voice Assistants
  6. Multimodal Interaction

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DIS '22
DIS '22: Designing Interactive Systems Conference
June 13 - 17, 2022
Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 1,158 of 4,684 submissions, 25%

Upcoming Conference

DIS '25
Designing Interactive Systems Conference
July 5 - 9, 2025
Funchal, Portugal

Article Metrics

  • Downloads (Last 12 months): 69
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 25 Jan 2025

Cited By

  • (2024) Do I Just Tap My Headset? Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:4 (1-28). DOI: 10.1145/3631451. Online publication date: 12-Jan-2024
  • (2024) Digitale Gestaltung. Verbraucherinformatik (261-300). DOI: 10.1007/978-3-662-68706-2_6. Online publication date: 25-Mar-2024
  • (2023) Getting the Residents’ Attention: The Perception of Warning Channels in Smart Home Warning Systems. Proceedings of the 2023 ACM Designing Interactive Systems Conference (1114-1127). DOI: 10.1145/3563657.3596076. Online publication date: 10-Jul-2023
  • (2023) Designing an Interaction Concept for Assisted Cooking in Smart Kitchens: Focus on Human Agency, Proactivity, and Multimodality. Proceedings of the 2023 ACM Designing Interactive Systems Conference (1128-1144). DOI: 10.1145/3563657.3595975. Online publication date: 10-Jul-2023
  • (2023) “Foggy sounds like nothing” — enriching the experience of voice assistants with sonic overlays. Personal and Ubiquitous Computing 27:5 (1927-1947). DOI: 10.1007/s00779-023-01722-3. Online publication date: 6-Jun-2023
  • (2023) Research on Emotional Design Strategies of Voice Interaction on Smartphones: A Case Study of College Students’ Use of Smart Phones. HCI International 2023 Posters (101-108). DOI: 10.1007/978-3-031-35989-7_12. Online publication date: 9-Jul-2023
