research-article

Losing Its Touch: Understanding User Perception of Multimodal Interaction and Smart Assistance

Authors: Margarita Esau, Veronika Krauß, Gunnar Stevens

DIS '22: Proceedings of the 2022 ACM Designing Interactive Systems Conference

Pages 1288–1299

https://doi.org/10.1145/3532106.3533455

Published: 13 June 2022

Abstract

Intelligent Personal Assistants (IPAs) are advertised as reliable everyday companions that simplify household tasks. Because of usability issues inherent to speech interaction, however, users struggle to engage deeply with current systems. Newer generations of standalone devices extend these capabilities with a display, partly to address weaknesses of voice-only interaction such as the difficulty of memorizing auditory information. So far, it remains unclear how designers realize the potential of such a multimodal experience and how users appropriate it. We therefore observed 20 participants in a controlled setting as they planned a dinner with the help of an audio-visual IPA, namely an Amazon Echo Show running Alexa. Our study reveals ambiguous mental models of the device's perceived and experienced capabilities, which led to confusion. Moreover, the additional visual output channel could not counterbalance the weaknesses of voice interaction. Finally, we illustrate users’ conceptual understanding of IPAs and derive implications for rethinking audiovisual output on voice-first standalone devices.

Cited By

A. Khurana, M. Glueck, and P. Chilana. 2024. Do I Just Tap My Headset? Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4, 1–28. Online publication date: 12 Jan 2024.
https://dl.acm.org/doi/10.1145/3631451
M. Esau-Held, V. Krauß, and B. Essing. 2024. Digitale Gestaltung [Digital Design]. In Verbraucherinformatik [Consumer Informatics], 261–300. Online publication date: 25 Mar 2024.
https://doi.org/10.1007/978-3-662-68706-2_6
S. Haesler, M. Wendelborn, and C. Reuter. 2023. Getting the Residents’ Attention: The Perception of Warning Channels in Smart Home Warning Systems. In Proceedings of the 2023 ACM Designing Interactive Systems Conference, 1114–1127. Online publication date: 10 Jul 2023.
https://dl.acm.org/doi/10.1145/3563657.3596076

Published In

DIS '22: Proceedings of the 2022 ACM Designing Interactive Systems Conference

June 2022

1947 pages

ISBN: 9781450393584

DOI: 10.1145/3532106

Editors:
Florian 'Floyd' Mueller, Monash University, Melbourne, Australia
Stefan Greuter, Deakin University, Melbourne, Australia
Rohit Ashok Khot, RMIT University, Melbourne, Australia
Penny Sweetser, The Australian National University, Canberra, Australia
Marianna Obrist, University College London, UK

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

DIS '22

Sponsor:

SIGCHI

DIS '22: Designing Interactive Systems Conference

June 13–17, 2022

Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 1,158 of 4,684 submissions, 25%

Bibliometrics

Total citations: 6
Total downloads: 340
Downloads (last 12 months): 69
Downloads (last 6 weeks): 2

Reflects downloads up to 25 Jan 2025.

Citations

Cited By

J. Weber, M. Esau-Held, M. Schiller, E. Thaden, D. Manstetten, and G. Stevens. 2023. Designing an Interaction Concept for Assisted Cooking in Smart Kitchens: Focus on Human Agency, Proactivity, and Multimodality. In Proceedings of the 2023 ACM Designing Interactive Systems Conference, 1128–1144. Online publication date: 10 Jul 2023.
https://dl.acm.org/doi/10.1145/3563657.3595975
M. Esau-Held, A. Marsh, V. Krauß, and G. Stevens. 2023. “Foggy sounds like nothing” — enriching the experience of voice assistants with sonic overlays. Personal and Ubiquitous Computing 27, 5, 1927–1947. Online publication date: 6 Jun 2023.
https://dl.acm.org/doi/10.1007/s00779-023-01722-3
L. Li and M. Zhou. 2023. Research on Emotional Design Strategies of Voice Interaction on Smartphones: A Case Study of College Students’ Use of Smart Phones. In HCI International 2023 Posters, 101–108. Online publication date: 9 Jul 2023.
https://doi.org/10.1007/978-3-031-35989-7_12
