DOI: 10.1145/3491102.3517432

What Could Possibly Go Wrong When Interacting with Proactive Smart Speakers? A Case Study Using an ESM Application

Published: 29 April 2022

ABSTRACT

Voice user interfaces (VUIs) have made their way into people’s daily lives, from voice assistants to smart speakers. Although VUIs typically react only to direct user commands, they increasingly incorporate proactive behaviors. Proactive smart speakers, in particular, have potential applications ranging from healthcare to entertainment; however, their usability in everyday life is subject to interaction errors. To systematically investigate the nature of these errors, we designed a voice-based Experience Sampling Method (ESM) application that runs on proactive speakers. We captured 1,213 user interactions in a 3-week field deployment in 13 participants’ homes. Through auxiliary audio recordings and logs, we identify substantial interaction errors and the strategies users apply to overcome them. We further analyze interaction timings and provide insights into the time cost of errors. We find that, even for answering simple ESMs, interaction errors occur frequently and can hamper both the usability of proactive speakers and the user experience. Our work also identifies multiple facets of VUIs that can be improved with respect to the timing of speech.
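The timing analysis the abstract mentions can be illustrated with a minimal sketch: given timestamped interaction logs, compare the duration of error-free exchanges against exchanges that required retries. The record fields (`prompt_ts`, `response_ts`, `retries`) and the specific metric are assumptions for illustration, not the paper's actual logging schema or analysis code.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical log record for one proactive ESM prompt-response exchange.
# Field names are illustrative assumptions, not the study's real schema.
@dataclass
class Interaction:
    prompt_ts: float    # when the speaker proactively asked the ESM question
    response_ts: float  # when a final, recognized answer was captured
    retries: int        # repeats/reformulations needed before success

    @property
    def duration(self) -> float:
        return self.response_ts - self.prompt_ts

def error_time_cost(log: list[Interaction]) -> float:
    """Average extra seconds spent on interactions that needed retries,
    relative to the mean duration of error-free interactions."""
    clean = [i.duration for i in log if i.retries == 0]
    errored = [i.duration for i in log if i.retries > 0]
    if not clean or not errored:
        return 0.0
    return mean(errored) - mean(clean)

# Illustrative timings (seconds), not data from the study:
log = [
    Interaction(0.0, 6.0, 0),
    Interaction(100.0, 108.0, 0),
    Interaction(200.0, 221.0, 2),  # two retries before a recognized answer
]
print(error_time_cost(log))  # → 14.0
```

A sketch like this makes the abstract's claim concrete: each failed recognition adds measurable seconds to an otherwise short exchange, which compounds quickly over repeated daily ESM prompts.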


Supplemental Material

3491102.3517432-talk-video.mp4 (mp4, 81.6 MB)


Published in

CHI '22: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems
April 2022, 10459 pages
ISBN: 9781450391573
DOI: 10.1145/3491102

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

Acceptance Rates

Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%

