ABSTRACT
‘Wake words’ such as “Alexa” or “Hey Siri”, as conversation design elements, mimic the interactionally rich ‘summons-answer’ sequence of natural conversation, but their function amounts to little more than a button-push: they simply activate the interface. In practice, however, users vocally overdesign their wake words with all the detail of a ‘real’ interactional summons. We hear users uttering wake words with a specific prosody and intonation, as though addressing a particular recipient in a particular social and pragmatic context. This presents a puzzle for designers of conversational user interfaces (CUIs). Previous research suggests that expert users simplify their talk when interacting with CUIs, but with wake words we observe the opposite. When users do the extra interactional work of varying their wake words in ways that seem ‘recipient designed’ for a specific other, does that suggest that designers are successfully eliciting natural interaction from users, or that the design is violating user expectations? Our two case studies show how the mismatch between user expectations and the way wake words are currently implemented can lead to cascades of interactional trouble, especially in multi-party conversations. We argue that designers should find new ways to activate CUIs that align users’ expectations with conversational system design.