DOI: 10.1145/3411764.3445579

Designers Characterize Naturalness in Voice User Interfaces: Their Goals, Practices, and Challenges

Published: 07 May 2021

Abstract

This work investigates the practices and challenges of voice user interface (VUI) designers. Existing VUI design guidelines recommend that designers strive for natural human-agent conversation, yet the literature leaves a critical gap regarding how designers pursue naturalness in VUIs and where they struggle in doing so. Bridging this gap is necessary for identifying designers’ needs and supporting them. Our interviews with 20 VUI designers identified 12 ways in which designers characterize and approach naturalness in VUIs. We categorized these characteristics into three groupings based on the type of conversational context each contributes to: Social, Transactional, and Core. Our results contribute new findings on designers’ challenges, including a design dilemma in augmenting task-oriented VUIs with social conversation, difficulties in writing for spoken language, and a lack of proper tool support for imbuing synthesized voice with expressivity, together with implications for developing design tools and guidelines.
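One concrete facet of the expressivity challenge above is that expressive synthesized speech is typically hand-authored as SSML, the W3C markup standard whose supported tag subsets vary across engines such as Amazon Polly and IBM Watson Text to Speech. The snippet below is a minimal illustrative sketch, not code from the paper; it assumes the boto3 SDK with configured AWS credentials for Amazon Polly, and the prompt text, voice, and prosody values are arbitrary placeholders.

```python
# Minimal illustrative sketch (not from the paper): hand-authoring SSML
# expressivity markup and rendering it with Amazon Polly via boto3.
# Assumes boto3 is installed and AWS credentials are configured.
import boto3

# A single VUI prompt with a pause and prosody adjustments; supported tags
# and attribute values differ between TTS engines and voices.
ssml_prompt = """<speak>
  Welcome back.
  <break time="300ms"/>
  <prosody rate="95%" volume="loud">Your package arrives today.</prosody>
</speak>"""

polly = boto3.client("polly")
response = polly.synthesize_speech(
    Text=ssml_prompt,
    TextType="ssml",      # interpret the input as SSML rather than plain text
    VoiceId="Joanna",     # voice choice is a placeholder
    OutputFormat="mp3",
)

# Save the rendered audio so the designer can audition the prompt.
with open("prompt.mp3", "wb") as audio_file:
    audio_file.write(response["AudioStream"].read())
```

Every nuance (pause length, speaking rate, volume) has to be encoded and auditioned by hand in markup like this, which is the kind of friction behind the tool-support challenge described in the abstract.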

Published In

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
May 2021
10,862 pages
ISBN: 9781450380966
DOI: 10.1145/3411764
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 May 2021

Author Tags

  1. challenges
  2. characterize
  3. designers
  4. naturalness
  5. practices
  6. voice user interfaces (VUI)

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CHI '21

Acceptance Rates

Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%

Cited By

  • (2024) Hey Building! Novel Interfaces for Parametric Design Manipulations in Virtual Reality. Proceedings of the ACM on Human-Computer Interaction 8, ISS, 330–355. https://doi.org/10.1145/3698140. Online publication date: 24-Oct-2024.
  • (2024) Grounding with Structure: Exploring Design Variations of Grounded Human-AI Collaboration in a Natural Language Interface. Proceedings of the ACM on Human-Computer Interaction 8, CSCW2, 1–27. https://doi.org/10.1145/3686902. Online publication date: 8-Nov-2024.
  • (2024) Toward a Third-Kind Voice for Conversational Agents in an Era of Blurring Boundaries Between Machine and Human Sounds. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1–7. https://doi.org/10.1145/3640794.3665880. Online publication date: 8-Jul-2024.
  • (2024) Like My Aunt Dorothy: Effects of Conversational Styles on Perceptions, Acceptance and Metaphorical Descriptions of Voice Assistants during Later Adulthood. Proceedings of the ACM on Human-Computer Interaction 8, CSCW1, 1–21. https://doi.org/10.1145/3637365. Online publication date: 26-Apr-2024.
  • (2024) Inaudible Backdoor Attack via Stealthy Frequency Trigger Injection in Audio Spectrogram. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 31–45. https://doi.org/10.1145/3636534.3649345. Online publication date: 29-May-2024.
  • (2024) Talking to Multi-Party Conversational Agents in Advisory Services: Command-based vs. Conversational Interactions. Proceedings of the ACM on Human-Computer Interaction 8, GROUP, 1–25. https://doi.org/10.1145/3633072. Online publication date: 21-Feb-2024.
  • (2024) Better to Ask Than Assume: Proactive Voice Assistants’ Communication Strategies That Respect User Agency in a Smart Home Environment. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–17. https://doi.org/10.1145/3613904.3642193. Online publication date: 11-May-2024.
  • (2024) EchoTap: Non-Verbal Sound Interaction with Knock and Tap Gestures. International Journal of Human–Computer Interaction, 1–22. https://doi.org/10.1080/10447318.2024.2348837. Online publication date: 3-Jun-2024.
  • (2023) User Experience of Digital Voice Assistant: Conceptualization and Measurement. ACM Transactions on Computer-Human Interaction 31, 1, 1–35. https://doi.org/10.1145/3622782. Online publication date: 29-Nov-2023.
  • (2023) Metaphors in Voice User Interfaces: A Slippery Fish. ACM Transactions on Computer-Human Interaction 30, 6, 1–37. https://doi.org/10.1145/3609326. Online publication date: 25-Sep-2023.
