Abstract
In recent years, voice-AI systems have improved markedly in intelligibility and naturalness, yet talking to a machine still feels remarkably different from talking to a fellow human. In this paper, we explore one dimension of this difference: the occurrence of disfluency in machine speech and how it may affect human listeners' processing and memory of linguistic information. We conducted a human-machine conversation task in Mandarin Chinese using a humanoid social robot (Furhat), with different types of machine speech (pre-recorded natural speech vs. synthesized speech, fluent vs. disfluent). During the task, the human interlocutors were tested on how well they remembered the information presented by the robot. The results showed that disfluent speech (containing the filled pauses "um"/"uh") did not benefit memory retention in either pre-recorded or synthesized speech. We discuss the implications of these findings and possible directions for future work.
Acknowledgement
We thank Albert Chau, Sarah Chen, Yitian Hong, and Xiaofu Zhang for their assistance with the experiment.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chen, X., Liesenfeld, A., Li, S., Yao, Y. (2022). Effects of Filled Pauses on Memory Recall in Human-Robot Interaction in Mandarin Chinese. In: Harris, D., Li, WC. (eds) Engineering Psychology and Cognitive Ergonomics. HCII 2022. Lecture Notes in Computer Science(), vol 13307. Springer, Cham. https://doi.org/10.1007/978-3-031-06086-1_1
Print ISBN: 978-3-031-06085-4
Online ISBN: 978-3-031-06086-1