Abstract
In recent years, voice-AI systems have improved markedly in intelligibility and naturalness, yet talking to a machine still feels remarkably different from talking to a fellow human. In this paper, we explore one dimension of this difference: the occurrence of disfluency in machine speech and how it may affect human listeners' processing and memory of linguistic information. We conducted a human-machine conversation task in Mandarin Chinese using a humanoid social robot (Furhat), with different types of machine speech (pre-recorded natural speech vs. synthesized speech, fluent vs. disfluent). During the task, the human interlocutors were tested on how well they remembered the information presented by the robot. The results showed that disfluent speech (containing the filled pauses "um"/"uh") did not benefit memory retention in either pre-recorded or synthesized speech. We discuss the implications of these findings and possible directions for future work.
Acknowledgement
We thank Albert Chau, Sarah Chen, Yitian Hong, and Xiaofu Zhang for their assistance with the experiment.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chen, X., Liesenfeld, A., Li, S., Yao, Y. (2022). Effects of Filled Pauses on Memory Recall in Human-Robot Interaction in Mandarin Chinese. In: Harris, D., Li, WC. (eds) Engineering Psychology and Cognitive Ergonomics. HCII 2022. Lecture Notes in Computer Science(), vol 13307. Springer, Cham. https://doi.org/10.1007/978-3-031-06086-1_1
Print ISBN: 978-3-031-06085-4
Online ISBN: 978-3-031-06086-1