Effects of Filled Pauses on Memory Recall in Human-Robot Interaction in Mandarin Chinese

  • Conference paper
Engineering Psychology and Cognitive Ergonomics (HCII 2022)

Abstract

In recent years, voice-AI systems have seen significant improvements in intelligibility and naturalness, but talking to a machine still differs markedly from talking to a fellow human. In this paper, we explore one dimension of this difference: the occurrence of disfluency in machine speech and how it may affect human listeners' processing and memory of linguistic information. We conducted a human-machine conversation task in Mandarin Chinese using a humanoid social robot (Furhat), with different types of machine speech (pre-recorded natural speech vs. synthesized speech, fluent vs. disfluent). During the task, the human interlocutor was tested on how well they remembered the information presented by the robot. The results showed that disfluent speech (surrounded by "um"/"uh") did not benefit memory retention in either pre-recorded or synthesized speech. We discuss the implications of the current findings and possible directions for future work.
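To make the 2x2 design concrete, the sketch below shows one way recall data from such a task could be analyzed: speech type (pre-recorded vs. synthesized) crossed with fluency (fluent vs. disfluent), with a per-participant random intercept. This is a minimal illustration only, not the authors' analysis; the library (Python's statsmodels), the variable names, and the simulated scores are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical trial-level data: 20 listeners, 16 recall questions each,
# crossed over speech type (recorded vs. synthesized) and fluency
# (fluent vs. disfluent). All names and values here are illustrative.
n_subj, n_items = 20, 16
df = pd.DataFrame({
    "participant": np.repeat([f"p{i:02d}" for i in range(n_subj)], n_items),
    "speech_type": np.tile(np.repeat(["recorded", "synthesized"], n_items // 2), n_subj),
    "fluency": np.tile(["fluent", "disfluent"], n_subj * n_items // 2),
})
# Simulated recall accuracy per question block (0-1); purely a placeholder.
df["recall_score"] = rng.normal(loc=0.7, scale=0.1, size=len(df)).clip(0, 1)

# Fixed effects for the 2x2 design and their interaction, plus a random
# intercept per participant (comparable to an lme4-style formula
# recall_score ~ speech_type * fluency + (1 | participant)).
model = smf.mixedlm("recall_score ~ speech_type * fluency",
                    df, groups=df["participant"])
result = model.fit()
print(result.summary())
```

In an actual study, recall could equally be scored per item and analyzed with a logistic variant; the point of the sketch is only to show how the two factors and the per-participant grouping enter the model.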

Acknowledgement

We thank Albert Chau, Sarah Chen, Yitian Hong, and Xiaofu Zhang for their assistance with the experiment.

Author information

Corresponding author

Correspondence to Yao Yao.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, X., Liesenfeld, A., Li, S., Yao, Y. (2022). Effects of Filled Pauses on Memory Recall in Human-Robot Interaction in Mandarin Chinese. In: Harris, D., Li, W.C. (eds.) Engineering Psychology and Cognitive Ergonomics. HCII 2022. Lecture Notes in Computer Science, vol. 13307. Springer, Cham. https://doi.org/10.1007/978-3-031-06086-1_1

  • DOI: https://doi.org/10.1007/978-3-031-06086-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06085-4

  • Online ISBN: 978-3-031-06086-1

  • eBook Packages: Computer Science, Computer Science (R0)
