A computational model for the emergence of turn-taking behaviors in user-agent interactions

  • Original Paper
Journal on Multimodal User Interfaces

Abstract

We propose a computational model that endows conversational agents with the capability to coordinate their speaking turns (turn-taking management) in mixed-initiative two-party dialogs. In human conversations, participants continuously adjust their verbal and non-verbal productions to ensure the effective coordination of speaking turns. In our model, decision making is a continuous process based on the agent's current intrinsic goal with respect to turn-taking, namely its motivation to keep or to leave its current role (speaker or listener), and on its perception of its partner's intentions. Concurrently, the agent produces signals indicating its willingness to maintain or leave its current role. Our model builds on two models from cognitive psychology: the drift-diffusion model and the theory of behavioral dynamics. After presenting simulations showing how coordination emerges from the interactions in our model, we propose a SAIBA-compliant architecture, named BeAware, created to support the implementation of our model. Finally, using our model, we investigate how an agent's turn-taking strategy may affect the user's experience and the effectiveness of the coordination.
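For readers unfamiliar with the drift-diffusion model mentioned above, the following is a minimal textbook-style sketch of a two-bound evidence accumulator, not the authors' implementation. The function name `simulate_ddm`, its parameters, and the mapping of the upper and lower bounds to "leave" and "keep" the current role are assumptions made purely for illustration.

```python
import random


def simulate_ddm(drift, threshold=1.0, noise=0.1, dt=0.01,
                 max_steps=10_000, seed=0):
    """Accumulate noisy evidence until it crosses +threshold or -threshold.

    Hypothetical mapping (an assumption, not taken from the paper):
    the upper bound stands for "leave the current role", the lower bound
    for "keep the current role". Returns (decision, steps); decision is
    None if neither bound is reached within max_steps.
    """
    rng = random.Random(seed)
    x = 0.0  # accumulated evidence
    for step in range(1, max_steps + 1):
        # Euler-Maruyama step: deterministic drift plus Gaussian diffusion.
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        if x >= threshold:
            return "leave", step
        if x <= -threshold:
            return "keep", step
    return None, max_steps


if __name__ == "__main__":
    # A positive drift pushes the accumulator toward the upper bound.
    print(simulate_ddm(drift=0.5))
```

In this sketch the drift term plays the part of a steady pull toward one decision, and the noise term makes the crossing time vary from run to run; how the paper derives the drift from the agent's motivation and its perception of the partner is not specified in the abstract.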

Author information

Corresponding author

Correspondence to Mathieu Jégou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jégou, M., Chevaillier, P. A computational model for the emergence of turn-taking behaviors in user-agent interactions. J Multimodal User Interfaces 12, 199–223 (2018). https://doi.org/10.1007/s12193-018-0265-3

  • DOI: https://doi.org/10.1007/s12193-018-0265-3
