Abstract
Turn-taking is a preverbal skill whose mastery is an important precondition for many social interactions and joint actions. However, the cognitive mechanisms supporting turn-taking abilities are still poorly understood. Here, we propose a computational analysis of turn-taking in terms of two general mechanisms supporting joint actions: action prediction (e.g., recognizing the interlocutor’s message and predicting the end of turn) and signaling (e.g., modifying one’s own speech to make it more predictable and discriminable). We test the hypothesis that, in a simulated conversational scenario, dyads using these two mechanisms can recognize the utterances of their co-actors faster, which in turn permits them to give and take turns more efficiently. Furthermore, we discuss how turn-taking dynamics depend on the fact that agents cannot simultaneously use their internal models for both action (or message) prediction and production, as these have different requirements; in other words, they cannot speak and listen at the same time with the same level of accuracy. Our results provide a computational-level characterization of turn-taking in terms of cognitive mechanisms of action prediction and signaling that are shared across various interaction and joint action domains.
Acknowledgements
The authors want to thank two anonymous reviewers for useful comments and suggestions.
Additional information
The research leading to these results has received funding from the European Union Seventh Framework Programme, Grant No. FP7-270108 (Goal-Leaders) to GP. The GeForce Titan used for this research was donated by the NVIDIA Corporation.
Cite this article
Donnarumma, F., Dindo, H., Iodice, P. et al. You cannot speak and listen at the same time: a probabilistic model of turn-taking. Biol Cybern 111, 165–183 (2017). https://doi.org/10.1007/s00422-017-0714-1
DOI: https://doi.org/10.1007/s00422-017-0714-1