Abstract
The concept of action as a basic motor control unit for goal-directed movement behavior has been used primarily for private or non-communicative actions such as walking, reaching, or grasping. This paper reviews literature indicating that the concept can also be applied to all domains of face-to-face communication, including speech, co-verbal facial expression, and co-verbal gesturing. Three domain-specific types of action are defined (speech actions, facial actions, and hand-arm actions), and a model is proposed that elucidates the underlying biological mechanisms of action production, action perception, and action acquisition in all of these domains. The model can serve as a theoretical framework for empirical analysis or for simulation with embodied conversational agents, and thus for advanced human–computer interaction technologies.
Acknowledgments
This work was supported in part by the Deutsche Forschungsgemeinschaft (DFG) under project nos. Kr 1439/13-1 and Kr 1439/15-1, and by the DFG in SFB 673 “Alignment in Communication” and the Center of Excellence “Cognitive Interaction Technology” (CITEC).
Cite this article
Kröger, B.J., Kopp, S. & Lowit, A. A model for production, perception, and acquisition of actions in face-to-face communication. Cogn Process 11, 187–205 (2010). https://doi.org/10.1007/s10339-009-0351-2