Using cognitive models to understand multimodal processes: the case for speech and gesture production

Published: 24 April 2017

Abstract

Multimodal behavior has been studied for a long time and in many fields, e.g., in psychology, linguistics, communication studies, education, and ergonomics. One of the main motivations has been to allow humans to use technical systems intuitively, in a way that resembles and fosters human users' natural way of interacting and thinking [Oviatt 2013]. This has sparked early work on multimodal human-computer interfaces, including recent approaches to recognize communicative behavior and even subtle multimodal cues by computer systems. Those approaches, for the most part, rest on machine learning techniques applied to large sets of behavioral data. As datasets grow larger in size and coverage, and computational power increases, suitable data-driven techniques are able to detect correlational behavior patterns that support answering questions like which feature(s) to take into account or how to recognize them in specific contexts. However, natural multimodal interaction in humans entails a plethora of behavioral variations and intricacies (e.g., when to act unimodally vs. multimodally, with which specific behaviors, or with what multi-level coordination between them). Possible underlying patterns are hard to detect, even in large datasets, and such variations are often attributed to context-dependencies or individual differences. How they come about is still hard to explain at the behavioral level.
One additional level of explanation that can help to deepen our understanding, and to establish systematic and generalized accounts, involves cognitive processes that lead to a particular multimodal response in a specific situation (e.g., see Chapter 2; [James et al. 2017]). A prominent example is the concept of cognitive load or "mental workload." Many behavioral variations in spoken language have been meaningfully explained in terms of heightened or lowered cognitive load of the speaker. For example, under high cognitive load speakers are found to speak more slowly, to produce more silent or filled pauses, and to utter more repetitions [Jameson et al. 2010]. Likewise, human users distribute information across multiple modalities in order to manage their cognitive limits [Oviatt et al. 2004, Chen et al. 2012]. In studies with elementary school children as well as adults, active manual gesturing was demonstrated to improve memory during a task that required explaining math solutions [Goldin-Meadow et al. 2001]. This effect of gesturing increased with higher task difficulty. Such behavioral phenomena are commonly explained based on cognitive concepts like cognitive load and, further, underlying processes like modality-specific working memories [Baddeley 1992] or competition for cognitive resources [Wickens et al. 1983].
Cognitive theories also provide valuable hints as to how to design multimodal human-machine interaction. To continue with the above example, it has been suggested that certain multimodal interfaces help users to minimize their cognitive load and hence improve their performance. For instance, the physical activity of manual or pen-based gesturing can play a particularly important role in organizing and facilitating people's spatial information processing, which has been shown to reduce cognitive load on tasks involving geometry, maps, and similar areas [Alibali et al. 2000, Oviatt 1997]. Other research revealed that expressively powerful interfaces not only help to cope with interaction problems, but also substantially facilitate human cognition by functioning as "thinking tools" [Oviatt 2013]. It is for these reasons that "Cognitive Science has and will continue to play an essential role in guiding the design of multimodal systems" [Oviatt and Cohen 2015, p. 33].
One particularly explicit form of a cognitive account is a cognitive model. In general, a cognitive model is a simplified, schematic description of cognitive processes for the purposes of understanding or predicting a certain behavior. This approach is pursued most systematically in the field of cognitive modeling, which developed at the intersection of cognitive psychology, cognitive science, and artificial intelligence. In these fields, cognitive models are primarily developed within generic cognitive architectures like ACT-R [Anderson et al. 2004] or SOAR [Laird 2012], to name two prominent examples, which capture generally assumed structural and functional properties of the human mind. Yet, the term cognitive model is not restricted to the use of such architectures. It can aptly be used whenever a specific notion of cognitive or mental processes is provided in computational terms that afford simulation-based examination and evaluation. Such models may be symbolic, hybrid, connectionist, or dynamical [cf. Polk and Seifert 2002], and they have been proposed for single cognitive tasks (e.g., memorizing items), for the interaction of two or more processes (e.g., visual search and language comprehension), and for making specific behavioral predictions (e.g., driving under the influence of alcohol).
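To make the notion of "computational terms that afford simulation-based examination" concrete, the following minimal sketch implements a toy spreading-activation memory model in the spirit of Collins and Loftus [1975]. The concepts, link weights, and parameters are illustrative assumptions, not part of any model discussed in this chapter.

```python
# A minimal, hypothetical sketch of a computational cognitive model:
# spreading activation over a small semantic network. All concepts,
# weights, and parameters below are illustrative assumptions.

from collections import defaultdict

# Weighted associative links between concepts (treated as symmetric here).
LINKS = {
    ("church", "tower"): 0.8,
    ("tower", "round"): 0.6,
    ("tower", "tall"): 0.7,
    ("church", "window"): 0.5,
}

DECAY = 0.5        # fraction of activation a node retains per time step
SPREAD = 0.4       # fraction of a node's activation passed along each link
THRESHOLD = 0.05   # activation below this is treated as inactive

def neighbors(node):
    """Yield (neighbor, weight) pairs for a concept."""
    for (a, b), w in LINKS.items():
        if a == node:
            yield b, w
        elif b == node:
            yield a, w

def step(activation):
    """One simulation step: decay all nodes, then spread activation along links."""
    new_act = defaultdict(float)
    for node, act in activation.items():
        new_act[node] += act * DECAY
        for other, w in neighbors(node):
            new_act[other] += act * SPREAD * w
    # Prune negligible activation to keep the state small.
    return {n: a for n, a in new_act.items() if a >= THRESHOLD}

if __name__ == "__main__":
    # Simulate what becomes mentally "available" after attending to "church".
    state = {"church": 1.0}
    for t in range(4):
        state = step(state)
        ranked = sorted(state.items(), key=lambda x: -x[1])
        print(f"t={t + 1}:", ", ".join(f"{n}={a:.2f}" for n, a in ranked))
```

Even such a toy model yields concrete, inspectable dynamics (which concepts become active, how quickly, and how strongly) that can be compared against behavioral data; this is precisely what distinguishes a computational cognitive model from a purely verbal theory.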
In this chapter, we discuss how computational cognitive models can be useful for the field of multimodal and multisensory interaction. A number of arguments and prospects can readily be identified. From a basic research point of view, a cognitive model can represent a deeper level of understanding in terms of the processes and mechanisms that underlie a certain behavior. Such an explanation will have to be hypothetical to some extent, but it potentially bears great scientific value for a number of reasons. First, it is more specific and detailed than most psychological theories (as discussed in Section 6.3). Second, it is predictive rather than purely descriptive, hence affording rigorous evaluation and falsification, e.g., by deriving quantitative predictions in computational simulations and assessing those against empirical data. Finally, cognitive models can provide a common level of description that enables relating and combining insights from different fields of research. For example, they may draw on general findings about working memory or attention to address questions of multimodal or multisensory interaction.

From an engineering point of view, cognitive models can help to build better multimodal systems. This holds especially true for cases where data is lacking and inspiration needs to be drawn from theoretical concepts. First, a computational cognitive model that has proven itself useful in evaluation can provide principles and criteria for the development of algorithms and systems. For example, it may be employed in computational simulations to actively explore distributions of behavioral variations, in order to produce additional training data or to help confine a domain to its relevant contingencies, thresholds, or functions. Second, a cognitive model can provide an informed model of the human user and thus guide the design of multimodal interaction systems. For instance, recognizing and interpreting the user's cognitive processes provides a basis for more adequate system adaptation (a capability that is becoming increasingly important, cf. Section 6.3). Simplistic assumptions about the user are rarely realistic here; cognitive models can be used to formulate and test more detailed and substantiated notions of human processing to improve the design of system algorithms. Likewise, a cognitive model may support identifying situations in which multimodal interaction enables users to achieve their goals more effectively (e.g., with fewer errors or less cognitive load), because the consequences of multimodality for cognitive processing can be more or less directly observed in the simulations.
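The following sketch illustrates the general workflow of deriving quantitative predictions from a model by simulation and assessing them against empirical data. The toy "production model", its parameters, and the "observed" gesture rates are all made-up assumptions used only to show the evaluation loop, not results from this chapter.

```python
# Hypothetical sketch of simulation-based evaluation: run a toy cognitive
# model under different cognitive-load conditions, derive quantitative
# predictions (here, gesture rates), and compare them with empirical data.
# Model, parameters, and "observed" values are illustrative assumptions.

import random

def simulate_trial(load, rng):
    """Toy production model: higher load leaves less capacity for verbal
    encoding, which (by assumption) raises the probability of gesturing."""
    verbal_capacity = max(0.0, 1.0 - load + rng.gauss(0.0, 0.1))
    p_gesture = 1.0 - 0.7 * verbal_capacity
    return rng.random() < p_gesture

def predicted_gesture_rate(load, n_trials=10_000, seed=1):
    """Monte Carlo estimate of the gesture rate predicted for a load level."""
    rng = random.Random(seed)
    return sum(simulate_trial(load, rng) for _ in range(n_trials)) / n_trials

if __name__ == "__main__":
    observed = {0.2: 0.45, 0.8: 0.80}  # made-up empirical gesture rates
    for load, obs in observed.items():
        pred = predicted_gesture_rate(load)
        print(f"load={load:.1f}: predicted={pred:.2f}, "
              f"observed={obs:.2f}, error={abs(pred - obs):.2f}")
```

Because the model's predictions are explicit numbers per condition, mismatches with data point directly at which assumption (here, the capacity-to-gesture mapping) needs revision; the same loop can also be run over unobserved conditions to generate additional training material or to identify thresholds of interest.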
Clearly, many of the listed potential benefits of computational cognitive models verge on or even go beyond the current state of established knowledge. Thus, we will focus our discussion on one case of natural multimodal behavior that has been extensively researched in this regard: the use of spontaneous speech and gesture in dialogue. We will start by reviewing speech and gesture as a pervasive case of natural multimodal behavior and will motivate its relevance for practical multimodal interfaces, virtual characters, and social robotics (Section 6.2). Then we will discuss existing theoretical and computational models of the cognitive underpinnings (Section 6.3), and we will elaborate on one particular cognitive model of speech-gesture production that explicates the role of mental representations and memory processes to a degree that affords computational simulation under varying conditions (Section 6.4). Using this model, we demonstrate in Section 6.5 how cognitive modeling can be used to gain a better understanding of multimodal production processes and to inform the design of multimodal interactive systems. As an aid to comprehension, readers are referred to this chapter's Focus Questions and to the Glossary for definitions of terminology.

References

[1]
J. Alexandersson, A. Girenko, D. Spiliotopoulos, V. Petukhova, D. Klakow, D. Koryzis, N. Taatgen, M. Specht, N. Campbell, M. Aretoulaki, A. Stricker, and M. Gardner. 2014. Metalogue: a multiperspective multimodal dialogue system with metacognitive abilities for highly adaptive and flexible dialogue management. In Proceedings of the 10th International Conference on Intelligent Environments, pp. 365--368. Shanghai, China. 253
[2]
M. W. Alibali, S. Kita, and A. Young. 2000. Gesture and the process of speech production: We think, therefore we gesture. Language & Cognitive Processes, 15:593--613. 240, 248, 264
[3]
J. R. Anderson. 1993. Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale, NJ. 252
[4]
J. R. Anderson. 2007. How can the human mind occur in the physical universe? Oxford University Press, USA. 252
[5]
J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere, and Y. Qin. 2004. An integrated theory of the mind. Psychological Review, 111(4):1036--1060. 241, 242, 252, 256, 611
[6]
A. D. Baddeley. 1992. Working memory. Science, 255(5044):556--559. 240
[7]
A. D. Baddeley. 1999. Essentials of Human Memory. Psychology Press, Hove, UK.
[8]
A. D. Baddeley. 2000. The episodic buffer: a new component of working memory? Trends in Cognitive Sciences, 4:417--423. 251
[9]
A. D. Baddeley. 2003. Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4(10):829--839. 251
[10]
A. D. Baddeley and G. J. Hitch. 1974. Working memory. In G.H. Bower, editor, The Psychology of Learning and Motivation: Advances in Research and Theory, vol. 8, pp. 47--89. Academic Press, New York. 250
[11]
A. Bangerter. 2004. Using pointing and describing to achieve joint focus of attention in dialogue. Psychological Science, 15(6):415--419. 247
[12]
P. Barrouillet, S. Bernardin, and V. Camos. 2004. Time constraints and resource sharing in adults' working memory spans. Journal of Experimental Psychology: General, 133:83--100. 251
[13]
J. Bavelas, C. Kenwood, T. Johnson, and B. Philips. 2002. An experimental study of when and how speakers use gestures to communicate. Gesture, 2(1):1--17. 264
[14]
K. Bergmann and S. Kopp. 2006. Verbal or visual: How information is distributed across speech and gesture in spatial dialog. In D. Schlangen and R. Fernandez, editors, Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue, pp. 90--97. Universitätsverlag, Potsdam. 248, 264
[15]
K. Bergmann and S. Kopp. 2008. Multimodal content representation for speech and gesture production. In M. Theune, I. van der Sluis, Y. Bachvarova and E. André, editors, Symposium at the AISB Annual Convention: Multimodal Output Generation, pp. 61--68. Aberdeen, UK. 242, 621
[16]
K. Bergmann and S. Kopp. 2009. GNetIc: Using Bayesian decision networks for iconic gesture generation. In Z. Ruttkay, M. Kipp, A. Nijholt, and H. Vilhjálmsson, editors, Proceedings of the 9th International Conference on Intelligent Virtual Agents, pp. 76--89. Springer, Berlin/Heidelberg, Germany. 246, 255, 258
[17]
K. Bergmann and S. Kopp. 2012. Gestural alignment in natural dialogue. In R.P. Cooper, D. Peebles, and N. Miyake, editors, Proceedings of the 34th Annual Conference of the Cognitive Science Society, pp. 1326--1331. Cognitive Science Society, Austin, TX. 245
[18]
K. Bergmann, S. Kahl, and S. Kopp. 2013. Modeling the semantic coordination of speech and gesture under cognitive and linguistic constraints. In R. Aylett, B. Krenn, C. Pelachaud, and H. Shimodaira, editors, Proceedings of the 13th International Conference on Intelligent Virtual Agents, pp. 203--216. Springer, Berlin/Heidelberg. 245, 254, 265, 267
[19]
L. Breslow, A. Harrison, and J. Trafton. 2010. Linguistic spatial gestures. In D. Salvucci and G. Gunzelmann, editors, Proceedings of the 10th International Conference on Cognitive Modeling, pp. 13--18. Drexel University, Philadelphia, PA. 253
[20]
H. Buschmeier, K. Bergmann, and S. Kopp. 2009. An alignment-capable microplanner for natural language generation. Proceedings of the 12th European Workshop on Natural Language Generation, pp. 82--89. Athens, Greece. 255
[21]
B. Butterworth and G. W. Beattie. 1978. Gesture and silence as indicators of planning in speech. In R.N. Campbell and P.T. Smith, editors, Recent Advances in the Psychology of Language: Formal and Experimental Approaches, vol. 4B, pp. 347--360. Plenum Press, New York. 248
[22]
B. Butterworth and U. Hadar. 1989. Gesture, speech and computational stages: A reply to McNeill. Psychological Review, 96:168--174. 248
[23]
Y. Cao, M. Theune, and A. Nijholt. 2009. Towards cognitive-aware multimodal presentation: the modality effects in high-load HCI. In Proceedings of the 8th International Conference on Engineering Psychology and Cognitive Ergonomics, pp. 3--12. Springer, Berlin/Heidelberg. 252
[24]
J. Cassell. 2000. More than just another pretty face: Embodied conversational interface agents. Communication of the ACM, 43:70--78. 245
[25]
J. Cassell, H. Vilhjálmsson, and T. Bickmore. 2001. BEAT: The behavior expression animation toolkit. In Proceedings of SIGGRAPH '01, pp. 477--486. New York. 245
[26]
F. Chen, N. Ruiz, E. Choi, J. Epps, A. Khawaja, R. Taib, B. Yin, and Y. Wang. 2012. Multimodal behavior and interaction as indicators of cognitive load. ACM Transactions on Interactive Intelligent Systems, 2(4):22. 240
N. Christenfeld, S. Schachter, and F. Bilous. 1991. Filled pauses and gestures: It's not coincidence. Journal of Psycholinguistic Research, 20(1):1--10. 248
[27]
J. M. Clark and A. Paivio. 1991. Dual coding theory and education. Educational Psychology Review, 3:149--210. 251
[28]
A. M. Collins and E. F. Loftus. 1975. A spreading-activation theory of semantic processing. Psychological Review, 82(6):407--428. 256
[29]
N. Cowan. 1995. Attention and Memory: An Integrated Framework. Oxford Psychology Series. Oxford University Press, Oxford, UK. 251
[30]
N. Cowan. 1999. An embedded process model of working memory. In A. Miyake and P. Shah, editors, Models of Working Memory: Mechanisms of Active Maintenance and Executive Control. Cambridge University Press, Cambridge, UK. 251, 252
[31]
N. Cowan, J. S. Saults, and C. L. Blume. 2014. Central and peripheral components of working memory storage. Journal of Experimental Psychology: General, 143(5):1806--1836. 252
[32]
J. de Ruiter. 2000. The production of gesture and speech. In D. McNeill, editor, Language and Gesture, pp. 284--311. Cambridge University Press, Cambridge, UK. 249
[33]
J. P. de Ruiter, A. Bangerter, and P. Dings. 2012. The interplay between gesture and speech in the production of referring expressions: investigating the tradeoff hypothesis. Topics in Cognitive Sciences, 4:232--248. 247
[34]
A. T. Dittmann and L. G. Llewellyn. 1969. Body movement and speech rhythm in social conversation. Journal of Personality and Social Psychology, 11(2):98--106. 248
[35]
B. Endrass and E. André. 2014. Integration of cultural factors into behavioral models for virtual characters. In A. Stent and S. Bangalore, editors, Natural Language Generation in Interactive Systems, pp. 227--251. Cambridge University Press, Cambridge, UK. 246
[36]
K. Georgila and O. Lemon. 2004. Adaptive multimodal dialogue management based on the information state update approach. In W3C Workshop on Multimodal Interaction, p. 142. 252
[37]
A. E. Goldberg. 1995. Constructions: A Construction Grammar Approach to Argument Structure. University of Chicago Press, Chicago. 253
[38]
S. Goldin-Meadow, H. Nusbaum, S. Kelly, and S. Wagner. 2001. Explaining math: Gesturing lightens the load. Psychological Science, 12:516--522. 240
[39]
M. W. Hoetjes, E. J. Krahmer, and M. G. J. Swerts. 2014. Does our speech change when we cannot gesture? Speech Communication, 57:257--267. 248
[40]
A. Hostetter and M. W. Alibali. 2007. Raise your hand if you're spatial---Relations between verbal and spatial skills and gesture production. Gesture, 7:73--95. 248
[41]
A. Hostetter and M. W. Alibali. 2008. Visible embodiment: Gestures as simulated action. Psychonomic Bulletin & Review, 15(3):495--514. 251, 256
[42]
R. Jackendoff. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press, Oxford, UK. 253
[43]
K. H. James, S. Vinci-Booher, and L. F. Munoz-Rabke. 2017. The impact of multimodal-multisensory learning on human performance and brain activation patterns. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 1: Foundations, User Modeling, and Common Modality Combinations, Ch. 2. Morgan & Claypool Publishers, San Rafael, CA. 240
[44]
A. Jameson, C. Kiefer, Müller, B. Großmann-Hutter, F. Wittig, and R. Rummer. 2010. Assessment of a User's Time Pressure and Cognitive Load on the Basis of Features of Speech. In M.E. Crocker and J. Siekmann, editors, Resource-Adaptive Cognitive Processes, pp. 171--204. Springer, Berlin/Heidelberg. 240
[45]
D. E. Kieras and D. E. Meyer. 1997. An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12:391--438. 252
[46]
S. Kita. 2000. How representational gestures help speaking. In D. McNeill, editor, Language and Gesture, pp. 162--185. Cambridge University Press, Cambridge, UK. 248
[47]
S. Kita and T. S. Davies. 2009. Competing conceptual representations trigger co-speech representational gestures. Language and Cognitive Processes, 24:761--775. 248
[48]
S. Kita and A. Özyürek. 2003. What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48:16--32. 249, 250, 262, 267
[49]
S. Kita, A. Özyürek, S. Allen, A. Brown, R. Furman, and T. Ishizuka. 2007. Relations between syntactic encoding and co-speech gestures: Implications for a model of speech and gesture production. Language and Cognitive Processes, 22(8):1212--1236. 267
[50]
S. Kopp. 2017. Studying the functions of gesture with human-agent interaction. In B. Church, M. Alibali, and S. Kelly, editors, The Functions of Gesture. John Benjamins Publishing Company, Amsterdam. 245, 269
[51]
S. Kopp, K. Bergmann, and S. Kahl. 2013. A spreading-activation model of the semantic coordination of speech and gesture. Proceedings of the 35th Annual Meeting of the Cognitive Science Society, pp. 823--828. Cognitive Science Society, Austin, TX. 254, 255, 257, 258
[52]
S. Kopp, P. Tepper, and J. Cassell. 2004. Towards integrated microplanning of language and iconic gesture for multimodal output. In Proceedings of the International Conference on Multimodal Interfaces, pp. 97--104. New York. 246
[53]
S. Kopp, H. van Welbergen, R. Yaghoubzadeh, and H. Buschmeier. 2014. An architecture for fluid real-time conversational agents: Integrating incremental output generation and input processing. Journal on Multimodal User Interfaces, 8:97--108. 255
[54]
S. Kosslyn. 1987. Seeing and imagining in the cerebral hemispheres: A computational approach. Psychological Review, 94:148--175. 243, 632
[55]
R. Krauss, Y. Chen, and R. Gottesman. 2000. Lexical gestures and lexical access: A process model. In D. McNeill, editor, Language and Gesture, pp. 261--283. Cambridge University Press, Cambridge, UK. 249
[56]
R. M. Krauss and U. Hadar. 1999. The role of speech-related arm/hand gestures in word retrieval. In L. Messing and R. Campbell, editors, Gesture, Speech and Sign, pp. 93--116. Oxford University Press, Oxford. 248
[57]
J. E. Laird. 2012. The SOAR Cognitive Architecture. MIT Press, Cambridge, MA. 241, 242, 252, 611
[58]
J. E. Laird, A. Newell, and P.S. Rosenbloom. 1987. SOAR: An architecture for general intelligence. Artificial Intelligence, 33(1):1--64. 252
[59]
J. Lee and S. Marsella. 2006. Nonverbal behavior generator for embodied conversational agents. In J. Gratch, M. Young, R. Aylett, D. Ballin, and P. Olivier, editors, Proceedings of the 6th International Conference on Intelligent Virtual Agents, pp. 243--255. Springer, Berlin/Heidelberg. 245
[60]
W. J. M. Levelt. 1989. Speaking: From Intention to Articulation. MIT Press, Cambridge, MA. 249, 256
[61]
M. Lhommet and S. Marsella. 2014. Metaphoric gestures: towards grounded mental spaces. In Proceedings of the 14th International Conference on Intelligent Virtual Agents, pp. 264--274. Springer, Berlin/Heidelberg. 254
[62]
R. E. Mayer and R. Moreno. 2003. Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38(1):43--52. 251
[63]
D. McNeill. 1992. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, Chicago. 242, 244, 250, 251, 612, 617, 618, 620
[64]
D. McNeill. 2005. Gesture and Thought. University of Chicago Press, Chicago. 244, 250, 264
[65]
D. McNeill and S. D. Duncan. 2000. Growth points in thinking-for-speaking. In D. McNeill, editor, Language and Gesture, pp. 141--161. Cambridge University Press, Cambridge, UK. 244
[66]
M. Neff, M. Kipp, I. Albrecht, and H.-P. Seidel. 2008. Gesture modeling and animation based on a probabilistic re-creation of speaker style. ACM Transactions on Graphics, 27(1):1--24. 246
[67]
S. Nobe. 2000. Where do most spontaneous representational gestures actually occur with respect to speech? In D. McNeill, editor, Language and Gesture, pp. 186--198. Cambridge University Press, Cambridge, UK. 248
[68]
K. Oberauer and R. Kliegl. 2006. A formal model of capacity limits in working memory. Journal of Memory and Language, 55:601--626. 251
[69]
S. Oviatt. 1997. Multimodal interactive maps: Designing for human performance. Human-Computer Interaction, 12(1):93--129. 240
[70]
S. Oviatt. 2013. The Design of Future Educational Interfaces. Routledge Press, New York. 239, 240
[71]
S. Oviatt. 2017. Theoretical foundations of multimodal interfaces and systems. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 1: Foundations, User Modeling, and Common Modality Combinations, Ch. 1. Morgan & Claypool Publishers, San Rafael, CA. 250
[72]
S. Oviatt and P. Cohen. 2015. The Paradigm Shift to Multimodality in Contemporary Computer Interfaces. Morgan & Claypool Publishers, San Rafael, CA. 240, 244
[73]
S. Oviatt, R. Coulston, and R. Lunsford. 2004. When do we interact multimodally? Cognitive load and multimodal communication patterns. In Proceedings of the 6th International Conference on Multimodal interfaces, pp. 129--136. ACM, New York. 240
[74]
A. Özyürek. 2002. Speech-gesture relationship across languages and in second language learners: Implications for spatial thinking and speaking. In B. Skarabela, S. Fish, and A.H. Do, editors, Proceedings of the 26th Annual Boston University Conference on Language Development, pp. 500--509. Cascadilla Press, Somerville, MA. 267
[75]
A. Özyürek, S. Kita, S. Allen, A. Brown, R. Furman, and T. Ishizuka. 2008. Development of cross-linguistic variation in speech and gesture: motion events in English and Turkish. Developmental Psychology, 44(4):1040--1054. 264
[76]
A. Paivio. 1986. Mental Representations. Oxford University Press, Oxford. 251
[77]
T. A. Polk and C. M. Seifert. 2002. Cognitive Modeling. MIT Press, Cambridge, MA. 241
[78]
F. Putze and T. Schultz. 2009. Cognitive memory modeling for interactive systems in dynamic environments. In 1st international Workshop on Spoken Dialog Systems, Kloster Irsee, Germany. 253
[79]
N. Ruiz, R. Taib, and F. Chen. 2006. Examining the redundancy of multimodal input. Proceedings of OZCHI 2006, pp. 389--392. Sydney, Australia. 262, 264
[80]
S. Schaffer and D. Reitter. 2012. Modeling efficiency-guided modality choice in voice and graphical user interfaces. Proceedings of the 11th International Conference on Cognitive Modeling, pp. 253--254. Universitätsverlag der TU Berlin, Berlin. 252
[81]
H. Schultheis, T. Barkowsky, and S. Bertel. 2006. LTMc---An improved long-term memory for cognitive architectures. In Proceedings of the 7th International Conference on Cognitive Modeling, pp. 274--279. Trieste, Italy. 253
[82]
J. Schweppe and R. Rummer. 2014. Attention, working memory, and long-term memory in multimedia learning: An integrated perspective based on process models of working memory. Educational Psychology Review, 26:285--306. 252
[83]
D. I. Slobin. 1996. From "thought and language" to "thinking for speaking." In J. Gumperz and S. Levinson, editors, Rethinking Linguistic Relativity, pp. 70--96. Cambridge University Press, Cambridge, UK. 243, 250, 256, 259, 630
[84]
W. C. So, S. Kita, and S. Goldin-Meadow. 2009. Using the hands to identify who does what to whom: Gesture and speech go hand-in-hand. Cognitive Science, 33:115--125. 247
[85]
T. Sowa and I. Wachsmuth. 2009. A computational model for the representation and processing of shape in coverbal iconic gestures. In K. Coventry, T. Tenbrink, and J. Bateman, editors, Spatial Language and Dialogue, pp. 132--146. Oxford University Press, Oxford, UK. 243, 256, 257, 632
[86]
M. Turunen, J. Hakulinen, O. Ståhl, B. Gambäck, P. Hansen, M.C.R. Gancedo, and M. Cavazza. 2011. Multimodal and mobile conversational health and fitness companions. Computer Speech and Language, 25(2):192--209. 253
[87]
P. Wagner, Z. Malisz, and S. Kopp. 2014. Gesture and speech in interaction: An overview. Speech Communication, 57:209--232. 244, 245, 249, 251
[88]
C. D. Wickens, D. Sandry, and M. Vidulich. 1983. Compatibility and resource competition between modalities of input, output, and central processing. Human Factors, 25(2):227--248.

Published In

ACM Books
The Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations - Volume 1
April 2017
662 pages
ISBN:9781970001679
DOI:10.1145/3015783

Publisher

Association for Computing Machinery and Morgan & Claypool
