
Embodied Human Computer Interaction

  • Technical Contribution
  • Published in: KI - Künstliche Intelligenz

Abstract

In this paper, we argue that embodiment can play an important role in the design and modeling of systems developed for Human Computer Interaction. To this end, we describe a simulation platform for building Embodied Human Computer Interactions (EHCI). This platform, VoxWorld, enables multimodal dialogue systems that communicate through language, gesture, action, facial expressions, and gaze tracking, in the context of task-oriented interactions. A multimodal simulation is an embodied 3D virtual realization of both the situational environment and the co-situated agents, as well as the most salient content denoted by communicative acts in a discourse. It is built on the modeling language VoxML (Pustejovsky and Krishnaswamy, VoxML: a visualization modeling language, Proceedings of LREC, 2016), which encodes objects with rich semantic typing and action affordances, and actions themselves as multimodal programs, enabling contextually salient inferences and decisions in the environment. VoxWorld enables embodied HCI by situating both human and artificial agents within the same virtual simulation environment, where they share perceptual and epistemic common ground. We discuss the formal and computational underpinnings of embodiment and common ground, how they interact and specify parameters of the interaction between humans and artificial agents, and we demonstrate behaviors and types of interactions on different classes of artificial agents.
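To give a concrete flavor of the encoding, here is a minimal, hypothetical Python sketch of a VoxML-style voxeme; the field names loosely mirror the LEX/TYPE/HABITAT/AFFORD_STR structure described in the paper, but this is illustrative only, not the actual VoxML schema:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Voxeme:
        lex: str            # lexical predicate, e.g., "cup"
        sem_type: str       # semantic head type, e.g., "physobj"
        concavity: str      # geometric property used in inference
        habitats: List[str] = field(default_factory=list)     # configurations enabling actions
        affordances: List[str] = field(default_factory=list)  # Gibsonian/telic affordances

    cup = Voxeme(
        lex="cup",
        sem_type="physobj",
        concavity="concave",
        habitats=["upright(y-axis)"],
        affordances=["grasp(agent, this)", "contain(this, x)"],
    )

    # An agent can filter affordances to find contextually available actions:
    print([a for a in cup.affordances if a.startswith("grasp")])

On this picture, an agent in VoxWorld can filter an object's affordances against its current habitat to determine which actions are available in context.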



Notes

  1. This recalls the question of how to best model situated action [16, 97].

  2. See Sect. 5 for details on integrating various sensor types and their relationships with the particulars of the artificial agent’s embodiment.

  3. as = argument structure; qs = qualia structure.

  4. Beginning in [52], voxemes have been denoted [[voxeme]].

  5. It should be noted that Gibsonian affordances might be construed as the goal of an activity in some contexts.

  6. TTR encodes actions (such as put and grasp above) as finite-state sequences of subevents (cf. [72]), but the computational effect of applying the updating functions to the current RobotState, given an action, is similar to our interpretation of events as state-transformers, i.e., mappings from RobotState to RobotState.
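     As a minimal illustration of the state-transformer reading (a sketch only: the RobotState fields and the grasp/put updates below are hypothetical placeholders, not the paper's implementation), an action denotes a function from RobotState to RobotState, and a complex event is the composition of its subevents' transformers:

         from dataclasses import dataclass, replace
         from functools import reduce
         from typing import Callable, Dict, Optional, Tuple

         # Hypothetical RobotState; fields are placeholders for illustration.
         @dataclass(frozen=True)
         class RobotState:
             holding: Optional[str]
             positions: Dict[str, Tuple[int, int]]

         Action = Callable[[RobotState], RobotState]

         def grasp(obj: str) -> Action:
             # Subevent: the agent ends up holding obj.
             return lambda s: replace(s, holding=obj)

         def put(obj: str, pos: Tuple[int, int]) -> Action:
             # Subevent: obj is released at pos.
             return lambda s: replace(s, holding=None,
                                      positions={**s.positions, obj: pos})

         def seq(*subevents: Action) -> Action:
             # An event as a finite-state sequence of subevents:
             # the composition of their state-transformers.
             return lambda s: reduce(lambda st, a: a(st), subevents, s)

         s0 = RobotState(holding=None, positions={"block1": (0, 0)})
         move = seq(grasp("block1"), put("block1", (0, 1)))
         print(move(s0))  # holding=None, positions={'block1': (0, 1)}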

  7. VoxSim source can be found here.

  8. Shared aural perception is possible, while haptic technology is rapidly advancing. We expect that much of the semantics presented here would be suitable for modeling extra-visual shared perception. This is the topic of ongoing research, beginning with haptics in VR.

  9. This is similar in many respects to the representations introduced in [20, 27] and [37] for modeling action and control with robots.

  10. The theory of semiotic schemas introduced in [83] attempts to encode the perceptual context of a linguistic utterance as well, to resolve reference.

  11. Forward kinematics computes the position of the end-effector from the joint parameters. Inverse kinematics computes the joint parameters from a desired position of the end-effector.
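     As a worked example (a sketch for a hypothetical two-link planar arm with link lengths l1 and l2, not tied to any particular agent in the paper), the two computations invert each other:

         import math

         def forward(l1, l2, t1, t2):
             """Forward kinematics: joint angles -> end-effector position."""
             x = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
             y = l1 * math.sin(t1) + l2 * math.sin(t1 + t2)
             return x, y

         def inverse(l1, l2, x, y):
             """Inverse kinematics: end-effector position -> joint angles
             (one of the two solutions, assuming the target is reachable)."""
             c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
             t2 = math.acos(c2)
             t1 = math.atan2(y, x) - math.atan2(l2 * math.sin(t2),
                                                l1 + l2 * math.cos(t2))
             return t1, t2

         t1, t2 = inverse(1.0, 1.0, 1.2, 0.8)
         print(forward(1.0, 1.0, t1, t2))  # ~(1.2, 0.8)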

  12. \([\![S ]\!]= ([\![\mathbf{NP} ]\!][\![\mathbf{GP} ]\!]).\)

  13. \([\![\mathbf{GP}_1 ]\!]= \lambda j. ([\![\mathbf{D}_{Obj} ]\!];\lambda j'.(([\![\mathbf{G}_{af} ]\!]j')j)).\)

  14. \([\![\mathbf{GP}_2 ]\!]= \lambda k. ([\![\mathbf{D}_{Loc} ]\!]; \lambda j. ([\![\mathbf{D}_{Obj} ]\!];\lambda j'.(([\![\mathbf{G}_{af} ]\!]j')j)k)).\)

  15. \([\![\mathbf{GP}_3 ]\!]= \lambda k. ([\![\mathbf{D}_{Dir} ]\!]; \lambda j. ([\![\mathbf{D}_{Obj} ]\!];\lambda j'.(([\![\mathbf{G}_{af} ]\!]j')j)k)).\)
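     To unpack the continuation-passing structure in note 13 (a sketch with hypothetical denotations: d_obj stands in for \([\![\mathbf{D}_{Obj} ]\!]\), g_af for \([\![\mathbf{G}_{af} ]\!]\), and the strings are placeholder meanings), the gesture phrase threads the deictically resolved object through the action gesture before handing the result to the discourse continuation j:

         # [[GP1]] = λj. ([[D_Obj]]; λj'. (([[G_af]] j') j))

         def d_obj(k):
             # Object deixis: resolve the indicated object and pass it
             # to the continuation k (the ";" in the formula).
             return k("block1")

         def g_af(obj):
             # Action gesture: given the object, return a computation
             # awaiting the discourse continuation j.
             return lambda j: j("grasp({})".format(obj))

         def gp1(j):
             # Thread D_Obj's value j' through G_af, then apply to j.
             return d_obj(lambda j_prime: g_af(j_prime)(j))

         print(gp1(lambda act: act))  # -> "grasp(block1)"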

  16. \([\alpha ]_{\sigma } (x_i \vee e_i)\), \([\beta ]_{\sigma } (x_i \vee e_i).\)

  17. \([\alpha ]_{\sigma } ([\beta ]_{\sigma } (x_i \vee e_i))\), \([\beta ]_{\sigma } ([\alpha ]_{\sigma } (x_i \vee e_i)).\)

  18. \([\beta ]_{\sigma } ([\alpha ]_{\sigma } ([\beta ]_{\sigma } (x_i \vee e_i))) \), \([\alpha ]_{\sigma } ([\beta ]_{\sigma } ([\alpha ]_{\sigma } (x_i \vee e_i))).\)

  19. \([(\alpha \cup \beta )^*]_{\sigma } \varphi. \)
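     (These modalities behave as in standard dynamic logic [43, 44], which we assume the \(\sigma \)-indexed notation inherits: choice and iteration unfold as \([\alpha \cup \beta ]_{\sigma } \varphi \leftrightarrow [\alpha ]_{\sigma } \varphi \wedge [\beta ]_{\sigma } \varphi \) and \([(\alpha \cup \beta )^*]_{\sigma } \varphi \leftrightarrow \varphi \wedge [\alpha \cup \beta ]_{\sigma } [(\alpha \cup \beta )^*]_{\sigma } \varphi \), so note 19 asserts \(\varphi \) after every finite interleaving of \(\alpha \) and \(\beta \), subsuming the bounded alternations of notes 16–18.)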

  20. A video demo can be viewed at http://www.voxicon.net/wp-content/uploads/2020/07/DARPA-CwC-Brandeis-CSU-July-2020.mp4.

  21. VoxML encodes relations using a number of common spatial reasoning calculi, including the Region Connection Calculus [82], where this would be encoded as EC(y, sfc).
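     As a hedged illustration of the RCC8 relation set (a toy geometric stand-in that classifies two closed discs in the plane; this is not VoxML's actual encoding):

         import math

         # The eight RCC8 base relations [82].
         RCC8 = ["DC", "EC", "PO", "TPP", "NTPP", "TPPi", "NTPPi", "EQ"]

         def rcc_discs(c1, r1, c2, r2, eps=1e-9):
             """Toy classifier: RCC8 relation between two closed discs."""
             d = math.dist(c1, c2)
             if d > r1 + r2 + eps:
                 return "DC"     # disconnected
             if abs(d - (r1 + r2)) <= eps:
                 return "EC"     # externally connected: boundaries touch only
             if d <= eps and abs(r1 - r2) <= eps:
                 return "EQ"     # identical regions
             if abs(d + r1 - r2) <= eps:
                 return "TPP"    # tangential proper part
             if abs(d + r2 - r1) <= eps:
                 return "TPPi"
             if d + r1 < r2 - eps:
                 return "NTPP"   # non-tangential proper part
             if d + r2 < r1 - eps:
                 return "NTPPi"
             return "PO"         # partial overlap

         # e.g., a cup touching a surface: boundaries meet, interiors disjoint.
         print(rcc_discs((0, 0), 1.0, (2, 0), 1.0))  # -> "EC"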

References

  1. Anderson ML (2003) Embodied cognition: a field guide. Artif Intell 149(1):91–130

  2. Asher N (1998) Common ground, corrections and coordination. J Semant

  3. Asher N (2008) A type driven theory of predication with complex types. Fundam Inform 84(2):151–183

  4. Asher N, Lascarides A (2003) Logics of conversation. Cambridge University Press, Cambridge

  5. Asher N, Pogodalla S (2010) SDRT and continuation semantics. In: JSAI international symposium on artificial intelligence, Springer, New York, pp 3–15

  6. Asher N, Pustejovsky J (2006) A type composition logic for Generative Lexicon. J Cognit Sci 6:1–38

  7. Baker CL, Jara-Ettinger J, Saxe R, Tenenbaum JB (2017) Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nat Hum Behav 1(4):1–10

  8. Ballard DH (1981) Generalizing the Hough transform to detect arbitrary shapes. Pattern Recogn 13(2):111–122

  9. Barker C, Shan CC (2014) Continuations and natural language. Oxford studies in theoretical linguistics, vol 53. Oxford University Press, Oxford

  10. van Benthem JFAK (1991) Logic and the flow of information

  11. Bergen BK (2012) Louder than words: the new science of how the mind makes meaning. Basic Books, New York

  12. Blackburn P, Bos J (2003) Computational semantics. Theoria Int J Theory Hist Found Sci, pp 27–45

  13. Cassell J, Stone M, Yan H (2000a) Coordination and context-dependence in the generation of embodied conversation. In: Proceedings of the first international conference on natural language generation, vol 14. ACL, pp 171–178

  14. Cassell J, Sullivan J, Churchill E, Prevost S (2000b) Embodied conversational agents. MIT Press, Cambridge

  15. Chrisley R (2003) Embodied artificial intelligence. Artif Intell 149(1):131–150

  16. Clancey WJ (1993) Situated action: a neuropsychological interpretation (response to Vera and Simon). Cogn Sci 17(1):87–116

  17. Clark HH, Brennan SE (1991) Grounding in communication. Perspect Soc Shared Cognit 13(1991):127–149

  18. Cooper R (2005) Records and record types in semantic theory. J Logic Comput 15(2):99–112

  19. Cooper R (2017) Adapting type theory with records for natural language semantics. In: Modern perspectives in type-theoretical semantics, Springer, New York, pp 71–94

  20. Cooper R, Ginzburg J (2015) Type theory with records for natural language semantics. In: The handbook of contemporary semantic theory, p 375

  21. Coventry K, Garrod SC (2005) Spatial prepositions and the functional geometric framework: towards a classification of extra-geometric influences

  22. Craik KJW (1943) The nature of explanation. Cambridge University Press, Cambridge

  23. De Groote P (2001) Type raising, continuations, and classical logic. In: Proceedings of the thirteenth Amsterdam Colloquium, pp 97–101

  24. Dekker PJ (2012) Predicate logic with anaphora. In: Dynamic semantics, Springer, New York, pp 7–47

  25. Dobnik S, Cooper R (2017) Interfacing language, spatial perception and cognition in type theory with records. J Lang Modell 5(2):273–301

  26. Dobnik S, Cooper R, Larsson S (2012) Modelling language, action, and perception in type theory with records. In: International workshop on constraint solving and language processing, Springer, New York, pp 70–91

  27. Dobnik S, Cooper R, Larsson S (2013) Modelling language, action, and perception in type theory with records. In: Constraint solving and language processing, Springer, New York, pp 70–91

  28. Evans V (2013) Language and time: a cognitive linguistics approach. Cambridge University Press, Cambridge

  29. Feldman J (2010) Embodied language, best-fit analysis, and formal compositionality. Phys Life Rev 7(4):385–410

  30. Fernando T (2009) Situations in LTL as strings. Inf Comput 207(10):980–999

  31. Fischer K (2011) How people talk with robots: designing dialog to reduce user uncertainty. AI Mag 32(4):31–38

  32. Foster ME (2007) Enhancing human–computer interaction with embodied conversational agents. In: International conference on universal access in human–computer interaction, Springer, New York, pp 828–837

  33. Gatsoulis Y, Alomari M, Burbridge C, Dondrup C, Duckworth P, Lightbody P, Hanheide M, Hawes N, Hogg D, Cohn A, et al. (2016) QSRlib: a software library for online acquisition of qualitative spatial relations from video

  34. Gibson JJ (1977) The theory of affordances. In: Perceiving, acting, and knowing: toward an ecological psychology, pp 67–82

  35. Gibson JJ (1979) The ecological approach to visual perception. Psychology Press

  36. Ginzburg J (1996) Interrogatives: questions, facts and dialogue. In: The handbook of contemporary semantic theory. Blackwell, Oxford, pp 359–423

  37. Ginzburg J, Fernández R (2010) Computational models of dialogue. The handbook of computational linguistics and natural language processing 57:1

  38. Goldman AI (1989) Interpretation psychologized. Mind Lang 4(3):161–185

  39. Gordon RM (1986) Folk psychology as simulation. Mind Lang 1(2):158–171

  40. Gregoromichelaki E, Kempson R, Howes C (2020) Actionism in syntax and semantics. Dial Percept, pp 12–27

  41. Griffiths TL, Chater N, Kemp C, Perfors A, Tenenbaum JB (2010) Probabilistic models of cognition: exploring representations and inductive biases. Trends Cogn Sci 14(8):357–364

  42. Groenendijk J, Stokhof M (1991) Dynamic predicate logic. Linguist Philos 14(1):39–100

  43. Harel D (1984) Dynamic logic. In: Gabbay D, Guenthner F (eds) Handbook of philosophical logic, volume II: extensions of classical logic. Reidel, pp 497–604

  44. Harel D, Kozen D, Tiuryn J (2000) Dynamic logic, 1st edn. MIT Press, Cambridge

  45. Johnson M (1987) The body in the mind: the bodily basis of meaning, imagination, and reason. University of Chicago Press, Chicago

  46. Kamp H, Van Genabith J, Reyle U (2011) Discourse representation theory. In: Handbook of philosophical logic, Springer, New York, pp 125–394

  47. Kendon A (2004) Gesture: visible action as utterance. Cambridge University Press, Cambridge

  48. Kiela D, Bulat L, Vero AL, Clark S (2016) Virtual embodiment: a scalable long-term strategy for artificial intelligence research. arXiv preprint arXiv:1610.07432

  49. Klein E, Sag IA (1985) Type-driven translation. Linguist Philos 8(2):163–201

  50. Konrad K (2004) Minimal model generation. In: Model generation for natural language interpretation and analysis, Springer, New York, pp 55–56

  51. Kopp S, Wachsmuth I (2010) Gesture in embodied communication and human–computer interaction, vol 5934. Springer, New York

  52. Krishnaswamy N (2017) Monte-Carlo simulation generation through operationalization of spatial primitives. PhD thesis, Brandeis University

  53. Krishnaswamy N, Pustejovsky J (2016a) Multimodal semantic simulations of linguistically underspecified motion events. In: Spatial cognition X, Springer, New York, pp 177–197

  54. Krishnaswamy N, Pustejovsky J (2016b) VoxSim: a visual platform for modeling motion language. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics, ACL

  55. Krishnaswamy N, Pustejovsky J (2018) Deictic adaptation in a virtual environment. In: Spatial cognition XI, Springer, New York, pp 180–196

  56. Krishnaswamy N, Narayana P, Wang I, Rim K, Bangar R, Patil D, Mulay G, Ruiz J, Beveridge R, Draper B, Pustejovsky J (2017) Communicating and acting: understanding gesture in simulation semantics. In: 12th international workshop on computational semantics

  57. Kruijff GJM, Lison P, Benjamin T, Jacobsson H, Zender H, Kruijff-Korbayová I, Hawes N (2010) Situated dialogue processing for human–robot interaction. In: Cognitive systems, Springer, pp 311–364

  58. Landragin F (2006) Visual perception, language and gesture: a model for their understanding in multimodal dialogue systems. Signal Process 86(12):3578–3595

  59. Lascarides A, Stone M (2006) Formal semantics for iconic gesture. In: Proceedings of the 10th workshop on the semantics and pragmatics of dialogue (BRANDIAL), pp 64–71

  60. Lascarides A, Stone M (2009) A formal semantic analysis of gesture. J Semant 26(4):393–449

  61. Lücking A, Pfeiffer T, Rieser H (2015) Pointing and reference reconsidered. J Pragmat 77:56–79

  62. Mani I, Pustejovsky J (2012) Interpreting motion: grounded representations for spatial language. Oxford University Press, Oxford

  63. Marge M, Rudnicky AI (2013) Towards evaluating recovery strategies for situated grounding problems in human–robot dialogue. In: 2013 IEEE RO-MAN, IEEE, pp 340–341

  64. Marshall P, Hornecker E (2013) Theories of embodiment in HCI. SAGE Handb Digit Technol Res 1:144–158

  65. McNeely-White DG, Ortega FR, Beveridge JR, Draper BA, Bangar R, Patil D, Pustejovsky J, Krishnaswamy N, Rim K, Ruiz J, Wang I (2019) User-aware shared perception for embodied agents. In: 2019 IEEE international conference on humanized computing and communication (HCC), IEEE, pp 46–51

  66. Miller GA, Johnson-Laird PN (1976) Language and perception. Belknap Press, Cambridge

  67. Muller P, Prévot L (2009) Grounding information in route explanation dialogues

  68. Narayana P, Krishnaswamy N, Wang I, Bangar R, Patil D, Mulay G, Rim K, Beveridge R, Ruiz J, Pustejovsky J, Draper B (2018) Cooperating with avatars through gesture, language and action. In: Intelligent systems conference (IntelliSys)

  69. Narayanan S (2010) Mind changes: a simulation semantics account of counterfactuals. Cogn Sci

  70. Naumann R (2001) Aspects of changes: a dynamic event semantics. J Semant 18:27–81

  71. Plaza J (2007) Logics of public communications. Synthese 158(2):165–179

  72. Pustejovsky J (1991) The syntax of event structure. Cognition 41(1–3):47–81

  73. Pustejovsky J (1995) The Generative Lexicon. MIT Press, Cambridge

  74. Pustejovsky J (2013) Dynamic event structure and habitat theory. In: Proceedings of the 6th international conference on Generative Approaches to the Lexicon (GL2013), ACL, pp 1–10

  75. Pustejovsky J (2018) From actions to events: communicating through language and gesture. Interact Stud 19(1–2):289–317

  76. Pustejovsky J, Batiukova O (2019) The lexicon. Cambridge University Press, Cambridge

  77. Pustejovsky J, Boguraev B (1993) Lexical knowledge representation and natural language processing. Artif Intell 63(1–2):193–223

  78. Pustejovsky J, Krishnaswamy N (2016) VoxML: a visualization modeling language. In: Proceedings of LREC

  79. Pustejovsky J, Krishnaswamy N (2020) Embodied human–computer interactions through situated grounding. In: IVA '20: proceedings of the 20th international conference on intelligent virtual agents, ACM

  80. Pustejovsky J, Moszkowicz JL (2011) The qualitative spatial dynamics of motion in language. Spatial Cognit Comput 11(1):15–44

  81. Qing C, Goodman ND, Lassiter D (2016) A rational speech-act model of projective content. In: Proceedings of cognitive science, pp 1110–1115

  82. Randell D, Cui Z, Cohn A (1992) A spatial logic based on regions and connection. In: Nebel B, Rich C, Swartout W (eds) KR'92: principles of knowledge representation and reasoning, proceedings of the 3rd international conference. Morgan Kaufmann, San Mateo, pp 165–176

  83. Roy D (2005) Semiotic schemas: a framework for grounding language in action and perception. Artif Intell 167(1–2):170–205

  84. Schaffer S, Reithinger N (2019) Conversation is multimodal: thus conversational user interfaces should be as well. In: Proceedings of the 1st international conference on conversational user interfaces, pp 1–3

  85. Scheutz M, Cantrell R, Schermerhorn P (2011) Toward humanlike task-based dialogue processing for human robot interaction. AI Mag 32(4):77–84

  86. Schlenker P (2020) Gestural grammar. Nat Lang Linguist Theory, pp 1–50

  87. Shapiro L (2014) The Routledge handbook of embodied cognition. Routledge, London

  88. Stalnaker R (2002) Common ground. Linguist Philos 25(5–6):701–721

  89. Tavares JMRS, Padilha AJMN (1995) A new approach for merging edge line segments. In: Proceedings RecPad'95, Aveiro

  90. Tellex S, Gopalan N, Kress-Gazit H, Matuszek C (2020) Robots that use language. Annu Rev Control Robot Auton Syst 3:25–55

  91. Tomasello M, Carpenter M (2007) Shared intentionality. Dev Sci 10(1):121–125

  92. Ullman TD, Goodman ND, Tenenbaum JB (2012) Theory learning as stochastic search in the language of thought. Cogn Dev 27(4):455–480

  93. Unger C (2011) Dynamic semantics as monadic computation. In: JSAI international symposium on artificial intelligence, Springer, New York, pp 68–81

  94. Van Benthem J (2011) Logical dynamics of information and interaction. Cambridge University Press, Cambridge

  95. Van Ditmarsch H, van der Hoek W, Kooi B (2007) Dynamic epistemic logic, vol 337. Springer, New York

  96. Van Eijck J, Unger C (2010) Computational semantics with functional programming. Cambridge University Press, Cambridge

  97. Vera AH, Simon HA (1993) Situated action: a symbolic interpretation. Cogn Sci 17(1):7–48. https://doi.org/10.1016/S0364-0213(05)80008-4

  98. Wahlster W (2006) Dialogue systems go multimodal: the SmartKom experience. In: SmartKom: foundations of multimodal dialogue systems, Springer, New York, pp 3–27

  99. Wang I, Narayana P, Patil D, Mulay G, Bangar R, Draper B, Beveridge R, Ruiz J (2017) EGGNOG: a continuous, multi-modal data set of naturally occurring gestures with ground truth labels. In: Proceedings of the 12th IEEE international conference on automatic face & gesture recognition

  100. Weiser M (1999) The computer for the 21st century. ACM SIGMOBILE Mob Comput Commun Rev 3(3):3–11

  101. Williams T, Bussing M, Cabrol S, Boyle E, Tran N (2019) Mixed reality deictic gesture for multi-modal robot communication. In: 2019 14th ACM/IEEE international conference on human–robot interaction (HRI), IEEE, pp 191–201

  102. Winston ME, Chaffin R, Herrmann D (1987) A taxonomy of part-whole relations. Cogn Sci 11(4):417–444


Acknowledgements

We would like to thank Ross Beveridge, Bruce Draper, Francisco R. Ortega, and their team at Colorado State University, and Jaime Ruiz and his team at the University of Florida, without whose contribution the Diana System would not be a reality. We would also like to thank Katherine Krajovic, R. Pito Salas, and Nathaniel J. Dimick for their work on the Kirby implementation. Particular thanks to Ms. Krajovic for assembling the dialogue flowcharts in Fig. 14. We would also like to thank Ken Lai for his discussion regarding common ground structure. This work was supported by the US Defense Advanced Research Projects Agency (DARPA) and the Army Research Office (ARO) under contract #W911NF-15-C-0238 at Brandeis University. This work was also supported in part by a grant to James Pustejovsky from the IIS Division of National Science Foundation (1763926) entitled “Building a Uniform Meaning Representation for Natural Language Processing”. The points of view expressed herein are solely those of the authors and do not represent the views of the Department of Defense or the United States Government. Any errors or omissions are, of course, the responsibility of the authors.

Author information

Correspondence to James Pustejovsky.

Additional information

This work was supported by the US Defense Advanced Research Projects Agency (DARPA) and the Army Research Office (ARO) under contract #W911NF-15-C-0238 at Brandeis University. It was first presented in [79], on which this discussion is based.


Cite this article

Pustejovsky, J., Krishnaswamy, N. Embodied Human Computer Interaction. Künstl Intell 35, 307–327 (2021). https://doi.org/10.1007/s13218-021-00727-5
