Modeling Theory of Mind in Multimodal HCI

  • Conference paper
Human-Computer Interaction (HCII 2024)

Abstract

As multimodal interactions between humans and computers become more sophisticated, involving not only speech but also gestures, haptics, eye movement, and other input types, each modality introduces subtleties that can be misinterpreted without a deeper understanding of the agent’s mental state. In this paper, we argue that Simulation Theory of Mind (SToM) [23], interpreted within a model of embodied HCI [41, 42], can help model the capacity to attribute beliefs and intentions to oneself and others. We adopt a version of Dynamic Epistemic Logic that admits degrees of belief, reflecting the changing evidence available to an agent [5, 6]. This model can address the complexities of mutual perception and belief, and how a dynamic common ground is constructed and changes [15]. To demonstrate this, we apply the SToM model to the problem of Common Ground Tracking (CGT) in multi-party dialogues, focusing on a joint problem-solving task called the Weights Task, in which participants cooperate to find the weights of a set of blocks.
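
To fix intuitions, a minimal public-announcement-style sketch may help (a deliberate simplification in the spirit of [5, 6, 38], not the formalism developed in the paper itself). Publicly announcing a proposition \varphi restricts the epistemic model M to the worlds where \varphi holds:

    \[ M, w \models [!\varphi]\,\psi \iff \big( M, w \models \varphi \;\Rightarrow\; M|_{\varphi}, w \models \psi \big) \]

where M|_{\varphi} is M restricted to the \varphi-worlds. In the Weights Task, when the group jointly reads a result off the scale, say that the red block weighs 10 units (a hypothetical value, purely for illustration), that proposition is publicly announced, every surviving world satisfies it, and it enters the common ground. Degrees of belief come into play when evidence is only partially shared, e.g., one participant glimpses the scale while another does not; such updates adjust an agent’s plausibility ordering over worlds rather than eliminating them outright.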

Notes

  1. The blocks are uniquely colored: red (r), yellow (y), blue (b), green (g), and purple (p). The scale is denoted s.

  2. ELAN is an annotation tool for audio and video recordings. It lets users add a wide range of textual annotations to audio and/or video, including sentences, individual words or glosses, comments, translations, or descriptions of features observed in the media.
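
For readers working with such annotations programmatically: ELAN stores its output in the XML-based EAF format, so a tier’s time-aligned annotations can be read with standard XML tooling. The following is an illustrative sketch only; the file name is a placeholder, and the element and attribute names follow the EAF schema as commonly documented, so verify them against your own files:

    import xml.etree.ElementTree as ET

    # Parse an ELAN annotation file (EAF is XML); "session.eaf" is a placeholder name.
    root = ET.parse("session.eaf").getroot()

    # Map time-slot IDs to millisecond values (TIME_VALUE can be absent on some slots).
    times = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE", "0"))
        for ts in root.iter("TIME_SLOT")
    }

    # Walk each tier and print its time-aligned annotations.
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = times[ann.get("TIME_SLOT_REF1")]
            end = times[ann.get("TIME_SLOT_REF2")]
            value = ann.findtext("ANNOTATION_VALUE", default="")
            print(tier.get("TIER_ID"), start, end, value)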

References

  1. Asher, N.: Common ground, corrections and coordination. J. Semant. 15, 239–299 (1998)

  2. Baltag, A., Moss, L.S., Solecki, S.: The logic of public announcements, common knowledge, and private suspicions. In: Arló-Costa, H., Hendricks, V.F., van Benthem, J. (eds.) Readings in Formal Epistemology. SGTP, vol. 1, pp. 773–812. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-20451-2_38

  3. Barsalou, L.W.: Perceptions of perceptual symbols. Behav. Brain Sci. 22(4), 637–660 (1999)

  4. Belle, V., Bolander, T., Herzig, A., Nebel, B.: Epistemic planning: perspectives on the special issue. Artif. Intell. 316, 103842 (2023)

  5. van Benthem, J., Fernández-Duque, D., Pacuit, E.: Evidence and plausibility in neighborhood structures. Ann. Pure Appl. Logic 165(1), 106–133 (2014)

  6. van Benthem, J., Pacuit, E.: Dynamic logics of evidence-based beliefs. Stud. Logica. 99, 61–92 (2011)

  7. Bolander, T.: Seeing is believing: formalising false-belief tasks in dynamic epistemic logic. In: Jaakko Hintikka on Knowledge and Game-theoretical Semantics, pp. 207–236 (2018)

  8. Bolander, T., Andersen, M.B.: Epistemic planning for single-and multi-agent systems. J. Appl. Non-Classical Logics 21(1), 9–34 (2011)

  9. Bolander, T., Jensen, M.H., Schwarzentruber, F.: Complexity results in epistemic planning. In: IJCAI, pp. 2791–2797 (2015)

  10. Brutti, R., Donatelli, L., Lai, K., Pustejovsky, J.: Abstract Meaning Representation for gesture. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 1576–1583. European Language Resources Association, Marseille, France (2022)

  11. Clark, H.H., Brennan, S.E.: Grounding in communication. Perspect. Socially Shared Cogn. 13(1991), 127–149 (1991)

  12. Dautenhahn, K.: Socially intelligent robots: dimensions of human-robot interaction. Philos. Trans. R. Soc. B: Biol. Sci. 362(1480), 679–704 (2007)

  13. De Groote, P.: Type raising, continuations, and classical logic. In: Proceedings of the Thirteenth Amsterdam Colloquium, pp. 97–101 (2001)

  14. Dey, I., Puntambekar, S.: Examining nonverbal interactions to better understand collaborative learning. In: Proceedings of the 16th International Conference on Computer-Supported Collaborative Learning-CSCL 2023, pp. 273–276. International Society of the Learning Sciences (2023)

  15. Dissing, L., Bolander, T.: Implementing theory of mind on a robot using dynamic epistemic logic. In: IJCAI, pp. 1615–1621 (2020)

  16. van Eijck, J.: Perception and change in update logic. In: van Eijck, J., Verbrugge, R. (eds.) Games, Actions and Social Software. LNCS, vol. 7010, pp. 119–140. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29326-9_7

  17. Feldman, J.: Embodied language, best-fit analysis, and formal compositionality. Phys. Life Rev. 7(4), 385–410 (2010)

  18. Feldman, R.: Respecting the evidence. Philos. Perspect. 19, 95–119 (2005)

  19. Geib, C., George, D., Khalid, B., Magnotti, R., Stone, M.: An integrated architecture for common ground in collaboration (2022)

  20. Gianotti, M., Patti, A., Vona, F., Pentimalli, F., Barbieri, J., Garzotto, F.: Multimodal interaction for persons with autism: the 5A case study. In: Antona, M., Stephanidis, C. (eds.) Universal Access in Human-Computer Interaction, HCII 2023. LNCS, vol. 14020, pp. 581–600. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-35681-0_38

  21. Ginzburg, J.: Interrogatives: Questions, Facts and Dialogue. The Handbook of Contemporary Semantic Theory, pp. 359–423. Blackwell, Oxford (1996)

  22. Ginzburg, J.: The Interactive Stance: Meaning for Conversation. OUP, Oxford (2012)

  23. Goldman, A.I.: In defense of the simulation theory. Mind Lang. 7(1–2), 104–119 (1992)

  24. Goldman, A.I.: Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading. Oxford University Press, Oxford (2006)

  25. Gopnik, A.: How we know our minds: the illusion of first-person knowledge of intentionality. Behav. Brain Sci. 16(1), 1–14 (1993)

  26. Gordon, R.M.: Folk psychology as simulation. Mind Lang. 1(2), 158–171 (1986)

  27. Heal, J.: Simulation, Theory, and Content. Theories of Theories of Mind, pp. 75–89 (1996)

  28. Henderson, M., Thomson, B., Williams, J.D.: The second dialog state tracking challenge. In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 263–272 (2014)

  29. Khebour, I., et al.: The weights task dataset: a multimodal dataset of collaboration in a situated task. J. Open Humanities Data 10 (2024)

  30. Kolve, E., et al.: AI2-THOR: an interactive 3D environment for visual AI. arXiv preprint arXiv:1712.05474 (2017)

  31. Krishnaswamy, N., et al.: Diana’s World: a situated multimodal interactive agent. In: AAAI Conference on Artificial Intelligence (AAAI): Demos Program. AAAI (2020)

  32. Krishnaswamy, N., Pustejovsky, J.: VoxSim: a visual platform for modeling motion language. In: Proceedings of COLING 2016, The 26th International Conference on Computational Linguistics: Technical Papers. ACL (2016)

  33. Krishnaswamy, N., Pustejovsky, J.: Multimodal continuation-style architectures for human-robot interaction. arXiv preprint arXiv:1909.08161 (2019)

  34. Krishnaswamy, N., Pickard, W., Cates, B., Blanchard, N., Pustejovsky, J.: VoxWorld platform for multimodal embodied agents. In: LREC Proceedings, vol. 13 (2022)

  35. Miller, P.W.: Body language in the classroom. Tech. Connecting Educ. Careers 80(8), 28–30 (2005)

  36. Narayanan, S.: Mind changes: a simulation semantics account of counterfactuals. Cognitive Science (2010)

  37. Pacuit, E.: Neighborhood Semantics for Modal Logic. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67149-9

  38. Plaza, J.: Logics of public communications. In: Proceedings 4th International Symposium on Methodologies for Intelligent Systems, pp. 201–216 (1989)

  39. Premack, D., Woodruff, G.: Does the chimpanzee have a theory of mind? Behav. Brain Sci. 1(4), 515–526 (1978)

  40. Pustejovsky, J., Krishnaswamy, N.: VoxML: a visualization modeling language. arXiv preprint arXiv:1610.01508 (2016)

  41. Pustejovsky, J., Krishnaswamy, N.: Embodied human computer interaction. KI-Künstliche Intelligenz 35(3–4), 307–327 (2021)

  42. Pustejovsky, J., Krishnaswamy, N.: The role of embodiment and simulation in evaluating HCI: theory and framework. In: Duffy, V.G. (ed.) HCII 2021. LNCS, vol. 12777, pp. 288–303. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-77817-0_21

  43. Radu, I., Tu, E., Schneider, B.: Relationships between body postures and collaborative learning states in an augmented reality study. In: Bittencourt, I., Cukurova, M., Muldner, K., Luckin, R., Millán, E. (eds.) Artificial Intelligence in Education: 21st International Conference, AIED 2020, Ifrane, Morocco, 6–10 July 2020, Proceedings, Part II 21, pp. 257–262. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52240-7_47

  44. Savva, M., et al.: Habitat: a platform for embodied AI research. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9339–9347 (2019)

  45. Schneider, B., Pea, R.: Does seeing one another’s gaze affect group dialogue? A computational approach. J. Learn. Analytics 2(2), 107–133 (2015)

  46. Sousa, A., Young, K., D’aquin, M., Zarrouk, M., Holloway, J.: Introducing CALMED: multimodal annotated dataset for emotion detection in children with autism. In: Antona, M., Stephanidis, C. (eds.) International Conference on Human-Computer Interaction, pp. 657–677. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-35681-0_43

  47. Stalnaker, R.: Common ground. Linguist. Philos. 25(5–6), 701–721 (2002)

  48. Sun, C., Shute, V.J., Stewart, A., Yonehiro, J., Duran, N., D’Mello, S.: Towards a generalized competency model of collaborative problem solving. Comput. Educ. 143, 103672 (2020)

  49. Suzuki, R., Karim, A., Xia, T., Hedayati, H., Marquardt, N.: Augmented reality and robotics: a survey and taxonomy for AR-enhanced human-robot interaction and robotic interfaces. In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, pp. 1–33 (2022)

  50. Tam, C., Brutti, R., Lai, K., Pustejovsky, J.: Annotating situated actions in dialogue. In: Proceedings of the 4th International Workshop on Designing Meaning Representation (2023)

  51. Tolzin, A., Körner, A., Dickhaut, E., Janson, A., Rummer, R., Leimeister, J.M.: Designing pedagogical conversational agents for achieving common ground. In: Gerber, A., Baskerville, R. (eds.) International Conference on Design Science Research in Information Systems and Technology, pp. 345–359. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-32808-4_22

  52. Tu, J., Rim, K., Pustejovsky, J.: Competence-based question generation. In: Proceedings of the 29th International Conference on Computational Linguistics, pp. 1521–1533 (2022)

  53. van Fraassen, B.C.: Belief and the will. J. Philos. 81(5), 235–256 (1984)

  54. VanderHoeven, H., et al.: Multimodal design for interactive collaborative problem-solving support. In: HCII 2024. Springer, Cham (2024)

  55. Wellman, H.M.: The Child’s Theory of Mind. The MIT Press, Cambridge (1990)

  56. Wimmer, H., Perner, J.: Beliefs about beliefs: representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition 13(1), 103–128 (1983)

  57. Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., Sloetjes, H.: ELAN: a professional framework for multimodality research. In: 5th LREC 2006, pp. 1556–1559 (2006)

  58. Won, A.S., Bailenson, J.N., Janssen, J.H.: Automatic detection of nonverbal behavior predicts learning in dyadic interactions. IEEE Trans. Affect. Comput. 5(2), 112–125 (2014)

  59. Xia, F., Zamir, A.R., He, Z., Sax, A., Malik, J., Savarese, S.: Gibson Env: real-world perception for embodied agents. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9068–9079 (2018)

Acknowledgements

This work was supported in part by NSF grant DRL 2019805 to Dr. Pustejovsky at Brandeis University and Dr. Krishnaswamy at Colorado State University. It was also supported in part by NSF grant CNS 2033932 to Dr. Pustejovsky. We would like to thank the reviewers for their comments and suggestions. The views expressed herein are ours alone.

Author information

Correspondence to James Pustejovsky.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, Y. et al. (2024). Modeling Theory of Mind in Multimodal HCI. In: Kurosu, M., Hashizume, A. (eds) Human-Computer Interaction. HCII 2024. Lecture Notes in Computer Science, vol 14684. Springer, Cham. https://doi.org/10.1007/978-3-031-60405-8_14

  • DOI: https://doi.org/10.1007/978-3-031-60405-8_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-60404-1

  • Online ISBN: 978-3-031-60405-8

  • eBook Packages: Computer Science, Computer Science (R0)
