ABSTRACT
We have developed a framework for understanding speakers' situations and intentions, focusing on their utterances of demonstratives. We aim to construct a 'Multimodal Infant Behavior Corpus', which will make a valuable contribution to elucidating human commonsense knowledge and its acquisition mechanism. For this purpose, we have constructed environments for multimodal observation of infant behavior; in particular, for recording infant behavior, we have set up multiple cameras and microphones in the Cedar yurt. We have also developed a high-quality wearable speech recording device to capture infant utterances clearly. Moreover, we have developed a comment-collecting system that allows anyone to add comments easily from multiple viewpoints. Together, these environments and devices realize a framework for multimodal observation of infant behavior. Utilizing these multimodal environments, we propose a situation description model based on observation of demonstratives uttered by infants, since demonstratives appear frequently in their conversations and serve as a precious clue for understanding situations. The proposed model, which represents the mental distances of speakers and listeners to objects in a general and simple form, enables us to predict speakers' next behavior. These results lead us to conclude that the constructed environments support the development and realization of human interaction models applicable to spoken dialog systems for supporting elderly people.
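The abstract's core idea, a model that maps the mental distances of speaker and listener to a referent onto the choice of demonstrative, can be illustrated with a minimal sketch. The function name, the distance threshold, and the decision rule below are illustrative assumptions rather than the paper's actual model; the mapping itself follows the standard three-way Japanese demonstrative series (ko- near the speaker, so- near the listener, a- distant from both).

```python
# Hypothetical sketch of a situation description model that predicts a
# demonstrative series from speaker/listener mental distance to a referent.
# The threshold and function signature are illustrative assumptions.

def predict_demonstrative(speaker_dist: float, listener_dist: float,
                          near: float = 1.0) -> str:
    """Map mental distances (arbitrary units) to a demonstrative series.

    ko- : referent close to the speaker       (e.g. "kore", this near me)
    so- : referent close to the listener only (e.g. "sore", that near you)
    a-  : referent distant from both          (e.g. "are", that over there)
    """
    if speaker_dist <= near:
        return "ko"
    if listener_dist <= near:
        return "so"
    return "a"
```

Under this sketch, an infant reaching toward a nearby toy while saying a ko-series demonstrative is consistent with `predict_demonstrative(0.5, 3.0)` returning `"ko"`, which is the sense in which the model can anticipate a speaker's next behavior.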
Index Terms
- A methodological study of situation understanding utilizing environments for multimodal observation of infant behavior