DOI: 10.1145/1027933.1028000

Utilizing gestures to better understand dynamic structure of human communication

Published: 13 October 2004

Abstract

Motivation: Many researchers have highlighted the importance of gesture in natural human communication. McNeill [4] puts forward the hypothesis that gesture and speech stem from the same mental process and so tend to be both temporally and semantically related. However, in contrast to speech, which surfaces as a linear progression of segments, sounds, and words, gestures appear to be nonlinear, holistic, and imagistic. Because gesture shares a common origin with speech while using a very different mechanism for transferring information, it adds an important dimension to language understanding. Ignoring this information when constructing a model of human communication would limit the model's potential effectiveness.
Goal and Method: This thesis concerns the development of methods to effectively incorporate gestural information from human communication into a computer model in order to more accurately interpret the content and structure of that communication. Levelt [5] suggests that structure in human communication stems from the dynamic conscious process of language production, during which a conversant organizes the concepts to be expressed, plans the discourse, and selects appropriate words, prosody, and gestures, while also correcting errors that arise in this process. Clues related to this conscious processing emerge in both the final speech stream and the gestures. This thesis will attempt to utilize these clues to determine the structural elements of human-to-human dialogs, including sentence boundaries, topic boundaries, and disfluency structure. For this purpose, a data-driven approach is used, requiring three important components: corpus generation, feature extraction, and model construction.
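To make the data-driven framing concrete, the sketch below casts structural-event detection as classification at each inter-word boundary, with features drawn from the aligned audio and video streams. This is an illustrative reading of the three-component pipeline, not the thesis implementation; all feature names, values, and labels are assumptions.

```python
# Hypothetical illustration (not the thesis code): structural-event detection
# framed as classification at each inter-word boundary candidate.

from dataclasses import dataclass
from typing import List

@dataclass
class BoundaryCandidate:
    word: str           # word preceding the candidate boundary
    pause_ms: float     # prosodic feature: pause duration after the word
    pitch_reset: float  # prosodic feature: F0 change across the boundary
    hand_at_rest: bool  # gestural feature: hands returned to rest position
    label: str          # "SU" (sentence-unit boundary) or "none"

def extract_feature_vector(c: BoundaryCandidate) -> List[float]:
    """Map one candidate to a numeric feature vector for a classifier."""
    return [c.pause_ms, c.pitch_reset, 1.0 if c.hand_at_rest else 0.0]

# Corpus generation produces labeled candidates; feature extraction maps them
# to vectors; model construction fits any standard classifier on the result.
corpus = [
    BoundaryCandidate("ok", 420.0, -35.0, True, "SU"),
    BoundaryCandidate("the", 10.0, 2.0, False, "none"),
]
X = [extract_feature_vector(c) for c in corpus]
y = [c.label for c in corpus]
```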
Previous Work: Some work related to each of these components has already been conducted. A data collection and processing protocol for constructing multimodal corpora has been created; details on the video and audio processing can be found in the Data and Annotation section of [3]. To improve the speed of producing a corpus while maintaining its quality, we have surveyed factors impacting the accuracy of forced alignments of transcriptions to audio files [2]. These alignments provide the crucial temporal synchronization between video events and spoken words (and their components) for this research effort. We have also conducted measurement studies in an attempt to understand how to model multimodal conversations; for example, we have investigated the types of gesture patterns that occur during speech repairs [1]. Recently, we constructed a preliminary model combining speech and gesture features for detecting sentence boundaries in videotaped dialogs. This model combines language and prosody models together with a simple gestural model to more effectively detect sentence boundaries [3].
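One plausible way to combine separate language, prosody, and gesture models, in the spirit of the sentence-unit detector in [3], is late fusion: linear interpolation of per-boundary posterior probabilities. The sketch below illustrates this; the weights, threshold, and function names are illustrative assumptions, not values reported in [3].

```python
# Hypothetical sketch of late fusion over per-boundary posteriors from
# separate language, prosody, and gesture models. Weights are assumptions.

def fuse_posteriors(p_language: float, p_prosody: float, p_gesture: float,
                    weights=(0.5, 0.3, 0.2)) -> float:
    """Interpolated probability that a candidate boundary is a sentence unit."""
    w_lm, w_pros, w_gest = weights
    return w_lm * p_language + w_pros * p_prosody + w_gest * p_gesture

def detect_sentence_boundaries(candidates, threshold=0.5):
    """Mark a boundary wherever the fused posterior exceeds the threshold."""
    return [fuse_posteriors(*c) > threshold for c in candidates]

# Example: three candidate boundaries, each with (LM, prosody, gesture) scores.
scores = [(0.9, 0.8, 0.7), (0.2, 0.1, 0.4), (0.6, 0.7, 0.9)]
print(detect_sentence_boundaries(scores))  # [True, False, True]
```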
Future Work: To date, our multimodal corpora involve human monologues and dialogs (see http://vislab.cs.wright.edu/kdi). We are participating in the collection and preparation of a corpus of multi-party meetings (see http://vislab.cs.wright.edu/Projects/Meeting-Analysis). To facilitate multi-channel audio processing, we are constructing a tool to support accurate audio transcription and alignment. The data from this meeting corpus will enable the development of more sophisticated gesture models, allowing us to expand the set of gesture features (e.g., spatial properties of the tracked gestures). Additionally, we will investigate more advanced machine learning methods in an attempt to improve the performance of our models. We also plan to extend our models to phenomena such as topic segmentation.
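As one example of the spatial gesture features mentioned above, the sketch below derives per-frame hand speed from tracked 2-D hand positions and flags gestural "holds" (spans in which the hand stays nearly motionless). The frame rate, threshold, and function names are assumptions for illustration only.

```python
# Hypothetical sketch of spatial gesture features computed from tracked 2-D
# hand positions sampled at the video frame rate. Threshold is an assumption.

from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) hand position in image coordinates

def speeds(track: List[Point], fps: float = 30.0) -> List[float]:
    """Per-frame hand speed (pixels/second) from consecutive positions."""
    return [hypot(x2 - x1, y2 - y1) * fps
            for (x1, y1), (x2, y2) in zip(track, track[1:])]

def is_hold(track: List[Point], max_speed: float = 15.0) -> bool:
    """A 'hold' is a span in which the hand stays nearly motionless."""
    return all(s < max_speed for s in speeds(track))

# Example: a near-stationary hand over five frames registers as a hold.
track = [(100.0, 200.0), (100.3, 200.1), (100.2, 200.2),
         (100.4, 200.1), (100.3, 200.3)]
print(is_hold(track))  # True
```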

References

[1] L. Chen, M. Harper, and F. Quek. Gesture patterns during speech repairs. In Proc. of the Fourth International Conference on Multimodal Interfaces (ICMI), Pittsburgh, PA, Oct. 2002.
[2] L. Chen, Y. Liu, M. Harper, E. Maia, and S. McRoy. Evaluating factors impacting the accuracy of forced alignments in a multimodal corpus. In Proc. of the Language Resources and Evaluation Conference (LREC), Lisbon, Portugal, June 2004.
[3] L. Chen, M. Harper, Y. Liu, and E. Shriberg. Multimodal model integration for sentence unit detection. In Proc. of the Sixth International Conference on Multimodal Interfaces (ICMI), State College, PA, Oct. 2004.
[4] D. McNeill. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, 1992.
[5] W. Levelt. Speaking: From Intention to Articulation. MIT Press, Cambridge, MA, 1989.

Published In

ICMI '04: Proceedings of the 6th International Conference on Multimodal Interfaces
Association for Computing Machinery, New York, NY, United States
October 2004, 368 pages
ISBN: 1581139950
DOI: 10.1145/1027933


Author Tags

  1. dialog
  2. gesture
  3. language models
  4. multimodal fusion
  5. prosody
  6. sentence boundary detection

