ABSTRACT
As a new generation of multimodal systems begins to emerge, one dominant theme will be the integration and synchronization requirements for combining modalities into robust whole systems. The present research presents quantitative modeling of the organization of users' speech and pen multimodal integration patterns. In particular, the potential malleability of users' multimodal integration patterns is explored, as well as variation in these patterns during system error handling and tasks varying in difficulty. Using a new dual-wizard simulation method, data were collected from twelve adults as they completed a map-based task using multimodal speech and pen input. Analyses based on over 1,600 multimodal constructions revealed that users' dominant multimodal integration pattern was resistant to change, even when strong selective reinforcement was delivered to encourage switching from a sequential to a simultaneous integration pattern, or vice versa. Instead, both sequential and simultaneous integrators showed evidence of entrenching further in their dominant integration patterns (i.e., increasing either their inter-modal lag or signal overlap) over the course of an interactive session, during system error handling, and when completing increasingly difficult tasks. In fact, during error handling these changes in the co-timing of multimodal signals became the main feature of hyper-clear multimodal language, with elongation of individual signals either attenuated or absent. Whereas Behavioral/Structuralist theory cannot account for these data, it is argued that Gestalt theory provides a valuable framework for, and insights into, multimodal interaction. Implications of these findings are discussed for the development of a coherent theory of multimodal integration during human-computer interaction, and for the design of a new class of adaptive multimodal interfaces.
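To make the sequential/simultaneous distinction concrete, the following is a minimal sketch of how a user's dominant integration pattern might be classified from signal timestamps. It is not the paper's actual scoring procedure: the data structure, field names, and majority-vote heuristic are all illustrative assumptions. A construction counts as simultaneous when the pen and speech signals overlap in time, and as sequential when one signal ends before the other begins, leaving an inter-modal lag.

```python
# Illustrative sketch only; names, fields, and the majority-vote
# threshold are assumptions, not the paper's method.

from dataclasses import dataclass


@dataclass
class Construction:
    """Onset/offset times (seconds) for one speech-and-pen construction."""
    pen_start: float
    pen_end: float
    speech_start: float
    speech_end: float

    def overlap(self) -> float:
        """Temporal overlap between the two signals; 0.0 if disjoint."""
        return max(0.0, min(self.pen_end, self.speech_end)
                   - max(self.pen_start, self.speech_start))

    def lag(self) -> float:
        """Inter-modal lag between one signal's offset and the other's
        onset; 0.0 when the signals overlap (a simultaneous construction)."""
        if self.overlap() > 0.0:
            return 0.0
        if self.pen_end <= self.speech_start:
            return self.speech_start - self.pen_end
        return self.pen_start - self.speech_end


def dominant_pattern(constructions: list[Construction]) -> str:
    """Majority-vote heuristic: label a user a simultaneous integrator if
    most of their constructions overlap in time, otherwise sequential."""
    n_simultaneous = sum(1 for c in constructions if c.overlap() > 0.0)
    return "simultaneous" if n_simultaneous > len(constructions) / 2 else "sequential"


if __name__ == "__main__":
    user = [  # two disjoint constructions, one overlapping
        Construction(0.0, 1.2, 1.6, 2.9),
        Construction(0.0, 1.5, 0.7, 2.4),
        Construction(0.0, 1.0, 1.5, 2.2),
    ]
    print(dominant_pattern(user))  # -> sequential
```

Under this sketch, the entrenchment effect reported above would surface as a user's mean lag (for sequential integrators) or mean overlap (for simultaneous integrators) increasing over the course of a session, during error handling, or as task difficulty rises.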