ABSTRACT
As a new generation of multimodal systems begins to emerge, one dominant theme will be the integration and synchronization requirements for combining modalities into robust whole systems. The present research presents quantitative modeling of the organization of users' speech and pen multimodal integration patterns. In particular, the potential malleability of users' multimodal integration patterns is explored, as well as variation in these patterns during system error handling and tasks varying in difficulty. Using a new dual-wizard simulation method, data were collected from twelve adults as they completed a map-based task using multimodal speech and pen input. Analyses based on over 1,600 multimodal constructions revealed that users' dominant multimodal integration pattern was resistant to change, even when strong selective reinforcement was delivered to encourage switching from a sequential to a simultaneous integration pattern, or vice versa. Instead, both sequential and simultaneous integrators showed evidence of entrenching further in their dominant integration patterns (i.e., increasing either their inter-modal lag or signal overlap) over the course of an interactive session, during system error handling, and when completing increasingly difficult tasks. In fact, during error handling these changes in the co-timing of multimodal signals became the main feature of hyper-clear multimodal language, with elongation of individual signals either attenuated or absent. Whereas Behavioral/Structuralist theory cannot account for these data, it is argued that Gestalt theory provides a valuable framework for, and insights into, multimodal interaction. Implications of these findings are discussed for the development of a coherent theory of multimodal integration during human-computer interaction, and for the design of a new class of adaptive multimodal interfaces.
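To make the sequential/simultaneous distinction concrete, the following is a minimal sketch of how a user's dominant integration pattern might be classified from signal timestamps. It is not the paper's actual scoring procedure: the data structure, field names, and majority-vote heuristic are all illustrative assumptions. A construction counts as simultaneous when the pen and speech signals overlap in time, and as sequential when one signal ends before the other begins, leaving an inter-modal lag.

```python
# Illustrative sketch only; names, fields, and the majority-vote
# threshold are assumptions, not the paper's method.

from dataclasses import dataclass


@dataclass
class Construction:
    """Onset/offset times (seconds) for one speech-and-pen construction."""
    pen_start: float
    pen_end: float
    speech_start: float
    speech_end: float

    def overlap(self) -> float:
        """Temporal overlap between the two signals; 0.0 if disjoint."""
        return max(0.0, min(self.pen_end, self.speech_end)
                   - max(self.pen_start, self.speech_start))

    def lag(self) -> float:
        """Inter-modal lag between one signal's offset and the other's
        onset; 0.0 when the signals overlap (a simultaneous construction)."""
        if self.overlap() > 0.0:
            return 0.0
        if self.pen_end <= self.speech_start:
            return self.speech_start - self.pen_end
        return self.pen_start - self.speech_end


def dominant_pattern(constructions: list[Construction]) -> str:
    """Majority-vote heuristic: label a user a simultaneous integrator if
    most of their constructions overlap in time, otherwise sequential."""
    n_simultaneous = sum(1 for c in constructions if c.overlap() > 0.0)
    return "simultaneous" if n_simultaneous > len(constructions) / 2 else "sequential"


if __name__ == "__main__":
    user = [  # two disjoint constructions, one overlapping
        Construction(0.0, 1.2, 1.6, 2.9),
        Construction(0.0, 1.5, 0.7, 2.4),
        Construction(0.0, 1.0, 1.5, 2.2),
    ]
    print(dominant_pattern(user))  # -> sequential
```

Under this sketch, the entrenchment effect reported above would surface as a user's mean lag (for sequential integrators) or mean overlap (for simultaneous integrators) increasing over the course of a session, during error handling, or as task difficulty rises.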