Abstract
In recent years, a new generation of multimodal systems has emerged as a major direction within the HCI community. Multimodal interfaces and architectures are time-critical and data-intensive to develop, which poses new research challenges. The goal of the present work is to model and adapt to users’ multimodal integration patterns, so that faster and more robust systems can be developed with on-line adaptation to individuals’ multimodal temporal thresholds. In this paper, we summarize past user-modeling results on speech and pen multimodal integration patterns, which indicate that there are two dominant types of multimodal integration patterns among users that can be detected very early and remain highly consistent. The empirical results also indicate that, when interacting with a multimodal system, users intermix unimodal with multimodal commands. Based on these results, we present new machine-learning results comparing three models of on-line system adaptation to users’ integration patterns, all based on Bayesian Belief Networks. This work utilized data from ten adults who provided approximately 1,000 commands while interacting with a map-based multimodal system. Initial experimental results with our learning models indicated that 85% of users’ natural mixed input could be correctly classified as either unimodal or multimodal, and 82% of users’ multimodal input could be correctly classified as either sequentially or simultaneously integrated. The long-term goal of this research is to develop new strategies for combining empirical user modeling with machine learning techniques to bootstrap faster, more generalized, and more reliable information fusion in new types of multimodal systems.
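To make the adaptation idea concrete, the sketch below is a minimal, hypothetical illustration (not the authors' implementation) of on-line Bayesian classification of one part of the problem: deciding whether a given speech-and-pen command was sequentially or simultaneously integrated from the intermodal lag, while updating a per-user prior after every command. The lag feature, Gaussian class-conditional models, and all numeric values are assumptions made for illustration only.

```python
import math

# Hypothetical class-conditional lag models (Gaussian), chosen for illustration:
# simultaneous integrators overlap speech and pen (negative lag),
# sequential integrators pause between modes (positive lag).
LAG_MODELS = {
    "simultaneous": (-0.5, 0.6),   # (mean, std dev) of lag in seconds
    "sequential":   ( 1.5, 0.8),
}

def gaussian_loglik(x, mean, std):
    """Log-likelihood of x under a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

class IntegrationPatternAdapter:
    """Keeps running pseudo-counts of how often this user has been classified
    as a sequential vs. simultaneous integrator, and uses them as the prior
    for the next command -- the 'on-line adaptation' step."""

    def __init__(self):
        self.counts = {"simultaneous": 1.0, "sequential": 1.0}  # uniform pseudo-counts

    def classify(self, lag):
        """Return the most probable integration pattern for one command,
        given the observed lag (seconds) between pen offset and speech onset."""
        total = sum(self.counts.values())
        log_post = {}
        for pattern, (mean, std) in LAG_MODELS.items():
            log_prior = math.log(self.counts[pattern] / total)
            log_post[pattern] = log_prior + gaussian_loglik(lag, mean, std)
        # Normalize in log space for numerical stability.
        m = max(log_post.values())
        probs = {k: math.exp(v - m) for k, v in log_post.items()}
        z = sum(probs.values())
        probs = {k: v / z for k, v in probs.items()}
        best = max(probs, key=probs.get)
        self.counts[best] += 1.0          # update this user's prior
        return best, probs

if __name__ == "__main__":
    adapter = IntegrationPatternAdapter()
    for lag in (-0.3, -0.6, 1.2, -0.4):   # made-up lags for one user
        print(adapter.classify(lag))
```

The design choice illustrated here is that the user-specific prior sharpens as commands accumulate, so a user whose dominant pattern is detected early is classified more confidently on later commands; richer Bayesian Belief Network structures would simply replace the single lag feature with multiple temporal and contextual variables.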
Keywords
- User Modeling
- Machine Learning Technique
- Speech Recognition System
- Integration Pattern
- Multimodal Interface
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, X., Oviatt, S. (2006). Toward Adaptive Information Fusion in Multimodal Systems. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_2
DOI: https://doi.org/10.1007/11677482_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32549-9
Online ISBN: 978-3-540-32550-5