Abstract
Temporal as well as semantic constraints on fusion are at the heart of multimodal system processing. The goal of the present work is to develop user-adaptive temporal thresholds that outperform the fixed thresholds used in state-of-the-art systems, by leveraging both empirical user modeling and machine learning to handle the large individual differences in users’ multimodal integration patterns. Using simple Naive Bayes learning methods and a leave-one-out training strategy, our model correctly predicted 88% of users’ mixed speech and pen signal input as either unimodal or multimodal, and 91% of their multimodal input as either sequentially or simultaneously integrated. In addition to predicting a user’s multimodal pattern in advance of receiving input, predictive accuracy was also evaluated after the first signal’s end-point detection—the earliest time at which a speech/pen multimodal system makes a decision regarding fusion. This system-centered metric yielded accuracies of 90% and 92%, respectively, for classification of unimodal/multimodal and sequential/simultaneous input patterns. In addition, empirical modeling revealed a 0.92 correlation between users’ multimodal integration pattern and their likelihood of interacting multimodally, which may account for the superior learning obtained with training over heterogeneous user data rather than data partitioned by user subtype. Finally, in large part due to guidance from user modeling, the techniques reported here required as few as 15 samples to predict a “surprise” user’s input patterns.
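The abstract describes Naive Bayes classification of integration patterns (e.g., sequential vs. simultaneous) evaluated with a leave-one-out strategy. As a minimal sketch of that general technique—not the authors' implementation—the following uses an invented one-dimensional feature (the lag between the first signal's end-point and the second signal's onset) and synthetic data to show a Gaussian Naive Bayes classifier scored by leave-one-out cross-validation:

```python
import math

# Hypothetical feature: inter-signal lag in ms (negative = overlapping signals).
# Labels: 0 = simultaneous integrator, 1 = sequential integrator.
# Synthetic illustration data only -- not the paper's corpus.
DATA = [(-120.0, 0), (-80.0, 0), (-40.0, 0), (-100.0, 0), (-60.0, 0),
        (250.0, 1), (400.0, 1), (320.0, 1), (500.0, 1), (280.0, 1)]

def gaussian_nb_train(samples):
    """Estimate per-class mean, variance, and prior for 1-D Gaussian Naive Bayes."""
    model = {}
    for label in {y for _, y in samples}:
        xs = [x for x, y in samples if y == label]
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-9  # avoid zero variance
        model[label] = (mean, var, len(xs) / len(samples))
    return model

def gaussian_nb_predict(model, x):
    """Return the class with the highest log posterior for feature value x."""
    def log_post(mean, var, prior):
        return (math.log(prior)
                - 0.5 * math.log(2 * math.pi * var)
                - (x - mean) ** 2 / (2 * var))
    return max(model, key=lambda c: log_post(*model[c]))

def leave_one_out_accuracy(samples):
    """Hold out each sample in turn, train on the rest, score the held-out sample."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        model = gaussian_nb_train(samples[:i] + samples[i + 1:])
        hits += gaussian_nb_predict(model, x) == y
    return hits / len(samples)

print(leave_one_out_accuracy(DATA))  # well-separated toy classes -> 1.0
```

Because each held-out sample is classified by a model trained on all remaining users' data, this mirrors the paper's strategy of predicting a "surprise" user's pattern from heterogeneous training data.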
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Huang, X., Oviatt, S., Lunsford, R. (2006). Combining User Modeling and Machine Learning to Predict Users’ Multimodal Integration Patterns. In: Renals, S., Bengio, S., Fiscus, J.G. (eds) Machine Learning for Multimodal Interaction. MLMI 2006. Lecture Notes in Computer Science, vol 4299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11965152_5
Print ISBN: 978-3-540-69267-6
Online ISBN: 978-3-540-69268-3
eBook Packages: Computer Science (R0)