DOI: 10.1145/2305484.2305490

Fusion in multimodal interactive systems: an HMM-based algorithm for user-induced adaptation

Published: 25 June 2012

Abstract

Multimodal interfaces have proven to be ideal candidates for interactive systems that adapt to a user, either automatically or based on user-defined rules. However, user-driven adaptation demands correspondingly advanced software architectures and algorithms. We present a novel multimodal fusion algorithm for the development of adaptive interactive systems that is based on hidden Markov models (HMMs). To select relevant modalities at the semantic level, the algorithm is linked to temporal relationship properties. The algorithm has been evaluated in three use cases, from which we identified the main challenges involved in developing adaptive multimodal interfaces.
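To illustrate the kind of HMM machinery the abstract refers to, the sketch below runs Viterbi decoding over a toy sequence of multimodal input events. This is not the paper's actual fusion algorithm; all state names, event labels, and probabilities are hypothetical, chosen only to show how a most-likely semantic interpretation can be recovered from a stream of speech and gesture events.

```python
# Illustrative Viterbi decoding over a toy multimodal HMM (hypothetical
# model, not the algorithm from the paper). Hidden states are candidate
# semantic commands; observations are events from different modalities.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for `obs`."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    last = max(V[-1], key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, V[t][path[0]][1])
    return path

states = ("move", "select")
observations = ("speech:put", "gesture:point", "gesture:point")
start_p = {"move": 0.6, "select": 0.4}
trans_p = {"move": {"move": 0.7, "select": 0.3},
           "select": {"move": 0.4, "select": 0.6}}
emit_p = {"move": {"speech:put": 0.5, "gesture:point": 0.4, "speech:stop": 0.1},
          "select": {"speech:put": 0.1, "gesture:point": 0.3, "speech:stop": 0.6}}

print(viterbi(observations, states, start_p, trans_p, emit_p))
# → ['move', 'move', 'move']
```

In a real fusion engine the emission model would score events by modality, semantic content, and timing (the temporal relationship properties mentioned above), and the model parameters would be adapted per user rather than fixed by hand.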


Published In

EICS '12: Proceedings of the 4th ACM SIGCHI symposium on Engineering interactive computing systems
June 2012
350 pages
ISBN:9781450311687
DOI:10.1145/2305484

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hmm-based fusion
  2. multimodal fusion
  3. multimodal interaction
  4. user interface adaptation

Qualifiers

  • Research-article

Conference

EICS'12

Acceptance Rates

Overall Acceptance Rate 73 of 299 submissions, 24%
