DOI: 10.1145/2305484.2305490

Fusion in multimodal interactive systems: an HMM-based algorithm for user-induced adaptation

Published: 25 June 2012

Abstract

Multimodal interfaces have proven to be ideal candidates for interactive systems that adapt to a user, either automatically or based on user-defined rules. However, user-driven adaptation demands correspondingly advanced software architectures and algorithms. We present a novel multimodal fusion algorithm for the development of adaptive interactive systems that is based on hidden Markov models (HMMs). To select relevant modalities at the semantic level, the algorithm is linked to temporal relationship properties. The algorithm has been evaluated in three use cases, from which we identified the main challenges involved in developing adaptive multimodal interfaces.
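To illustrate the kind of HMM machinery the abstract refers to, the sketch below runs Viterbi decoding over a toy sequence of multimodal input events. This is not the paper's actual fusion algorithm; all state names, event labels, and probabilities are hypothetical, chosen only to show how a most-likely semantic interpretation can be recovered from a stream of speech and gesture events.

```python
# Illustrative Viterbi decoding over a toy multimodal HMM (hypothetical
# model, not the algorithm from the paper). Hidden states are candidate
# semantic commands; observations are events from different modalities.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for `obs`."""
    # V[t][s] = (best probability of reaching state s at time t, predecessor)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    last = max(V[-1], key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, V[t][path[0]][1])
    return path

states = ("move", "select")
observations = ("speech:put", "gesture:point", "gesture:point")
start_p = {"move": 0.6, "select": 0.4}
trans_p = {"move": {"move": 0.7, "select": 0.3},
           "select": {"move": 0.4, "select": 0.6}}
emit_p = {"move": {"speech:put": 0.5, "gesture:point": 0.4, "speech:stop": 0.1},
          "select": {"speech:put": 0.1, "gesture:point": 0.3, "speech:stop": 0.6}}

print(viterbi(observations, states, start_p, trans_p, emit_p))
# → ['move', 'move', 'move']
```

In a real fusion engine the emission model would score events by modality, semantic content, and timing (the temporal relationship properties mentioned above), and the model parameters would be adapted per user rather than fixed by hand.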


Published In

EICS '12: Proceedings of the 4th ACM SIGCHI symposium on Engineering interactive computing systems
June 2012
350 pages
ISBN:9781450311687
DOI:10.1145/2305484

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hmm-based fusion
  2. multimodal fusion
  3. multimodal interaction
  4. user interface adaptation

Qualifiers

  • Research-article

Conference

EICS'12

Acceptance Rates

Overall Acceptance Rate 73 of 299 submissions, 24%
