DOI: 10.1145/2070481.2070500

Mudra: a unified multimodal interaction framework

Published: 14 November 2011

Abstract

In recent years, multimodal interfaces have gained momentum as an alternative to traditional WIMP interaction styles. Existing multimodal fusion engines and frameworks range from low-level data stream-oriented approaches to high-level semantic inference-based solutions. However, there is a lack of multimodal interaction engines offering native fusion support across different levels of abstraction to fully exploit the power of multimodal interaction. We present Mudra, a unified multimodal interaction framework supporting the integrated processing of low-level data streams as well as high-level semantic inferences. Our solution is based on a central fact base in combination with a declarative rule-based language to derive new facts at different abstraction levels. Our innovative architecture for multimodal interaction encourages the use of software engineering principles such as modularisation and composition to support a growing set of input modalities as well as to enable the integration of existing or novel multimodal fusion engines.
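
To illustrate the core idea, a central fact base over which declarative rules derive higher-level facts, the following minimal Python sketch shows one fusion rule combining a deictic speech fact with two pointing-gesture facts into a command fact. This is a hypothetical toy under assumed names, not Mudra's actual API or rule language: Fact, FactBase and put_that_there are invented for illustration.

    # Hypothetical sketch of a fact base with one declarative fusion rule.
    # Not Mudra's real API; all names here are invented for illustration.
    import time
    from dataclasses import dataclass, field

    @dataclass
    class Fact:
        kind: str          # e.g. "point", "speech", "command"
        attrs: dict        # modality-specific attributes
        timestamp: float = field(default_factory=time.time)

    class FactBase:
        """Central store; every asserted fact is matched against all rules."""
        def __init__(self):
            self.facts = []
            self.rules = []   # each rule: list[Fact] -> Fact | None

        def add_rule(self, rule):
            self.rules.append(rule)

        def assert_fact(self, fact):
            # Asserting a fact may let a rule derive a new, higher-level
            # fact, which is then asserted in turn, so facts accumulate at
            # different abstraction levels.
            self.facts.append(fact)
            for rule in self.rules:
                derived = rule(self.facts)
                if derived is not None:
                    self.assert_fact(derived)

    def put_that_there(facts):
        # Toy fusion rule: fuse a "put that there" utterance with the two
        # most recent pointing gestures into a single command fact.
        speech = [f for f in facts if f.kind == "speech"
                  and f.attrs.get("text") == "put that there"]
        points = [f for f in facts if f.kind == "point"]
        already = any(f.kind == "command" for f in facts)
        if speech and len(points) >= 2 and not already:
            src, dst = points[-2], points[-1]
            return Fact("command", {"action": "move",
                                    "from": src.attrs["target"],
                                    "to": dst.attrs["target"]})
        return None

    fb = FactBase()
    fb.add_rule(put_that_there)
    fb.assert_fact(Fact("point", {"target": "chair"}))   # low-level gesture fact
    fb.assert_fact(Fact("point", {"target": "corner"}))
    fb.assert_fact(Fact("speech", {"text": "put that there"}))
    print([f.attrs for f in fb.facts if f.kind == "command"])
    # [{'action': 'move', 'from': 'chair', 'to': 'corner'}]

A production engine would of course add temporal windows, probabilistic matching and efficient rule evaluation (e.g. a Rete-style network) rather than rescanning all facts per rule.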

Published In

ICMI '11: Proceedings of the 13th international conference on multimodal interfaces
November 2011
432 pages
ISBN:9781450306416
DOI:10.1145/2070481

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. declarative programming
  2. multimodal fusion
  3. multimodal interaction
  4. rule language

Qualifiers

  • Poster

Conference

ICMI'11

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)2
Reflects downloads up to 16 Feb 2025

Cited By

View all
  • (2024) A Systematic Process to Engineer Dependable Integration of Frame-based Input Devices in a Multimodal Input Chain: Application to Rehabilitation in Healthcare. Proceedings of the ACM on Human-Computer Interaction 8(EICS), 1-31. DOI: 10.1145/3664633. Online publication date: 17-Jun-2024.
  • (2024) Exploiting Semantic Search and Object-Oriented Programming to Ease Multimodal Interface Development. Companion Proceedings of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, 74-80. DOI: 10.1145/3660515.3664244. Online publication date: 24-Jun-2024.
  • (2024) ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. DOI: 10.1145/3613904.3642517. Online publication date: 11-May-2024.
  • (2024) Supporting the Communication of People with Aphasia While Lying in Bed. Information and Communication Technologies for Ageing Well and e-Health, 19-42. DOI: 10.1007/978-3-031-62753-8_2. Online publication date: 26-Jul-2024.
  • (2023) Harnessing the Role of Speech Interaction in Smart Environments Towards Improved Adaptability and Health Monitoring. Wireless Mobile Communication and Healthcare, 271-286. DOI: 10.1007/978-3-031-32029-3_24. Online publication date: 14-May-2023.
  • (2022) FLOREnce: A Hybrid Logic-Functional Reactive Programming Language. Proceedings of the 9th ACM SIGPLAN International Workshop on Reactive and Event-Based Languages and Systems, 24-36. DOI: 10.1145/3563837.3568339. Online publication date: 29-Nov-2022.
  • (2021) A Review on Explainability in Multimodal Deep Neural Nets. IEEE Access 9, 59800-59821. DOI: 10.1109/ACCESS.2021.3070212. Online publication date: 2021.
  • (2020) DG3: Exploiting Gesture Declarative Models for Sample Generation and Online Recognition. Proceedings of the ACM on Human-Computer Interaction 4(EICS), 1-21. DOI: 10.1145/3397870. Online publication date: 18-Jun-2020.
  • (2020) Enabling Multimodal Emotionally-Aware Ecosystems Through a W3C-Aligned Generic Interaction Modality. Wireless Mobile Communication and Healthcare, 140-152. DOI: 10.1007/978-3-030-49289-2_11. Online publication date: 28-May-2020.
  • (2019) The AM4I Architecture and Framework for Multimodal Interaction and Its Application to Smart Environments. Sensors 19(11), 2587. DOI: 10.3390/s19112587. Online publication date: 6-Jun-2019.
