DOI: 10.1145/2070481.2070500

Mudra: a unified multimodal interaction framework

Published: 14 November 2011

Abstract

In recent years, multimodal interfaces have gained momentum as an alternative to traditional WIMP interaction styles. Existing multimodal fusion engines and frameworks range from low-level data stream-oriented approaches to high-level semantic inference-based solutions. However, there is a lack of multimodal interaction engines offering native fusion support across different levels of abstraction to fully exploit the power of multimodal interaction. We present Mudra, a unified multimodal interaction framework supporting the integrated processing of low-level data streams as well as high-level semantic inferences. Our solution is based on a central fact base in combination with a declarative rule-based language to derive new facts at different abstraction levels. Our innovative architecture for multimodal interaction encourages the use of software engineering principles such as modularisation and composition to support a growing set of input modalities as well as to enable the integration of existing or novel multimodal fusion engines.
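
To illustrate the core idea, a central fact base over which declarative rules derive higher-level facts, the following minimal Python sketch shows one fusion rule combining a deictic speech fact with two pointing-gesture facts into a command fact. This is a hypothetical toy under assumed names, not Mudra's actual API or rule language: Fact, FactBase and put_that_there are invented for illustration.

    # Hypothetical sketch of a fact base with one declarative fusion rule.
    # Not Mudra's real API; all names here are invented for illustration.
    import time
    from dataclasses import dataclass, field

    @dataclass
    class Fact:
        kind: str          # e.g. "point", "speech", "command"
        attrs: dict        # modality-specific attributes
        timestamp: float = field(default_factory=time.time)

    class FactBase:
        """Central store; every asserted fact is matched against all rules."""
        def __init__(self):
            self.facts = []
            self.rules = []   # each rule: list[Fact] -> Fact | None

        def add_rule(self, rule):
            self.rules.append(rule)

        def assert_fact(self, fact):
            # Asserting a fact may let a rule derive a new, higher-level
            # fact, which is then asserted in turn, so facts accumulate at
            # different abstraction levels.
            self.facts.append(fact)
            for rule in self.rules:
                derived = rule(self.facts)
                if derived is not None:
                    self.assert_fact(derived)

    def put_that_there(facts):
        # Toy fusion rule: fuse a "put that there" utterance with the two
        # most recent pointing gestures into a single command fact.
        speech = [f for f in facts if f.kind == "speech"
                  and f.attrs.get("text") == "put that there"]
        points = [f for f in facts if f.kind == "point"]
        already = any(f.kind == "command" for f in facts)
        if speech and len(points) >= 2 and not already:
            src, dst = points[-2], points[-1]
            return Fact("command", {"action": "move",
                                    "from": src.attrs["target"],
                                    "to": dst.attrs["target"]})
        return None

    fb = FactBase()
    fb.add_rule(put_that_there)
    fb.assert_fact(Fact("point", {"target": "chair"}))   # low-level gesture fact
    fb.assert_fact(Fact("point", {"target": "corner"}))
    fb.assert_fact(Fact("speech", {"text": "put that there"}))
    print([f.attrs for f in fb.facts if f.kind == "command"])
    # [{'action': 'move', 'from': 'chair', 'to': 'corner'}]

A production engine would of course add temporal windows, probabilistic matching and efficient rule evaluation (e.g. a Rete-style network) rather than rescanning all facts per rule.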

Published In

ICMI '11: Proceedings of the 13th international conference on multimodal interfaces
November 2011
432 pages
ISBN:9781450306416
DOI:10.1145/2070481

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. declarative programming
  2. multimodal fusion
  3. multimodal interaction
  4. rule language

Qualifiers

  • Poster

Conference

ICMI'11

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)2
Reflects downloads up to 16 Feb 2025

Cited By

View all
  • (2024) A Systematic Process to Engineer Dependable Integration of Frame-based Input Devices in a Multimodal Input Chain: Application to Rehabilitation in Healthcare. Proceedings of the ACM on Human-Computer Interaction 8(EICS), 1-31. DOI: 10.1145/3664633. Online publication date: 17-Jun-2024.
  • (2024) Exploiting Semantic Search and Object-Oriented Programming to Ease Multimodal Interface Development. Companion Proceedings of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, 74-80. DOI: 10.1145/3660515.3664244. Online publication date: 24-Jun-2024.
  • (2024) ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-23. DOI: 10.1145/3613904.3642517. Online publication date: 11-May-2024.
  • (2024) Supporting the Communication of People with Aphasia While Lying in Bed. Information and Communication Technologies for Ageing Well and e-Health, 19-42. DOI: 10.1007/978-3-031-62753-8_2. Online publication date: 26-Jul-2024.
  • (2023) Harnessing the Role of Speech Interaction in Smart Environments Towards Improved Adaptability and Health Monitoring. Wireless Mobile Communication and Healthcare, 271-286. DOI: 10.1007/978-3-031-32029-3_24. Online publication date: 14-May-2023.
  • (2022) FLOREnce: A Hybrid Logic-Functional Reactive Programming Language. Proceedings of the 9th ACM SIGPLAN International Workshop on Reactive and Event-Based Languages and Systems, 24-36. DOI: 10.1145/3563837.3568339. Online publication date: 29-Nov-2022.
  • (2021) A Review on Explainability in Multimodal Deep Neural Nets. IEEE Access 9, 59800-59821. DOI: 10.1109/ACCESS.2021.3070212. Online publication date: 2021.
  • (2020) DG3: Exploiting Gesture Declarative Models for Sample Generation and Online Recognition. Proceedings of the ACM on Human-Computer Interaction 4(EICS), 1-21. DOI: 10.1145/3397870. Online publication date: 18-Jun-2020.
  • (2020) Enabling Multimodal Emotionally-Aware Ecosystems Through a W3C-Aligned Generic Interaction Modality. Wireless Mobile Communication and Healthcare, 140-152. DOI: 10.1007/978-3-030-49289-2_11. Online publication date: 28-May-2020.
  • (2019) The AM4I Architecture and Framework for Multimodal Interaction and Its Application to Smart Environments. Sensors 19(11), 2587. DOI: 10.3390/s19112587. Online publication date: 6-Jun-2019.
