ABSTRACT
This research proposes a multimodal fusion framework for high-level data integration across two or more modalities. The framework takes as input low-level features extracted from different system devices and analyzes them through dedicated processes running in parallel to identify intrinsic meanings in the data. The extracted meanings are then mutually compared to identify complementarities, ambiguities, and inconsistencies, in order to better understand the user's intention when interacting with the system. The whole fusion lifecycle is described and evaluated in an ambient intelligence scenario in which two co-workers interact by voice and movement, expressing their intentions, and the system offers advice according to the identified needs.
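The comparison step can be pictured as a pairwise classification of cross-modal meanings. The sketch below is an illustrative reconstruction in Python, not the paper's implementation; the Meaning structure, the analyzer functions, and the way the complementary/ambiguous/inconsistent labels are assigned are assumptions made for the example.

```python
# Illustrative sketch (not the paper's API): meanings are extracted from each
# modality by dedicated processes running in parallel, then compared pairwise
# to flag complementarities, ambiguities and inconsistencies before the user's
# intention is interpreted.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product


@dataclass
class Meaning:
    modality: str   # e.g. "speech" or "movement"
    referent: str   # entity the meaning is about ("?" if underspecified)
    predicate: str  # what is asserted about that entity


def analyze_speech(features):
    # stand-in for a speech-understanding process (e.g. ASR + parsing);
    # here the user says "print that", leaving the referent unresolved
    return [Meaning("speech", "?", "print")]


def analyze_movement(features):
    # stand-in for a vision/movement-understanding process;
    # here the user points at a report
    return [Meaning("movement", "report", "points_at")]


def compare(a: Meaning, b: Meaning) -> str:
    """Classify a cross-modal pair of extracted meanings."""
    if a.referent == "?" or b.referent == "?":
        return "ambiguous"      # underspecified; the other modality may resolve it
    if a.referent == b.referent and a.predicate != b.predicate:
        return "inconsistent"   # the modalities disagree about the same entity
    return "complementary"      # the meanings can be merged into one intention


def fuse(speech_features, movement_features):
    # run the modality analyzers in parallel, then compare their outputs pairwise
    with ThreadPoolExecutor() as pool:
        speech = pool.submit(analyze_speech, speech_features)
        movement = pool.submit(analyze_movement, movement_features)
        speech, movement = speech.result(), movement.result()
    return [(a, b, compare(a, b)) for a, b in product(speech, movement)]


if __name__ == "__main__":
    for a, b, relation in fuse(speech_features=[], movement_features=[]):
        print(f"{a.modality}:{a.predicate}({a.referent}) vs "
              f"{b.modality}:{b.predicate}({b.referent}) -> {relation}")
```

In this toy run the speech meaning is flagged as ambiguous, and a subsequent resolution step could substitute the pointed-at referent from the movement modality, turning the pair into a single complementary intention.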