Multimodal event parsing for intelligent user interfaces

ABSTRACT
Many intelligent interfaces must recognize patterns of user activity that span a variety of input channels. These multimodal interfaces pose significant challenges to both the designer and the software engineer. The designer needs a method of expressing interaction patterns that is powerful enough to capture real use cases and has clear semantics. The software engineer needs a processing model that can identify the described interaction patterns efficiently while maintaining meaningful intermediate state to aid in debugging and system maintenance. In this paper, we describe an input model, a general recognition model, and several important classes of recognition parsers with useful computational characteristics; that is, we can say with some certainty how efficiently each recognizer will run and what kinds of patterns it will accept. Examples illustrate the ability of these recognizers to integrate information from multiple channels across varying time intervals.
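To make the central idea concrete, the sketch below shows one much-simplified way a recognizer might fuse timestamped events from speech and gesture channels under a temporal constraint, in the spirit of a classic "put that there" interaction. This is a minimal illustration, not the paper's method: the `Event` record, the channel names, the pattern, and the `max_gap` threshold are all assumptions made for the example, whereas the paper describes principled parser classes with characterized efficiency and coverage.

```python
from dataclasses import dataclass

# Hypothetical event record: a channel name, a symbolic content label,
# and a [start, end] time interval. The fields are illustrative.
@dataclass
class Event:
    channel: str   # e.g., "speech", "gesture"
    content: str   # symbolic label for the recognized input
    start: float   # seconds
    end: float     # seconds

def recognize_put_that_there(events, max_gap=2.0):
    """Toy recognizer: fire when a spoken 'put that there' overlaps or
    falls within max_gap seconds of two pointing gestures. Pattern and
    threshold are assumptions for this sketch."""
    speech = [e for e in events
              if e.channel == "speech" and e.content == "put that there"]
    points = sorted((e for e in events
                     if e.channel == "gesture" and e.content == "point"),
                    key=lambda e: e.start)
    for s in speech:
        # Keep only gestures that start near the utterance's interval.
        near = [p for p in points
                if s.start - max_gap <= p.start <= s.end + max_gap]
        if len(near) >= 2:
            # First gesture resolves "that", second resolves "there".
            return ("move", near[0], near[1])
    return None

if __name__ == "__main__":
    stream = [
        Event("gesture", "point", 0.9, 1.1),
        Event("speech", "put that there", 1.0, 2.2),
        Event("gesture", "point", 2.0, 2.3),
    ]
    print(recognize_put_that_there(stream))
```

Running the example prints the matched command together with the two gesture events it binds, showing how a single interpretation can draw on events from different channels whose intervals merely overlap rather than coincide.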