Abstract
This paper describes the “FAME” multi-modal demonstrator, which integrates multiple communication modes – vision, speech and object manipulation – by combining the physical and virtual worlds to provide support for multi-cultural or multi-lingual communication and problem solving.
The major challenges are automatic perception of human actions and understanding of dialogs between people from different cultural or linguistic backgrounds. The system acts as an information butler, demonstrating context awareness through computer vision, speech and dialog modeling. The resulting system for computer-enhanced human-to-human communication was publicly demonstrated at FORUM2004 in Barcelona and at IST2004 in The Hague.
Specifically, the “Interactive Space” described here features an “Augmented Table” for multi-cultural interaction, which allows several simultaneous users to perform multi-modal, cross-lingual retrieval of audio-visual documents previously recorded by an “Intelligent Cameraman” during a week-long seminar.
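The cross-lingual retrieval step mentioned above can be illustrated with a minimal sketch. All names here (the lexicon entries, concept IDs, and document index) are hypothetical stand-ins: a query in one language is mapped to language-independent concept identifiers via a small bilingual lexicon (in the spirit of a resource such as EuroWordNet), and documents indexed by the same concept IDs are ranked by overlap. The FAME system's actual pipeline (speech recognition, topic detection) is not reproduced here.

```python
# Minimal cross-lingual keyword-retrieval sketch (illustrative only).
# Words from different languages map to shared concept IDs; documents
# are indexed by concept IDs, so queries in any covered language match.

LEXICON = {  # word -> concept ID (hypothetical entries)
    "casa": "C_HOUSE", "house": "C_HOUSE",
    "llengua": "C_LANGUAGE", "language": "C_LANGUAGE",
}

INDEX = {  # document ID -> set of concept IDs (hypothetical)
    "doc1": {"C_HOUSE"},
    "doc2": {"C_LANGUAGE", "C_HOUSE"},
}

def retrieve(query_words):
    """Rank documents by the number of concepts shared with the query."""
    concepts = {LEXICON[w] for w in query_words if w in LEXICON}
    scored = [(len(INDEX[d] & concepts), d) for d in INDEX]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]
```

Because both the Catalan query `["llengua"]` and the English query `["language"]` map to the same concept ID, they retrieve the same documents, which is the essence of the cross-lingual matching described above.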
© 2006 Springer-Verlag Berlin Heidelberg
Metze, F. et al. (2006). The “FAME” Interactive Space. In: Renals, S., Bengio, S. (eds) Machine Learning for Multimodal Interaction. MLMI 2005. Lecture Notes in Computer Science, vol 3869. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677482_11
Print ISBN: 978-3-540-32549-9
Online ISBN: 978-3-540-32550-5