The IntelliMedia WorkBench - An Environment for Building Multimodal Systems

Brøndsted, Tom; Dalsgaard, Paul; Larsen, Lars Bo; Manthey, Michael; Mc Kevitt, Paul; Moeslund, Thomas B.; Olesen, Kristian G.

doi:10.1007/3-540-45520-5_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2155)

Included in the following conference series:

International Conference on Cooperative Multimodal Communication

Abstract

Intelligent MultiMedia (IntelliMedia) focuses on the computer processing and understanding of signal and symbol input from at least speech, text and visual images in terms of semantic representations. We have developed a general suite of tools in the form of a software and hardware platform called “Chameleon” that can be tailored to conducting IntelliMedia in various application domains. Chameleon has an open distributed processing architecture and currently includes ten agent modules: blackboard, dialogue manager, domain model, gesture recogniser, laser system, microphone array, speech recogniser, speech synthesiser, natural language processor, and a distributed Topsy learner. Most of the modules are programmed in C and C++ and are glued together using the Dacs communications system. In effect, the blackboard, dialogue manager and Dacs form the kernel of Chameleon. Modules can communicate with each other and with the blackboard, which keeps a record of interactions over time via semantic representations in frames. Inputs to Chameleon can include synchronised spoken dialogue and images, and outputs include synchronised laser pointing and spoken dialogue. An initial prototype application of Chameleon is an IntelliMedia WorkBench where a user will be able to ask for information about things (e.g. 2D/3D models, pictures, objects, gadgets, people, or whatever) on a physical table. The current domain is a Campus Information System for 2D building plans, which provides information about tenants, rooms and routes and can answer questions like “Whose office is this?” and “Show me the route from Paul Mc Kevitt’s office to Paul Dalsgaard’s office.” in real time. Chameleon and the IntelliMedia WorkBench are ideal for testing integrated signal and symbol processing of language and vision for the future of SuperinformationhighwayS.
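
The blackboard-and-frames idea in the abstract can be illustrated with a small sketch. This is not the Chameleon implementation (whose modules are written in C and C++ and connected through Dacs); it is a minimal single-process Python illustration, and every name in it (Frame, Blackboard, answer_whose_office, the slot names, the table coordinates, the room and tenant placeholders) is an assumption made for the example rather than something taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Frame:
    """A semantic frame posted by one module (hypothetical layout)."""
    module: str                       # posting module, e.g. "speech_recogniser"
    intention: str                    # e.g. "query", "pointing", "declarative"
    slots: Dict[str, Any] = field(default_factory=dict)
    time: int = 0                     # logical timestamp set by the blackboard


class Blackboard:
    """Keeps a time-ordered record of all frames posted by the modules."""

    def __init__(self) -> None:
        self._history: List[Frame] = []
        self._clock = 0

    def post(self, frame: Frame) -> Frame:
        # Stamp the frame with the next logical time and record it.
        self._clock += 1
        frame.time = self._clock
        self._history.append(frame)
        return frame

    def history(self) -> List[Frame]:
        return list(self._history)

    def latest(self, module: str) -> Optional[Frame]:
        # Most recent frame from the given module, or None if there is none.
        for frame in reversed(self._history):
            if frame.module == module:
                return frame
        return None


# Toy domain model with placeholder data (assumed, not from the paper).
ROOM_AT = {(3, 4): "room_1"}          # table coordinates -> room
TENANT_OF = {"room_1": "Tenant A"}    # room -> tenant


def answer_whose_office(bb: Blackboard) -> List[Frame]:
    """Toy dialogue-manager step: fuse the latest speech and gesture frames."""
    speech = bb.latest("speech_recogniser")
    gesture = bb.latest("gesture_recogniser")
    if speech is None or gesture is None:
        return []
    coords = gesture.slots["coordinates"]
    tenant = TENANT_OF.get(ROOM_AT.get(coords), "unknown")
    # Output frames for the speech synthesiser and the laser system.
    say = bb.post(Frame("speech_synthesiser", "declarative",
                        {"utterance": f"This is {tenant}'s office."}))
    point = bb.post(Frame("laser", "point", {"coordinates": coords}))
    return [say, point]


if __name__ == "__main__":
    bb = Blackboard()
    bb.post(Frame("speech_recogniser", "query",
                  {"utterance": "Whose office is this?"}))
    bb.post(Frame("gesture_recogniser", "pointing", {"coordinates": (3, 4)}))
    for frame in answer_whose_office(bb):
        print(frame.time, frame.module, frame.slots)
```

In this sketch the spoken question and the pointing gesture arrive as separate frames, and the dialogue-manager step resolves "this" by combining them before posting synchronised speech and laser output frames back to the blackboard. A distributed system would replace the direct method calls with message passing between processes.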

Paul Mc Kevitt was also a British Engineering and Physical Sciences Research Council (EPSRC) Advanced Fellow at the University of Sheffield, England for five years under grant B/94/AF/1833 for the Integration of Natural Language, Speech and Vision Processing and recently took up appointment as Chair in Digital MultiMedia at The University of Ulster (Magee), Northern Ireland (p.mckevitt@ulst.ac.uk).

Author information

Authors and Affiliations

Institute for Electronic Systems (IES), Aalborg University, Aalborg, Denmark
Tom Brøndsted, Paul Dalsgaard, Lars Bo Larsen, Michael Manthey, Paul Mc Kevitt, Thomas B. Moeslund & Kristian G. Olesen

Editor information

Editors and Affiliations

Computational Linguistics and AI Group, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands
Harry Bunt
Department of Information and Computing Science, Utrecht University, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
Robbert-Jan Beun

About this paper

Cite this paper

Brøndsted, T. et al. (2001). The IntelliMedia WorkBench - An Environment for Building Multimodal Systems. In: Bunt, H., Beun, R.J. (eds) Cooperative Multimodal Communication. CMC 1998. Lecture Notes in Computer Science, vol 2155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45520-5_13

DOI: https://doi.org/10.1007/3-540-45520-5_13
Published: 23 October 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42806-0
Online ISBN: 978-3-540-45520-2
eBook Packages: Springer Book Archive
