Abstract
Intelligent MultiMedia (IntelliMedia) focuses on the computer processing and understanding of signal and symbol input from at least speech, text and visual images in terms of semantic representations. We have developed a general suite of tools in the form of a software and hardware platform called “Chameleon” that can be tailored to conducting IntelliMedia in various application domains. Chameleon has an open distributed processing architecture and currently includes ten agent modules: blackboard, dialogue manager, domain model, gesture recogniser, laser system, microphone array, speech recogniser, speech synthesiser, natural language processor, and a distributed Topsy learner. Most of the modules are programmed in C and C++ and are glued together using the Dacs communications system. In effect, the blackboard, dialogue manager and Dacs form the kernel of Chameleon. Modules can communicate with each other and with the blackboard, which keeps a record of interactions over time via semantic representations in frames. Inputs to Chameleon can include synchronised spoken dialogue and images, and outputs include synchronised laser pointing and spoken dialogue. An initial prototype application of Chameleon is an IntelliMedia WorkBench where a user will be able to ask for information about things (e.g. 2D/3D models, pictures, objects, gadgets, people, or whatever) on a physical table. The current domain is a Campus Information System for 2D building plans, which provides information about tenants, rooms and routes and can answer questions like “Whose office is this?” and “Show me the route from Paul Mc Kevitt’s office to Paul Dalsgaard’s office.” in real time. Chameleon and the IntelliMedia WorkBench are ideal for testing integrated signal and symbol processing of language and vision for the future of SuperinformationhighwayS.
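To illustrate the blackboard idea described above, the following Python sketch shows one way a blackboard could keep a time-ordered record of frame-based semantic representations posted by modules, and how a dialogue manager might use that record to resolve “this” in “Whose office is this?” against a recent pointing gesture. This is not Chameleon’s implementation, which is written mostly in C and C++ and communicates over Dacs. The class names (`Frame`, `Blackboard`), the methods (`post`, `query`, `latest`), the slot names (`utterance`, `intention`, `location`) and the coordinates are all hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any, Dict, List, Optional


@dataclass
class Frame:
    """Hypothetical semantic frame: slot/value pairs tagged with their source module."""
    module: str             # producing module, e.g. "speech_recogniser" (hypothetical name)
    frame_type: str         # e.g. "input" or "output" (hypothetical)
    slots: Dict[str, Any]   # hypothetical slot names such as "intention", "location"
    time: int = 0           # logical timestamp, assigned by the blackboard on post()


class Blackboard:
    """Keeps an append-only, time-ordered record of frames posted by modules."""

    def __init__(self) -> None:
        self._frames: List[Frame] = []
        self._clock = count(1)  # monotonically increasing logical clock: 1, 2, 3, ...

    def post(self, frame: Frame) -> Frame:
        """Timestamp a frame and append it to the record."""
        frame.time = next(self._clock)
        self._frames.append(frame)
        return frame

    def query(self, module: Optional[str] = None, **slot_values: Any) -> List[Frame]:
        """Return frames (oldest first) from `module` whose slots match all given values."""
        return [
            f for f in self._frames
            if (module is None or f.module == module)
            and all(f.slots.get(k) == v for k, v in slot_values.items())
        ]

    def latest(self, module: Optional[str] = None, **slot_values: Any) -> Optional[Frame]:
        """Return the most recent matching frame, or None if nothing matches."""
        matches = self.query(module, **slot_values)
        return matches[-1] if matches else None


# Usage: a spoken query and a pointing gesture are both recorded on the blackboard.
bb = Blackboard()
bb.post(Frame("speech_recogniser", "input",
              {"utterance": "whose office is this", "intention": "query"}))
bb.post(Frame("gesture_recogniser", "input",
              {"intention": "pointing", "location": (12, 40)}))

# A hypothetical dialogue manager resolves "this" using the latest pointing gesture.
gesture = bb.latest("gesture_recogniser", intention="pointing")
print(gesture.slots["location"], gesture.time)  # prints: (12, 40) 2
```

Keeping all frames in one append-only list with a logical clock makes the “record of interactions over time” explicit and lets any module ask what happened most recently. A distributed system also needs inter-process message passing, which Dacs provides in Chameleon and which this single-process sketch omits.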
Paul Mc Kevitt was also a British Engineering and Physical Sciences Research Council (EPSRC) Advanced Fellow at the University of Sheffield, England, for five years under grant B/94/AF/1833 for the Integration of Natural Language, Speech and Vision Processing, and recently took up an appointment as Chair in Digital MultiMedia at The University of Ulster (Magee), Northern Ireland (p.mckevitt@ulst.ac.uk).
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brøndsted, T. et al. (2001). The IntelliMedia WorkBench – An Environment for Building Multimodal Systems. In: Bunt, H., Beun, R.J. (eds) Cooperative Multimodal Communication. CMC 1998. Lecture Notes in Computer Science, vol 2155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45520-5_13
DOI: https://doi.org/10.1007/3-540-45520-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42806-0
Online ISBN: 978-3-540-45520-2
eBook Packages: Springer Book Archive