Abstract
The proliferation of information and services on our global information highways demands mechanisms that support more effective and efficient interaction. This article claims that efficient and effective interaction requires both cooperative and multimedia communication, and illustrates this claim through several applications developed by our group that aim to enhance interaction with complex systems or information sources. After defining the terms cooperative and multimedia and arguing for their centrality in interfaces, we give an overview of key processes in the automated generation of multimedia. We then illustrate these with implemented examples from several domains, including information retrieval, direction giving, mission planning, and computer maintenance. After describing a visionary system for information access, we conclude by describing our current efforts toward providing content-based browsing and search of video.
Copyright information
© 1998 Springer-Verlag
About this paper
Cite this paper
Maybury, M.T. (1998). Toward cooperative multimedia interaction. In: Bunt, H., Beun, RJ., Borghuis, T. (eds) Multimodal Human-Computer Communication. CMC 1995. Lecture Notes in Computer Science, vol 1374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0052311
DOI: https://doi.org/10.1007/BFb0052311
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64380-7
Online ISBN: 978-3-540-69764-0
eBook Packages: Springer Book Archive