COSMOROE: a cross-media relations framework for modelling multimedia dialectics

Pastra, Katerina

doi:10.1007/s00530-008-0142-0

COSMOROE: a cross-media relations framework for modelling multimedia dialectics

Regular Paper
Published: 13 September 2008

Volume 14, pages 299–323, (2008)
Cite this article

Multimedia Systems Aims and scope Submit manuscript

Katerina Pastra¹

331 Accesses
22 Citations
Explore all metrics

Abstract

Though everyday interaction is predominantly multimodal, a purpose-developed framework for describing the semantic interplay between verbal and non-verbal communication is still lacking. This lack not only indicates one’s poor understanding of multimodal human behaviour, but also weakens any attempt to model such behaviour computationally. In this article, we present COSMOROE, a corpus-based framework for describing semantic interrelations between images, language and body movements. We argue that in viewing such relations from a message-formation perspective rather than a communicative goal one, one may develop a framework with descriptive power and computational applicability. We test COSMOROE for compliance to these criteria, by using it for annotating a corpus of TV travel programmes; we present all particulars of the annotation process and conclude with a discussion on the usability and scope of such annotated corpora.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multimodal Corpora

Multimodal Analysis

Towards Pragmatic Understanding of Conversational Intent: A Multimodal Annotation Approach to Multiparty Informal Interaction – The EVA Corpus

References

André, E., Rist, T.: The design of illustrated documents as a planning task. In: Maybury, M. (ed.) Intelligent Multimedia Interfaces, pp. 94–116, Chap. 4. AAAI Press/MIT Press, Cambridge, MA (1993)
André, E., Rist, T.: Referring to world objects with text and pictures. In: Proceedings of the Computational Linguistics Conference, pp. 530–534 (1994)
Barnard K., Duygulu P., Forsyth D., Freitas N., Blei D., Jordan M.: Matching words and pictures. J. Mach. Learn. Res. 3, 1107–1135 (2003)
Article MATH Google Scholar
Barras, C., Geoffrois, E., Wu, Z., Liberman, M.: Transcriber: a free tool for segmenting, labeling and transcribing speech. In: Proceedings of the First International Conference on Language Resources and Evaluation, pp. 1373–1376 (1998)
Barthes, R.: Image, Music, Text. Flamingo (1984)
Bateman, J., Delin, J., Allen, P.: Constraints on layout in multimodal document generation. In: Proceedings of the Workshop on Coherence in Generated Multimedia, First International Natural Language Generation Conference (2000)
Bateman, J., Delin, J., Henschel, R.: Multimodality and empiricism: preparing for a corpus-based approach to the study of multimodal meaning-making. In: Perspectives on Multimodality, pp. 65–89. John Benjamins, Amsterdam (2004)
Bernsen N.: Why are analogue graphics and natural language both needed in hci? In: Paterno, F. (ed.) Interactive Systems: Design, specification and verification. Focus on Computer Graphics, pp. 235–251. Springer, Berlin (1995)
Bordegoni M., Faconti G., Feiner S., Maybury M., Rist T., Ruggieri S., Trahanias P., Wilson M.: A standard reference model for intelligent multimedia presentation systems. Computer Standards Interfaces 18(6/7), 477–496 (1997)
Article Google Scholar
Carletta J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)
Google Scholar
Carlson, L., Marcu, D., Okurowski, M.: Building a discourse-tagged corpus in the framework of rhetorical structure theory. In: Current Directions in Discourse and Dialogue, pp. 85–112. Kluwer, Dordrecht (2003)
de Carolis, B., Pelachaud, C., Poggi, I.: Verbal and nonverbal discourse planning, proceedings of fourth international conference on autonomous agents. In: Proceedings of the Workshop on Achieving Human-Like Behaviour in Interactive Animated Agents, Fourth International Conference on Autonomous Agents (2000)
Cassell, J.: A framework for gesture generation and interpretation. In: Computer Vision in Human–Machine Interaction, Chap. 11. Cambridge University Press, London (1998)
Chen, L., Liu, Y., Harper, M., Maia, E., McRoy, S.: Evaluating factors impacting the accuracy of forced alignments in a multimodal corpus. In: Proceedings of the 4th Language Resources and Evaluation Conference (2004)
Corio, M., Lapalme, G.: Integrated generation of graphics and text: a corpus study. In: Proceedings of the Association of Computational Linguistics Workshop on Content Visualisation and Intermedia Representation, pp. 63–68 (1998)
Corio, M., Lapalme, G.: Generation of texts for information graphics. In: Proceedings of the European Workshop on Natural Languge Generation, pp. 49–58 (1999)
Crewson P.: Fundamental of clinical research for radiologists: reader agreement studies. Am. J. Roentgenol. 184, 1391–1397 (2005)
Google Scholar
Dasiopoulou, S., Papastathis, V., Mezaris, V., Kompatsiaris, I., Strintzis, M.: An ontology framework for knowledge-assisted semantic video analysis and annotation. In: Proceedings of the International Workshop on Knowledge Markup and Semantic Annotation (2004)
Everingham, M., Gool, L.V., Williams, C., Zisserman, A.: Pascal visual object classes challenge results. World Wide Web (http://www.pascal-network.org/challenges/VOC/voc) (2005)
Fasciano, M., Lapalme, G.: Intentions in the co-ordinated generation of graphics and text from tabular data. Knowl. Inform. Syst. 2(3) (2000)
Feiner, S., McKeown, K.: Automating the generation of co-ordinated multimedia explanations. In: Maybury, M. (ed.) Intelligent Multimedia Interfaces, pp. 117–138, chap. 5. AAAI Press/MIT Press, Cambridge, MA (1993)
Fellbaum,C. (ed.):WordNet:An Electronic Lexical Database. The MIT Press, Cambridge, MA (1998)
MATH Google Scholar
Green, N.: An empirical study of multimedia argumentation. In: Proceedings of the International Conference on Computational Sciences-Part I, pp. 1009–1018. Springer, Berlin (2001)
Gut, U., Looks, K., Thies, A., Trippel, T., Gibbon, D.: Cogest conversational gesture transcription system. Tech. rep., University of Bielefeld (2002)
Jackendoff R.: Consciousness and the Computational Mind. MIT Press, Cambridge (1987)
Google Scholar
Kendon A.: Gesture: Visible Action as Utterance. Cambridge University Press, London (2004)
Google Scholar
Kipp, M.: Gesture generation by imitation—from human behavior to computer character animation. Boca Raton, Florida: Dissertation.com (2004)
Kipp, M.: Spatiotemporal coding in anvil. In: Proceedings of the 6th Language Resources and Evaluation Conference (2008)
Lin, C., Tseng, B., Smith, J.: Video collaborative annotation forum: Establishing ground-truth labels on large multimedia datasets. TRECVID Proceedings (2003)
Lindley, C., Davis, J., Nack, F., Rutledge, L.: The application of rhetorical structure theory to interactive news program generation from digital archives. Technical Report INS-R0101, Centrum voor Wiskunde en Informatica (2001)
Magno-Caldognetto, E., Poggio, I., Cosi, P., Cavicchio, F., Merola, G.: Multimedia score—an anvil-based annotation scheme for multimodal audio-video analysis. In: Proceedings of the LREC Workshop on Multimodal Corpora: Models of Human Behaviour for the Specification and Evaluation Of Multimodal Input And Output Interfaces, pp. 29–33 (2004)
Mann W., Thompson S.: Rhetorical structure theory: description and construction of text structures. In: Kempen, G.(eds) Natural Language Generation: New results in Artificial Intelligence, Psychology and Linguistics, pp. 85–95. Nijhoff, Dodrecht (1987)
Google Scholar
Marsh E., Domas-White M.: A taxonomy of relationships between image and text. J. Document. 59(6), 647–672 (2003)
Article Google Scholar
Martin, J., Grimard, S., Alexandri, K.: On the annotation of multimodal behavior and computation of cooperation between modalities. In: Proceedings of the International Conference on Autonomous Agents workshop on Representing, Annotating, Evaluating Non-verbal and Verbal Communicative Acts to Achieve Contextual Embodied Agents, pp. 1–7 (2001)
Martin, J., Julia, L., Cheyer, A.: A theoretical framework for multimodal user studies. In: Proceedings of the Second International Conference on Cooperative Multimodal Communication, pp. 104–110 (1998)
Martin, J., Kipp, M.: Annotating and measuring multimodal behaviour—tycoon metrics in the anvil tool. In: Proceedings of the Language Resources and Evaluation Conference 2002, pp. 31–35 (2002)
Martinec R., Salway A.: A system for image–text relations in new (and old) media. Vis. Commun. 4(3), 339–374 (2005)
Google Scholar
Maybury, M. (ed.): Intelligent Multimedia Interfaces. AAAI Press/MIT Press, Cambridge, MA (1993)
Google Scholar
Maybury, M.,Wahlster,W. (eds.): Intelligent User Interfaces. Morgan Kaufmann Publishers, San Francisco, CA (1998)
Google Scholar
McNeil D.: Gesture and Thought. The University of Chicago Press, Chicago, IL (2005)
Google Scholar
Minsky, M.: The Society of Mind. Simon and Schuster Inc., NY, USA (1986)
Moore J., Paris C.: Planning text for advisory dialogues: capturing intentional and rhetorical information. Comput. Linguist. 19(4), 651–695 (1993)
Google Scholar
Moore J., Pollack M.: Problem for RST: the need for multi-level discourse analysis. Comput. Linguist. 18(4), 537–544 (1992)
Google Scholar
Nicholas, N.: Parameters for rhetorical structure theory ontology. In: University of Melbourne Working Papers in Linguistics, vol. 15, pp. 77–93. University of Melbourne, Melbourne (1995)
Pastra, K.: The language of caricature: language and drawing interaction. Final year project, Department of Greek Philology and Linguistics, University of Athens (1999) (in Greek)
Pastra, K.: Viewing vision–language integration as a double-grounding case. In: Proceedings of the AAAI Fall Symposium on Achieving Human-Level Intelligence through Integrated Systems and Research, pp. 62–67 (2004)
Pastra, K.: Vision–language integration: a double-grounding case. Ph.D. thesis, University of Sheffield (2005)
Pastra, K.: Beyond multimedia integration: corpora and annotations for cross-media decision mechanisms. In: Proceedings of the 5th Language Resources and Evaluation Conference, pp. 499–504 (2006)
Pastra, K., Piperidis, S.: Video search: new challenges in the pervasive digital video era. J. Virtual Reality Broadcast. 3(11) (2006)
Pastra K., Saggion H., Wilks Y.: Intelligent indexing of crime-scene photographs. IEEE Intell. Syst. 18(1), 55–61 (2003)
Article Google Scholar
Pastra, K., Wilks, Y.: Vision–language integration in AI: a reality check. In: Proceedings of the 16th European Conference in Artificial Intelligence, pp. 937–941 (2004)
Radev, D.: A common theory of information fusion from multiple text sources. step one: cross document structure. In: Proceedings of the 1st SIGdial Workshop on Discourse and Dialogue, pp. 74–83 (2000)
Rocchi, C., Zancanaro, M.: Generation of video documentaries from discourse structures. In: Proceedings of the 9th European Workshop on Natural Language Generation (EWNLG 9) (2003)
Sanders T., Spooren W., Noordman L.: Toward a taxonomy of coherence relations. Discourse Process. 15, 1–35 (1992)
Article Google Scholar
Simou, N., Tzouvaras, V., Avrithis, Y., Stamou, G., Kollias, S.: A visual descriptor ontology for multimedia reasoning. In: Proceedings of the workshop on Image Analysis for Multimedia Interactive Services (WIAMIS) (2005)
Srikanth, M., Varner, J., Bowden, M., Moldovan, D.: Exploiting ontologies for authomatic image annotation. In: Proceedings of the ACM Special Interest Group in Information Retrieval (SIGIR), pp. 552–558 (2005)
Taboada M., Mann W.: Rhetorical structure theory: looking back and moving ahead. Discourse Stud. 8(3), 423–459 (2006)
Article Google Scholar
Wachsmuth, S., Stevenson, S., Dickinson, S.: Towards a framework for learning structured shape models from text-annotated images. In: Proceedings of the HLT-NAACL Workshop on Learning Word Meaning from non-linguistic Data (2003)
Whittaker, S., Walker, M.: Toward a theory of multi-modal interaction. In: Proceedings of the National Conference on Artificial Intelligence Workshop on Multi-modal Interaction (1991)

Download references

Author information

Authors and Affiliations

Department of Language Technology Applications, Institute for Language and Speech Processing, Artemidos 6 and Epidavrou, 15125, Athens, Greece
Katerina Pastra

Authors

Katerina Pastra
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Katerina Pastra.

Additional information

Communicated by B. Bailey.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Pastra, K. COSMOROE: a cross-media relations framework for modelling multimedia dialectics. Multimedia Systems 14, 299–323 (2008). https://doi.org/10.1007/s00530-008-0142-0

Download citation

Received: 27 February 2007
Accepted: 19 August 2008
Published: 13 September 2008
Issue Date: November 2008
DOI: https://doi.org/10.1007/s00530-008-0142-0

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

COSMOROE: a cross-media relations framework for modelling multimedia dialectics

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Multimodal Corpora

Multimodal Analysis

Towards Pragmatic Understanding of Conversational Intent: A Multimodal Annotation Approach to Multiparty Informal Interaction – The EVA Corpus

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now