Multimedia indexing through multi-source and multi-language information extraction: the MUMIS project☆
Introduction
The vast amount of multimedia information available, together with the need to access its essential content accurately to satisfy users’ demands, encourages the development of techniques for multimedia indexing and searching. It is well known that there are no effective methods for automatically indexing and retrieving image and video fragments on the basis of an analysis of their visual features. Some projects, such as Pop-Eye and OLIVE [12], exploit collateral linguistic media as a means of automatically identifying and indexing relevant multimedia material. Pop-Eye uses subtitles to index video streams, offering time-stamped text to satisfy user queries. OLIVE uses automatic speech recognition (ASR) to generate transcriptions of news reports, which are indexed in ways similar to Pop-Eye. Semantic Video Indexing [9] is a proposal that emphasises the use of domain knowledge to index and retrieve video material, but it has not been implemented. The Informedia Projects I and II (http://www.informedia.cs.cmu.edu) combine visual features and shallow language analysis (i.e., keyword spotting) to characterise multimedia content. The SOCIS project [13] applies parsing and logical rules to scene-of-crime descriptions of images in order to produce conceptual triples describing objects and their relations at the scene.
The Multimedia Indexing and Searching Environment (MUMIS) project is the first multimedia indexing project to carry out indexing by applying information extraction to multimedia and multilingual information sources in Dutch, English, and German, merging information from many sources to improve indexing quality, and combining database queries with direct access to fragments of the multimedia programme.
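To make "merging information from many sources" concrete, here is a minimal Python sketch. It assumes, hypothetically, that each source yields event records of the form (event type, game minute, player, source) and treats records of the same type whose minutes lie within a small tolerance of each other as one event, pooling the players and sources that report it. The record format, the tolerance, and the names `Event`, `MergedEvent` and `merge_events` are illustrative assumptions, not the MUMIS merging component.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One event annotation extracted from a single source (hypothetical format)."""
    kind: str      # e.g. "goal", "substitution"
    minute: int    # game minute reported by the source
    player: str
    source: str    # e.g. "en-ticker", "de-report"


@dataclass
class MergedEvent:
    kind: str
    minute: int
    players: set = field(default_factory=set)
    sources: set = field(default_factory=set)


def merge_events(events, tolerance=1):
    """Group events of the same kind whose minutes lie within `tolerance`
    of the first event in the group; return merged events sorted by minute."""
    merged = []
    for ev in sorted(events, key=lambda e: (e.kind, e.minute)):
        last = merged[-1] if merged else None
        if last and last.kind == ev.kind and ev.minute - last.minute <= tolerance:
            # Same event reported by another source: pool the evidence.
            last.players.add(ev.player)
            last.sources.add(ev.source)
        else:
            merged.append(MergedEvent(ev.kind, ev.minute, {ev.player}, {ev.source}))
    return sorted(merged, key=lambda m: m.minute)


if __name__ == "__main__":
    reports = [
        Event("goal", 23, "Player A", "en-ticker"),
        Event("goal", 24, "Player A", "nl-report"),
        Event("yellow card", 40, "Player B", "de-report"),
    ]
    for m in merge_events(reports):
        print(m.kind, m.minute, sorted(m.players), sorted(m.sources))
    # goal 23 ['Player A'] ['en-ticker', 'nl-report']
    # yellow card 40 ['Player B'] ['de-report']
```

A time tolerance is used because independent reports of the same incident often disagree by a minute; a real system would also need to reconcile conflicting player names rather than simply pooling them.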
In this paper, we present our approach to information extraction from English texts, which is based on finite-state machinery pipelined with full semantic analysis and discourse interpretation. The rest of the paper is organised as follows: Section 2 gives an overview of the MUMIS project, and Section 3 introduces the English information extraction system. Section 4 describes our approach to syntactic parsing and semantic interpretation, and Section 5 presents discourse interpretation and coreference resolution. Section 6 reports a preliminary evaluation of the information extraction task, and finally, Section 7 presents our conclusions.
Project overview
In MUMIS, various software components operate off-line on multi-source linguistic data in Dutch, English, and German, generating formal annotations that are combined into a composite index of the events in the multimedia programme [4], [5], [15]. The domain chosen for tuning and testing the software components is football, in particular the Euro2000 championship. This subject was chosen because of the huge amount of information available about the domain, as well as for the economic and public
Overview of the English information extraction system
Our system consists of a Java front end based on finite-state transduction, followed by a Prolog back end, implemented in SICStus Prolog, that performs inference over a classification hierarchy. The system architecture is shown in Fig. 5, and our development environment at work is shown in Fig. 6.
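The finite-state side of this split can be pictured with a small Python sketch. Regular expressions are finite-state devices, so one simple way to imitate a cascade of transducers is to give every token a one-letter tag and then run regular expressions over the resulting tag string. The gazetteer entries, the tags, the assumption that a ticker line begins with the game minute, and the names `tag` and `extract_goals` are all hypothetical; this is not the system's actual grammar.

```python
import re

# Hypothetical gazetteer: lower-cased token -> one-letter tag.
GAZETTEER = {
    "goal": "G", "scores": "G",
    "by": "B",
    "smith": "P", "jones": "P",  # illustrative player surnames
}

# Event patterns as regular expressions over the tag string; the second
# element is the offset, within the match, of the token naming the player.
PATTERNS = [
    (re.compile(r"GBP"), 2),  # "goal by <Player>"
    (re.compile(r"PG"), 0),   # "<Player> scores"
]


def tag(tokens):
    """Stage 1: map each token to exactly one tag character
    (N = number, W = any other word), so tag index == token index."""
    return "".join(
        "N" if t.isdigit() else GAZETTEER.get(t.lower(), "W") for t in tokens
    )


def extract_goals(line):
    """Stage 2: match event patterns over the tag string and return
    (event, minute, player) tuples for a downstream reasoning stage."""
    tokens = re.findall(r"\w+", line)
    tags = tag(tokens)
    # Assumption for illustration: a ticker line starts with the minute.
    minute = int(tokens[0]) if tokens and tokens[0].isdigit() else None
    facts = []
    for pattern, player_offset in PATTERNS:
        for m in pattern.finditer(tags):
            facts.append(("goal", minute, tokens[m.start() + player_offset]))
    return facts


if __name__ == "__main__":
    print(extract_goals("23' Goal by Smith!"))  # [('goal', 23, 'Smith')]
    print(extract_goals("67 Jones scores"))     # [('goal', 67, 'Jones')]
```

The tuples returned here stand in for the kind of semantic facts that a back end reasoning over a classification hierarchy could consume.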
The finite state machinery is based on ANNIE, a free IE system available as part of GATE, a General Architecture for Text Engineering [3] (see http://gate.ac.uk/). The input to the process is
Parsing and semantic interpretation
While the software components being developed are adaptable to any kind of football report, in this paper we focus on the analysis of tickers. Ticker reports are essentially dynamic texts: a verbal account of events at successive time stamps. The analysis takes this into account, since the score changes and players leave or enter the game as the match progresses. These texts also have a specific text structure, which we take into account when parsing. We have developed a simple pre-processing step that identifies
Domain modelling and discourse interpretation
The discourse interpreter is based on a World Model representing the ontological (or hierarchical) knowledge about a particular domain. The interpreter works by mapping the information produced by parsing and semantic interpretation into an evolving Discourse Model of the input text. The World Model contains rules that allow new knowledge to be deduced from the “explicit” information found in the text, and that connect new instances with old instances mentioned earlier in the input text.
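How new information can be mapped onto a hierarchy-backed Discourse Model is illustrated by the following small Python sketch. It assumes a hypothetical fragment of a football class hierarchy and a deliberately simple coreference rule: a new instance is merged with the most recent existing instance whose class subsumes it (or is subsumed by it) and whose attributes do not conflict. The class names, attributes and merge rule are illustrative assumptions; the actual World Model is implemented in Prolog, and its rules are not given in this excerpt.

```python
# Hypothetical fragment of a football World Model: child class -> parent class.
HIERARCHY = {
    "goal": "event", "substitution": "event",
    "player": "person", "referee": "person",
    "event": "entity", "person": "entity",
}


def ancestors(cls):
    """Return cls followed by all its superclasses up to the root
    (is-a inference: a goal is also an event and an entity)."""
    chain = [cls]
    while cls in HIERARCHY:
        cls = HIERARCHY[cls]
        chain.append(cls)
    return chain


def compatible(a, b):
    """Two instances may corefer if one class subsumes the other and
    no attribute present in both has conflicting values."""
    related = a["class"] in ancestors(b["class"]) or b["class"] in ancestors(a["class"])
    agree = all(b["attrs"][k] == v for k, v in a["attrs"].items() if k in b["attrs"])
    return related and agree


class DiscourseModel:
    def __init__(self):
        self.instances = []

    def add(self, cls, **attrs):
        """Merge the new instance into the most recent compatible one,
        or record it as a new instance. Returns the instance's index."""
        new = {"class": cls, "attrs": attrs}
        for i in range(len(self.instances) - 1, -1, -1):
            old = self.instances[i]
            if compatible(old, new):
                # Keep the more specific class and combine the attributes.
                if old["class"] in ancestors(cls):
                    old["class"] = cls
                old["attrs"].update(attrs)
                return i
        self.instances.append(new)
        return len(self.instances) - 1


if __name__ == "__main__":
    dm = DiscourseModel()
    dm.add("player", name="Smith", team="A")
    dm.add("goal", minute=23, scorer="Smith")
    dm.add("person", name="Smith")  # merged with the earlier player instance
    print(len(dm.instances))        # 2
```

Searching from the most recent instance backwards mirrors the common preference for recent antecedents; a realistic interpreter would use richer constraints than class subsumption and attribute agreement.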
Evaluation of the information extraction task
We have carried out a small, preliminary evaluation of the complete information extraction task. As test documents we used a subset of the statistical reports published by Soccernet Euro2000 (http://www.soccernet.com/euro2000). These reports are not part of our initial corpus and can be considered formal texts, because they contain short tabular descriptions of the following events: goals, own goals, penalties, substitutions, yellow cards, and red cards.
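The excerpt does not state how the extracted events were scored against these reports. As one plausible setup, assumed here rather than taken from the paper, each event can be represented as a tuple such as (type, minute, player) and compared with a gold standard using precision, recall and F1 over exact matches. A minimal Python sketch, with the function name `score` chosen for illustration:

```python
from collections import Counter


def score(system, gold):
    """Precision, recall and F1 over event tuples. Matching uses multiset
    intersection, so a repeated event is credited at most as often as it
    occurs in the gold standard."""
    correct = sum((Counter(system) & Counter(gold)).values())
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("goal", 23, "Smith"), ("yellow card", 40, "Jones"),
            ("substitution", 60, "Brown")]
    system = [("goal", 23, "Smith"), ("yellow card", 41, "Jones")]
    print(score(system, gold))
```

With exact matching, the yellow card reported at minute 41 instead of 40 counts as an error, giving precision 0.5 and recall 1/3; a more lenient scheme could allow a small time tolerance.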
The set of texts report the following
Conclusion and future work
The huge amount of multimedia information directly accessible to end users requires a new generation of tools to provide “intelligent” access to specific information. MUMIS is the first multimedia indexing project to carry out indexing by applying information extraction to multimedia and multilingual information sources, merging information from many sources to improve the quality of the annotation database, and combining database queries with direct access to multimedia fragments.
References (19)
- G. Burnage, CELEX: a guide for users, Centre for Lexical Information, Nijmegen,...
- H. Cunningham, R.G. Gaizauskas, K. Humphreys, Y. Wilks, Experience with a language engineering architecture: three...
- H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, C. Ursu, The GATE user guide, 2002. Available from...
- F. de Jong, T. Westerveld, MUMIS: multimedia indexing and searching, in: Proceedings of the Content-Based...
- T. Declerck, P. Wittenburg, H. Cunningham, The automatic generation of formal annotations in a multimedia indexing and...
- R. Gaizauskas, K. Humphreys, Quantitative evaluation of coreference algorithms in an information extraction system, in:...
- R. Gaizauskas, K. Humphreys, XI: a simple prolog-based language for cross-classification and inheritance, in:...
- et al., Natural Language Processing in Prolog: An Introduction to Computational Linguistics (1989)
- Semantic video indexing: approach and issues, SIGMOD Record (March 1999)
Cited by (35)
A semi-automatic text-based semantic video annotation system for Turkish facilitating multilingual retrieval
2013, Expert Systems with Applications
Citation Excerpt: Yet, it is widely acknowledged that an important proportion of semantic concepts in videos cannot be detected through the audio–visual analysis alone, that is, audio–visual components of videos are usually far from being sufficient for semantic information extraction. In order to bridge this semantic gap, defined as the lack of coincidence between automatically extracted information from multimedia data and the semantic interpretation of the same data by its users (Smeulders, Worring, Santini, Gupta, & Jain, 2000), video texts encompassing speech transcriptions, overlay/scene texts, or sliding texts offer a valuable source of information (Küçük, Özgür, Yazıcı, & Koyuncu, 2009; Saggion et al., 2004; Snoek & Worring, 2005; Yan & Hauptman, 2007) especially for generic domains such as news videos. Information extraction (IE) is a topic in natural language processing which targets at automatic extraction of semantic information including entities, relations, and events in free natural language texts (Grishman, 2003, chap.
An ontology-based retrieval system using semantic indexing
2012, Information Systems
A hybrid named entity recognizer for Turkish
2012, Expert Systems with Applications
Citation Excerpt: Information extraction is usually defined as the task of determining important pieces of information such as entities and events in natural language texts. It is a long-studied topic with many application areas including information retrieval (Mihalcea & Moldovan, 2001), automatic summarization (Lee, Chen, & Jian, 2003), and semantic multimedia annotation (Dowman, Tablan, Cunningham, & Popov, 2005; Saggion et al., 2004) in several domains such as financial documents (Seng & Lai, 2010), business information documents (Sung & Chang, 2004), and biomedical texts (Tsai et al., 2006). Named entity recognition is a subtask of information extraction where identifiers of people, locations, and organizations are extracted as well as some temporal and numeric expressions (Nadeau & Sekine, 2007).
Exploiting information extraction techniques for automatic semantic video indexing with an application to Turkish news videos
2011, Knowledge-Based Systems
Citation Excerpt: However, it is widely acknowledged that an important proportion of semantic concepts in the videos cannot be detected through the AV analysis of the videos alone and that text cues are proved to be useful, when employed along with the AV features of the videos, in improving semantic video indexing [7,10–12]. The utilization of video texts such as speech transcriptions, overlay/sliding texts, and Webcast texts – semi-structured natural language texts provided by some sports news sites highlighting important events in sports videos – as an information source is a fruitful approach to the problem of automatic semantic video annotation for applicable domains including news videos and possibly excluding some surveillance videos as pointed out in [6,7,13–17]. For the applicable domains, video texts, which are known to be reliable in handling semantic queries over news videos [7], can usually be obtained through techniques such as automatic speech recognition (ASR), video optical character recognition, or sliding text recognition, if not already available as the associated texts.
Enhancing TV programmes with additional contents using MPEG-7 segmentation information
2010, Expert Systems with Applications
A fuzzy conceptual model for multimedia data with a text-based automatic annotation scheme
2009, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
Dr. Horacio Saggion received his Ph.D. from the Université de Montréal, Canada, in 2000, and his Master's degree from the Universidade Estadual de Campinas (UNICAMP), Brazil, in 1995. He studied Computer Science in the Computer Science Department at the Universidad de Buenos Aires, Argentina. He worked for many years as a teaching assistant and research assistant in the Computer Science Department, and as a systems programmer in industry. He is currently working as a research associate in the Natural Language Processing group at the Department of Computer Science, University of Sheffield, UK. His main interests in Natural Language Processing are text summarization, shallow natural language processing, text structure, discourse interpretation, and natural language generation.
Dr. Hamish Cunningham has been a programmer, a systems administrator and, for the last decade or so, a researcher in Natural Language Engineering. He led the development of the widely used GATE system, and currently holds the post of Senior Research Scientist in Computer Science at the University of Sheffield.
Dr. Kalina Bontcheva received her Ph.D. in Natural Language Processing (NLP) from the University of Sheffield, UK, in 2001, and her Master's degree in Computer Science from Sofia University, Bulgaria, in 1995. She studied Computer Science in the Computer Science Department and worked for three years as a software engineer; she is currently working as a research associate at the University of Sheffield. Her main interests in NLP are software architectures for NLP (GATE, gate.ac.uk), information extraction, natural language generation, and text summarization.
Dr. Diana Maynard received a Ph.D. in Automatic Term Recognition from Manchester Metropolitan University, UK, in 2000, and has been involved in research in Computational Linguistics for the last 10 years in France and the UK. She is currently working as a Research Associate in the NLP group at the University of Sheffield, UK, where her main interests are in Information Extraction, robust and adaptable tools for language engineering, and terminology.
Oana Hamza worked as a research associate at the University of Sheffield.
Yorick Wilks is a professor of computer science at the University of Sheffield, head of the Natural Language Processing group, and director of the Institute for Language, Speech and Hearing (ILASH). His research interests include information extraction, dialogue systems, and machine translation. He is a member of the UK’s Engineering and Physical Sciences Research Council College of Computing and a fellow of the American Association for Artificial Intelligence and the European AI Association. Department of Computer Science, University of Sheffield, 211 Portobello Street, S1 4DP Sheffield, UK.
- ☆ MUMIS is an EU-funded project within the Information Society Program (IST) of the European Union, section Human Language Technology (HLT).