IXIR: A statistical information distillation system

https://doi.org/10.1016/j.csl.2009.03.006

Abstract

The task of information distillation is to extract snippets from massive multilingual audio and textual document sources that are relevant to a given templated query. We present an approach that focuses on the sentence extraction phase of the distillation process. It selects document sentences with respect to their relevance to a query via statistical classification with support vector machines. The distinguishing contribution of the approach is a novel method to generate classification features. The features are extracted from charts, compilations of elements from various annotation layers, such as word transcriptions, syntactic and semantic parses, and information extraction (IE) annotations. We describe a procedure for creating charts from documents and queries, while paying special attention to query slots (free-text descriptions of names, organizations, topics, events, and so on, around which templates are centered), and suggest various types of classification features that can be extracted from these charts. We observe a 30% relative improvement due to the non-lexical annotation layers and perform a detailed analysis of the contribution of each of these layers to classification performance.

Introduction

We present an approach that, given a templated query and a collection of documents, selects those document sentences that are relevant to the query. The task is essential for template-based question answering (termed “information distillation” in the DARPA-funded GALE program; BAE, 2006), where one is asked to find answers to queries from a possibly very large collection of documents. A finite number of query templates are agreed upon in advance and have variable slots that are filled with free-text descriptions at runtime. For instance, the template “describe the prosecution of [person] for [crime]” has two slots that could be filled with person = “Saddam Hussein”, crime = “crimes against humanity”. Another example is the open-domain template “describe the facts about [event]”, where the event slot might contain “civil unrest in France” or “bird flu outbreak in China”. The output of the information distillation system is a list of snippets (sentences or phrases) relevant to the query.
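
As an illustration of this representation, the sketch below shows one way a templated query with free-text slots could be modeled in code; the class and field names are hypothetical and are not part of IXIR.

```python
# Hypothetical sketch of a templated query with free-text slots; names are
# illustrative only and do not reflect IXIR's actual implementation.
from dataclasses import dataclass
from typing import Dict


@dataclass
class TemplatedQuery:
    template: str          # e.g. "describe the prosecution of [person] for [crime]"
    slots: Dict[str, str]  # slot name -> free-text description supplied at runtime

    def instantiate(self) -> str:
        """Fill the slot placeholders to obtain a human-readable query."""
        text = self.template
        for name, value in self.slots.items():
            text = text.replace(f"[{name}]", value)
        return text


query = TemplatedQuery(
    template="describe the prosecution of [person] for [crime]",
    slots={"person": "Saddam Hussein", "crime": "crimes against humanity"},
)
print(query.instantiate())
# describe the prosecution of Saddam Hussein for crimes against humanity
```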

The online part of a typical question answering (distillation) system consists of the following stages:

  1. Query Processing decides what kinds of information to look for in the documents.

  2. Information Retrieval (IR) retrieves documents that are likely to contain answers.

  3. Snippet Extraction uses the retrieved documents to select sentences (or parts thereof) that contain answers.

  4. Answer Formulation combines (ranks, removes redundancy from, and possibly modifies) the extracted snippets to form an output.

In addition, offline document preprocessing can be used to simplify and speed up online processing. In this paper, we are mostly concerned with the snippet extraction (stage 3) and, in particular, with selecting relevant sentences from documents.
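
A minimal sketch of how these four online stages might be composed is given below; the function names and signatures are hypothetical stand-ins for the actual components.

```python
# Hypothetical composition of the four online stages of a distillation system.
# Function names and signatures are illustrative; bodies are left as stubs.
from typing import Any, Dict, List


def process_query(query: str) -> Dict[str, Any]:
    """Stage 1: decide what kinds of information to look for."""
    ...


def retrieve_documents(query_info: Dict[str, Any], index: Any) -> List[str]:
    """Stage 2: IR -- return documents likely to contain answers."""
    ...


def extract_snippets(query_info: Dict[str, Any], documents: List[str]) -> List[str]:
    """Stage 3: select sentences (or parts thereof) that contain answers."""
    ...


def formulate_answer(snippets: List[str]) -> List[str]:
    """Stage 4: rank, remove redundancy, and possibly modify the snippets."""
    ...


def answer(query: str, index: Any) -> List[str]:
    query_info = process_query(query)
    documents = retrieve_documents(query_info, index)
    snippets = extract_snippets(query_info, documents)
    return formulate_answer(snippets)
```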

Several factors have influenced question answering research in recent years. The size of data sets has increased, reaching about 500,000 documents in GALE (BAE, 2006). Starting with the European CLEF project (Magnini et al., 2003), multilingual and cross-language question answering began gaining importance. While finding answers in the source language remains an interesting option for future research, most current systems translate documents automatically and do question answering on the translated output. However, machine translation adds noise to the texts and makes the task more difficult. Moreover, when dealing with speech, automatic speech recognition introduces additional noise.

While manually designed lexical patterns have been shown to be very useful for extracting relevant sentences (e.g. Soubbotin and Soubbotin, 2001), the recent trends listed above suggest that robust, trainable statistical mechanisms should augment (if not entirely replace) pattern-based models. These methods are cheaper, as they require only a minimal amount of human expertise (mostly yes/no relevance annotations), and are better suited to dealing with noise because they learn from the actual data.

For instance, Ravichandran and Hovy (2002) learn lexical patterns from the Internet, while focusing on precision, and our group reported earlier on a system that relied on a discriminative model to select relevant sentences for queries using word (and named entity) n-grams as classification features (Hakkani-Tür and Tur, 2007). The system was applied downstream of the University of Massachusetts indri search engine (Strohman et al., 2005) that retrieved relevant documents for a query. It showed a clear improvement in distillation F-measure compared to a baseline keyword spotting strategy.

The present paper extends this approach in several significant ways. First, our classification features are composed not only of word transcriptions and names, but also of other representations associated with the sentences. Among those we count syntactic parses, semantic predicate-argument structures, and various elements of information extraction (IE) annotations. By incorporating all these representations into one coherent structure, a chart, we are able to evaluate their contributions and combine them to optimize classification performance. Second, we also extract classification features from query slots. Eight query slot types were suggested for the GALE Year-II evaluations: person, organization, country, location, crime, event, topic, and date.

Our current approach treats slots that are named entities as defined by the Message Understanding Conference (MUC) and Automatic Content Extraction (ACE) evaluation tasks (LDC, 2005), such as person, organization, and country, differently from slots that are not (e.g. topic and event), which can be any noun phrase and can therefore be expressed with much greater lexical variability than named entities (see the examples above).
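
To make the distinction concrete, the sketch below shows one way feature extraction might branch on slot type; the grouping of slot types and the scores themselves are simplifying assumptions for illustration, not IXIR's actual features (those are described in Section 3).

```python
# Illustrative only: a hedged sketch of branching on slot type. The grouping of
# slot types and the scores are assumptions; IXIR's actual slot-related
# features (e.g. alternative-name matching, topicality) are described later.
NAMED_ENTITY_SLOTS = {"person", "organization", "country", "location"}
FREE_TEXT_SLOTS = {"topic", "event"}  # may be arbitrary noun phrases


def slot_features(slot_type: str, slot_value: str, sentence: str) -> dict:
    if slot_type in NAMED_ENTITY_SLOTS:
        # Named-entity slots: look for the name (or, in a real system,
        # a known name variant) directly in the sentence.
        return {"name_match": slot_value.lower() in sentence.lower()}
    # Free-text slots can be expressed with great lexical variability,
    # so a softer word-overlap score stands in for a topicality measure here.
    slot_words = set(slot_value.lower().split())
    sentence_words = set(sentence.lower().split())
    overlap = len(slot_words & sentence_words) / max(len(slot_words), 1)
    return {"slot_word_overlap": overlap}
```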

We first reported on the classification advantages achieved through these enhancements in Levit et al. (2007a). The present paper, a full version of that earlier publication, is intended to offer a formal justification for statistical sentence extraction and to explain details of the adopted classification approach, such as feature extraction and the determination of alternative names. We also explore the individual contributions of various types of classification features to the overall system’s performance and make an effort to place our approach in the context of other state-of-the-art question answering systems.

The remainder of the paper is organized as follows: Section 2 formalizes a statistical model of sentence extraction in the question answering domain. In Section 3 we introduce the central notion of a chart, itemize the annotation layers used for chart creation, and describe the features that are extracted from charts, including topicality features, a special kind of slot-related feature used to evaluate a sentence’s relationship to query slots. Next, we present a series of distillation experiments designed to evaluate the contribution of various features to classification. We review related work in Section 5 and conclude the paper with suggestions for future directions in Section 6.

Statistical sentence extraction in IXIR

In accordance with the structural description of a typical distillation system presented in the previous section, IXIR’s general architecture, as shown in Fig. 1, comprises an IR stage for document retrieval (indri; Strohman et al., 2005), which is followed by a statistical sentence extraction mechanism and concluded by redundancy removal and snippet reranking performed at Columbia University.

In this paper, we focus on the sentence extraction stage.
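
As a rough sketch of the statistical sentence extraction step, the example below frames relevance selection as binary classification; the paper uses support vector machines, but the scikit-learn classifier and the simple bag-of-words query-plus-sentence features here are substitutions for illustration.

```python
# Sketch of sentence relevance selection as binary classification. IXIR uses
# SVMs with chart-based features; here scikit-learn's LinearSVC and plain
# bag-of-words features over (query + sentence) are used as stand-ins.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Toy training data: (query, sentence) pairs with human relevance labels.
train_pairs = [
    ("describe the prosecution of Saddam Hussein for crimes against humanity",
     "The trial of Saddam Hussein on charges of crimes against humanity resumed today."),
    ("describe the prosecution of Saddam Hussein for crimes against humanity",
     "Oil prices rose sharply in Asian trading."),
]
train_labels = [1, 0]  # 1 = relevant, 0 = irrelevant

vectorizer = CountVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform([q + " " + s for q, s in train_pairs])
classifier = LinearSVC()
classifier.fit(X_train, train_labels)


def is_relevant(query: str, sentence: str) -> bool:
    """Predict whether a sentence is relevant to the query."""
    x = vectorizer.transform([query + " " + sentence])
    return bool(classifier.predict(x)[0])
```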

Sentence charts and extraction of classification features

From a practical point of view, it might be reasonable to assume that all information needed to ascertain the relevance of a sentence with respect to a query can be reduced to the wording of the sentence (and possibly its context). Given a suitable training set, word n-gram models have been shown to provide reasonable performance for various NLP tasks such as text categorization (Schapire and Singer, 2000, Joachims, 1998), calltype classification (Haffner et al., 2003), named entity detection (Levit et
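
The sketch below illustrates the idea of a chart as word-aligned elements from several annotation layers, together with two of the feature types discussed later (inclusion features and layer n-grams); the data structures and feature definitions are simplified assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of a sentence chart: elements from several annotation layers
# aligned to word positions. Class names and feature definitions are
# simplified assumptions, not the paper's exact formulation.
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class ChartElement:
    layer: str   # e.g. "word", "ne", "srl", "ace"
    label: str   # e.g. "prosecution", "PERSON", "ARG0"
    start: int   # index of the first word the element covers
    end: int     # index one past the last covered word


@dataclass
class SentenceChart:
    words: List[str]
    elements: List[ChartElement]

    def inclusion_features(self, query_terms: Set[str]) -> Set[str]:
        """Fire a feature whenever an element's span covers a query term."""
        features = set()
        for element in self.elements:
            span = {w.lower() for w in self.words[element.start:element.end]}
            if span & query_terms:
                features.add(f"incl:{element.layer}:{element.label}")
        return features

    def layer_bigrams(self, layer: str) -> Set[Tuple[str, str]]:
        """Bigrams over the labels of one annotation layer, in word order."""
        labels = [e.label for e in sorted(
            (e for e in self.elements if e.layer == layer),
            key=lambda e: e.start)]
        return set(zip(labels, labels[1:]))
```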

Data sets and experimental setup

All the experiments in this paper are conducted on GALE data. We considered five query templates that are shown in Table 1. For each of the templates in this table, several training and test queries have been provided along with answer keys: relevant sentences (with document IDs) selected by humans. These are then converted to labels “relevant” and “irrelevant” for each sentence in the affected documents. For our experiments we used documents that (according to the labelers) contain at least
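
For concreteness, the sketch below shows how answer keys of this kind might be turned into per-sentence binary labels; the data structures are assumptions made for illustration.

```python
# Hypothetical conversion of answer keys (document IDs plus human-selected
# relevant sentences) into per-sentence binary training labels.
from typing import Dict, List, Set, Tuple


def label_sentences(
    documents: Dict[str, List[str]],    # doc_id -> list of sentences
    answer_key: Set[Tuple[str, int]],   # (doc_id, sentence index) judged relevant
) -> List[Tuple[str, int, int]]:
    """Return (doc_id, sentence_index, label) tuples; label 1 = relevant."""
    labeled = []
    for doc_id, sentences in documents.items():
        for index in range(len(sentences)):
            label = 1 if (doc_id, index) in answer_key else 0
            labeled.append((doc_id, index, label))
    return labeled
```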

Related work

Question answering (QA) has a rich history of previous research. Sentence-level question answering stepped into the spotlight at the latest when TREC-8 introduced its QA track (Voorhees, 1999) to extend and refine existing document retrieval (DR) functionality. Since then, several commercial and government-sponsored programs have appeared that are primarily concerned with finding pinpoint answers to user queries from large collections of documents (e.g. Voorhees, 2003, Magnini et al., 2003, BAE, 2006,

Conclusion and future work

We presented a system for information distillation that extracts relevant sentences for templated queries with variable slots. The system introduces a novel model for generating classification features. We extract n-gram and inclusion features from sentence charts, which are directed acyclic graphs of word-aligned elements from various annotation layers (such as lexical representations, syntactic parses, semantic role labels and ACE annotations), and also compute sentence topicality features as

Acknowledgements

The authors would like to thank the SRI GALE Distillation team members at Columbia University, University of Massachusetts, NYU, and Fair Isaac. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-06-C-0023. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA.

References (45)

  • BAE, 2006. Go/No-Go Formal Distillation Evaluation Plan for GALE. Technical Report....
  • Baker, C.F., Fillmore, C.J., Lowe, J.B., 1998. The Berkeley FrameNet Project. In: Proceedings of COLING–ACL’98....
  • Bird, S., Liberman, M., 1999. A Formal Framework for Linguistic Annotation. Technical Report. MS-CIS-99-01,...
  • Blair-Goldensohn, S., et al. Answering definitional questions: a hybrid approach.
  • Charniak, E., 2000. A maximum-entropy-inspired parser. In: Proceedings of...
  • Dagan, I., Glickman, O., Magnini, B., 2006. The PASCAL recognising textual entailment challenge. In: Lecture Notes in...
  • Dang, H.T., 2005. Overview of DUC-2005. In: Proceedings of the 2005 Document Understanding Conference at...
  • de Marneffe, M.-C., MacCartney, B., Grenager, T., Cer, D., Rafferty, A., Manning, C.D., 2006a. Learning to distinguish...
  • de Marneffe, M.-C., MacCartney, B., Manning, C.D., 2006b. Generating typed dependency parses from phrase structure...
  • de Salvo Braz, R., Girju, R., Punyakanok, V., Roth, D., Sammons, M., 2005. Knowledge representation for semantic...
  • Gillick, D., Hakkani-Tür, D., Levit, M., 2008. Unsupervised learning of edit parameters for matching name variants. In:...
  • Grishman, R., Westbrook, D., Meyers, A., 2005. NYU’s English ACE 2005 System Description. Technical Report. 05-019,...
  • Haffner, P., Tur, G., Wright, J., 2003. Optimizing SVMs for complex call classification. In: Proceedings of...
  • Hakkani-Tür, D., Tur, G., 2007. Statistical sentence extraction for information distillation. In: Proceedings of...
  • Hakkani-Tür, D., Tur, G., Chotimongkol, A., 2005. Using syntactic and semantic graphs for call classification. In:...
  • Hakkani-Tür, D., Tur, G., Levit, M., 2007. Exploiting information extraction annotations for document retrieval in...
  • Joachims, T., 1998. Text categorization with support vector machines: learning with many relevant features. In:...
  • Joachims, T. Making large-scale SVM learning practical.
  • Kaisser, M., Webber, B., 2007. Question answering based on semantic roles. In: Proceedings of the ACL-2007, Deep...
  • Katz, B., Lin, J., 2003. Selectively using relations to improve precision in question answering. In: Proceedings of the...
  • Kingsbury, P., Palmer, M., Marcus, M., 2002. Adding semantic annotation to the Penn TreeBank. In: Proceedings of...
  • LDC, 2005. ACE English Annotation Guidelines for Entities. Technical Report....