
Reading comprehension tests for computer-based understanding evaluation

Published online by Cambridge University Press: 06 December 2005

BEN WELLNER
Affiliation:
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA e-mail: wellner@mitre.org
LISA FERRO
Affiliation:
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA
WARREN GREIFF
Affiliation:
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA
LYNETTE HIRSCHMAN
Affiliation:
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA

Abstract

Reading comprehension (RC) tests involve reading a short passage of text and answering a series of questions pertaining to that text. We present a methodology for evaluating the application of modern natural language technologies to the task of responding to RC tests. Our work is based on ABCs (Abduction Based Comprehension system), an automated system for taking tests that require short answer phrases as responses. A central goal of ABCs is to serve as a testbed for understanding the role that various linguistic components play in responding to reading comprehension questions. The heart of ABCs is an abductive inference engine that provides three key capabilities: (1) first-order logical representation of relations between entities and events in the text, together with rules to perform inference over such relations; (2) graceful degradation, because the inclusion of abduction in the reasoning engine avoids the brittleness that can be problematic in knowledge representation and reasoning systems; and (3) system transparency, in that the types of abductive inferences made over an entire corpus provide cues as to where the system is performing poorly and indicate where existing knowledge is inaccurate or new knowledge is required. ABCs, with certain sub-components not yet automated, finds the correct answer phrase nearly 35 percent of the time under a strict evaluation metric and 45 percent of the time under a looser, inexact metric on held-out evaluation data. Performance varied across question types, ranging from over 50 percent on who questions down to just over 10 percent on what questions. We present an analysis of the roles of individual components and of the impact of various characteristics of the abductive proof procedure on overall system performance.
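
The abductive behaviour the abstract describes can be made concrete with a small sketch. The Python program below is purely illustrative: the facts, rules, and uniform assumption cost are hypothetical, and the actual ABCs engine reasons over first-order relations rather than the propositional strings used here. It shows the core idea of proving a goal from extracted facts and rules while abducing any unprovable literal at a cost, which yields both graceful degradation and a record of assumptions that can point to missing or inaccurate knowledge.

```python
"""A minimal sketch of cost-based abductive inference over Horn rules.

Hypothetical illustration of the behaviour the abstract attributes to ABCs:
literals that cannot be proven from the extracted facts are assumed
("abduced") at a cost, so reasoning degrades gracefully instead of failing,
and the abduced literals can be logged for error analysis. All facts,
rules, and costs here are invented for the example.
"""

# Facts hypothetically extracted from a reading-comprehension passage.
FACTS = {"person(maria)", "rescued(maria, cat)"}

# Horn rules: head -> list of alternative bodies; a body is a list of
# literals that must all be proven for the head to hold.
RULES = {
    "hero(maria)": [["rescued(maria, cat)", "brave(maria)"]],
    "answer(q_who, maria)": [["person(maria)", "hero(maria)"]],
}

ASSUMPTION_COST = 1.0  # flat cost charged for each abduced literal


def prove(goal, depth=0, max_depth=10):
    """Return (cost, abduced_literals) for the cheapest proof of `goal`."""
    if goal in FACTS:
        return 0.0, []                # known fact: free, nothing assumed
    best = (ASSUMPTION_COST, [goal])  # fallback: abduce the goal itself
    if depth < max_depth:             # depth bound guards against cyclic rules
        for body in RULES.get(goal, []):
            cost, abduced = 0.0, []
            for literal in body:
                c, a = prove(literal, depth + 1, max_depth)
                cost += c
                abduced += a
            if cost <= best[0]:       # on ties, prefer the rule-based proof
                best = (cost, abduced)
    return best


if __name__ == "__main__":
    cost, abduced = prove("answer(q_who, maria)")
    # The proof succeeds by abducing brave(maria); aggregating such
    # assumptions over a corpus is the kind of transparency cue the
    # abstract describes.
    print(f"cost={cost}, abduced={abduced}")
```

Running the sketch prints cost=1.0 with brave(maria) as the single abduced literal: the question is answered despite the knowledge gap, and the gap itself is surfaced rather than hidden.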

Type
Papers
Copyright
2005 Cambridge University Press
