
Reading comprehension tests for computer-based understanding evaluation

Published online by Cambridge University Press: 06 December 2005

BEN WELLNER
Affiliation:
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA e-mail: wellner@mitre.org
LISA FERRO
Affiliation:
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA
WARREN GREIFF
Affiliation:
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA
LYNETTE HIRSCHMAN
Affiliation:
The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA

Abstract

Reading comprehension (RC) tests involve reading a short passage of text and answering a series of questions pertaining to that text. We present a methodology for evaluating the application of modern natural language technologies to the task of responding to RC tests. Our work is based on ABCs (Abduction Based Comprehension system), an automated system for taking tests that require short answer phrases as responses. A central goal of ABCs is to serve as a testbed for understanding the role that various linguistic components play in responding to reading comprehension questions. The heart of ABCs is an abductive inference engine that provides three key capabilities: (1) first-order logical representation of relations between entities and events in the text, together with rules to perform inference over such relations; (2) graceful degradation, because the inclusion of abduction in the reasoning engine avoids the brittleness that can be problematic in knowledge representation and reasoning systems; and (3) system transparency, in that the types of abductive inferences made over an entire corpus provide cues as to where the system is performing poorly and indicate where existing knowledge is inaccurate or new knowledge is required. ABCs, with certain sub-components not yet automated, finds the correct answer phrase nearly 35 percent of the time under a strict evaluation metric and 45 percent of the time under a looser, inexact metric on held-out evaluation data. Performance varied across question types, ranging from over 50 percent on who questions down to just over 10 percent on what questions. We present an analysis of the roles of individual components and of the impact of various characteristics of the abductive proof procedure on overall system performance.
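
The abductive behaviour the abstract describes can be made concrete with a small sketch. The Python program below is purely illustrative: the facts, rules, and uniform assumption cost are hypothetical, and the actual ABCs engine reasons over first-order relations rather than the propositional strings used here. It shows the core idea of proving a goal from extracted facts and rules while abducing any unprovable literal at a cost, which yields both graceful degradation and a record of assumptions that can point to missing or inaccurate knowledge.

```python
"""A minimal sketch of cost-based abductive inference over Horn rules.

Hypothetical illustration of the behaviour the abstract attributes to ABCs:
literals that cannot be proven from the extracted facts are assumed
("abduced") at a cost, so reasoning degrades gracefully instead of failing,
and the abduced literals can be logged for error analysis. All facts,
rules, and costs here are invented for the example.
"""

# Facts hypothetically extracted from a reading-comprehension passage.
FACTS = {"person(maria)", "rescued(maria, cat)"}

# Horn rules: head -> list of alternative bodies; a body is a list of
# literals that must all be proven for the head to hold.
RULES = {
    "hero(maria)": [["rescued(maria, cat)", "brave(maria)"]],
    "answer(q_who, maria)": [["person(maria)", "hero(maria)"]],
}

ASSUMPTION_COST = 1.0  # flat cost charged for each abduced literal


def prove(goal, depth=0, max_depth=10):
    """Return (cost, abduced_literals) for the cheapest proof of `goal`."""
    if goal in FACTS:
        return 0.0, []                # known fact: free, nothing assumed
    best = (ASSUMPTION_COST, [goal])  # fallback: abduce the goal itself
    if depth < max_depth:             # depth bound guards against cyclic rules
        for body in RULES.get(goal, []):
            cost, abduced = 0.0, []
            for literal in body:
                c, a = prove(literal, depth + 1, max_depth)
                cost += c
                abduced += a
            if cost <= best[0]:       # on ties, prefer the rule-based proof
                best = (cost, abduced)
    return best


if __name__ == "__main__":
    cost, abduced = prove("answer(q_who, maria)")
    # The proof succeeds by abducing brave(maria); aggregating such
    # assumptions over a corpus is the kind of transparency cue the
    # abstract describes.
    print(f"cost={cost}, abduced={abduced}")
```

Running the sketch prints cost=1.0 with brave(maria) as the single abduced literal: the question is answered despite the knowledge gap, and the gap itself is surfaced rather than hidden.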

Type
Papers
Copyright
2005 Cambridge University Press
