research-article

Large-scale extraction and use of knowledge from text

Authors:
Peter Clark, The Boeing Company, Seattle, WA, USA
Phil Harrison, The Boeing Company, Seattle, WA, USA

K-CAP '09: Proceedings of the fifth international conference on Knowledge capture, September 2009, Pages 153–160. https://doi.org/10.1145/1597735.1597763

Published: 01 September 2009

ABSTRACT

Many AI tasks, in particular natural language processing, require a large amount of world knowledge to create expectations, assess plausibility, and guide disambiguation. However, acquiring this world knowledge remains a formidable challenge. Building on ideas by Schubert, we have developed a system called DART (Discovery and Aggregation of Relations in Text) that extracts simple, semi-formal statements of world knowledge (e.g., "airplanes can fly", "people can drive cars") from text by abstracting from a parser's output, and we have used it to create a database of 23 million propositions of this kind. An evaluation of the DART database on two language processing tasks (parsing and textual entailment) shows that it improves performance, and a human evaluation shows that over half the facts in it are considered true or partially true, rising to 70% for facts seen with high frequency. The significance of this work is twofold: first, it has created a new, publicly available knowledge resource for language processing and other data interpretation tasks; and second, it provides empirical evidence of the utility of this type of knowledge, going beyond Schubert et al.'s earlier evaluations, which were based solely on human inspection of the extracted knowledge.
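The abstract describes DART only at a high level: parse text, abstract away from the parser's output to simple propositions, and aggregate identical propositions with frequency counts. The Python sketch below illustrates that general idea under loud assumptions. It is not the authors' implementation or DART's actual proposition format: the `Clause` type, the toy `LEMMAS` table, and the `abstract`, `aggregate`, and `verbalize` helpers are all hypothetical, and the input is assumed to be already-parsed subject-verb-object head words rather than real parser output.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple

class Clause(NamedTuple):
    """Hypothetical, simplified parser output: head words of one clause."""
    subject: str
    verb: str
    obj: Optional[str] = None

# Hypothetical toy lemma table; a real system would use a morphological analyzer.
LEMMAS = {
    "airplanes": "airplane", "people": "person", "cars": "car",
    "flew": "fly", "flies": "fly", "drove": "drive", "drives": "drive",
}

def lemma(word: str) -> str:
    """Lowercase a word and map it to its base form if the table knows it."""
    w = word.lower()
    return LEMMAS.get(w, w)

def abstract(clause: Clause) -> Tuple[str, ...]:
    """Abstract a clause to a proposition: keep lemmatized heads, drop tense and number."""
    parts = [lemma(clause.subject), lemma(clause.verb)]
    if clause.obj is not None:
        parts.append(lemma(clause.obj))
    return tuple(parts)

def aggregate(clauses: Iterable[Clause], min_count: int = 1) -> Counter:
    """Count identical propositions and keep those seen at least min_count times."""
    counts = Counter(abstract(c) for c in clauses)
    return Counter({prop: n for prop, n in counts.items() if n >= min_count})

def verbalize(prop: Tuple[str, ...]) -> str:
    """Render a proposition as 'X can V [Y]' in the style of the abstract's examples."""
    subj, verb, *rest = prop
    return " ".join([subj, "can", verb, *rest])

if __name__ == "__main__":
    parsed = [
        Clause("Airplanes", "flew"),
        Clause("airplanes", "flies"),
        Clause("People", "drove", "cars"),
        Clause("people", "drives", "car"),
        Clause("cars", "flew"),  # seen only once, so filtered out below
    ]
    for prop, n in aggregate(parsed, min_count=2).most_common():
        print(n, verbalize(prop))
```

Run as a script, this prints `2 airplane can fly` followed by `2 person can drive car`. The `min_count` filter loosely mirrors the abstract's observation that frequently seen facts were more often judged true; the threshold of 2 here is arbitrary and chosen only for the toy data.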

References

Alshawi, H., and Carter, D. 1994. Training and Scaling Preference Functions for Disambiguation. Computational Linguistics 20(4), pp. 635–648.
Baker, C., Fillmore, C., and Lowe, J. 1998. The Berkeley FrameNet Project. In Proc. 36th ACL, pp. 86–90. CA: Kaufmann.
Banko, M., Cafarella, M., Soderland, S., Broadhead, M., and Etzioni, O. 2007. Open Information Extraction from the Web. In Proc. IJCAI'07.
Clark, P., and Harrison, P. 2008. Recognizing Textual Entailment with Logical Inference. In Proc. 2008 Text Analysis Conference (TAC'08), Gaithersburg, Maryland.
Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
Francis, W. N., and Kučera, H. 1982. Frequency Analysis of English Usage. Boston: Houghton Mifflin Company.
Harrison, P., and Maxwell, M. 1986. A New Implementation of GPSG. In Proc. 6th Canadian Conf. on AI (CSCSI'86), pp. 78–83.
Havasi, C., Speer, R., and Alonso, J. 2007. ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge. In Proc. Recent Advances in Natural Language Processing.
Lenat, D. B., and Guha, R. V. 1990. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley.
Lieberman, H., Liu, H., Singh, P., and Barry, B. 2004. Beating some common sense into interactive applications. AI Magazine.
Lin, D. 1998. Extracting Collocations from Text Corpora. In Workshop on Computational Terminology, pp. 57–63.
Lin, D., and Pantel, P. 2001. Discovery of Inference Rules for Question Answering. Natural Language Engineering 7(4), pp. 343–360.
Marcus, M., Santorini, B., and Marcinkiewicz, M. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2), pp. 313–330.
Pantel, P., Bhagat, R., Coppola, B., Chklovski, T., and Hovy, E. 2007. ISP: Learning Inferential Selectional Preferences. In Proc. Human Language Technologies, NAACL HLT 2007.
Ratnaparkhi, A. 1998. Unsupervised Statistical Models for Prepositional Phrase Attachment. In Proc. COLING-ACL'98.
Resnik, P. 1997. Selectional preference and sense disambiguation. In Proc. ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How?, pp. 52–57.
Schubert, L. 2002. Can we derive general world knowledge from texts? In M. Marcus (ed.), Proc. 2nd Int. Conf. on Human Language Technology Research (HLT 2002).
Schubert, L., and Tong, M. 2003. Extracting and evaluating general world knowledge from the Brown corpus. In Proc. HLT/NAACL 2003 Workshop on Text Meaning.
Szpektor, I., Dagan, I., Bar-Haim, R., and Goldberger, J. 2008. Contextual Preferences. In Proc. ACL 2008.
Van Durme, B., Michalak, P., and Schubert, L. 2009. Deriving Generalized Knowledge from Corpora using WordNet Abstraction. In Proc. EACL'09.
Van Durme, B., and Schubert, L. 2008. Open Knowledge Extraction through Compositional Language Processing. In Symposium on Semantics in Systems for Text Processing (STEP'08), Venice, Italy, September 22–24, 2008.
Voorhees, E., and Harman, D. 1999. Overview of the seventh text retrieval conference. In Proc. Seventh Text Retrieval Conference (TREC-7). NIST Special Publication.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning settings

Published in
K-CAP '09: Proceedings of the fifth international conference on Knowledge capture
September 2009
222 pages
ISBN: 9781605586588
DOI: 10.1145/1597735
General Chair: Yolanda Gil, USC Information Sciences Institute, USA
Program Chair: Natasha Noy, Stanford University, USA
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
knowledge acquisition
natural language processing
parsing
textual entailment
Acceptance Rates
Overall acceptance rate: 55 of 198 submissions, 28%

Article Metrics
- Total citations: 24
- Total downloads: 294
- Downloads (last 12 months): 3
- Downloads (last 6 weeks): 1