ABSTRACT
Many AI tasks, in particular natural language processing, require a large amount of world knowledge to create expectations, assess plausibility, and guide disambiguation. However, acquiring this world knowledge remains a formidable challenge. Building on ideas by Schubert, we have developed a system called DART (Discovery and Aggregation of Relations in Text) that extracts simple, semi-formal statements of world knowledge (e.g., "airplanes can fly", "people can drive cars") from text by abstracting from a parser's output, and we have used it to create a database of 23 million propositions of this kind. An evaluation of the DART database on two language processing tasks (parsing and textual entailment) shows that it improves performance, and a human evaluation shows that over half the facts in it are considered true or partially true, rising to 70% for facts seen with high frequency. The significance of this work is twofold: first, it has created a new, publicly available knowledge resource for language processing and other data interpretation tasks; second, it provides empirical evidence of the utility of this type of knowledge, going beyond Schubert et al.'s earlier evaluations, which were based solely on human inspection of the extracted knowledge.
- Alshawi, H., Carter, D. 1994. Training and Scaling Preference Functions for Disambiguation. Computational Linguistics 20 (4) pp635--648.
- Baker, C., Fillmore, C., and Lowe, J. 1998. "The Berkeley FrameNet Project." In Proc 36th ACL, pp86--90. CA:Kaufmann.
- Banko, M., Cafarella, M., Soderland, S., Broadhead, M., Etzioni, O. 2007. Open Information Extraction from the Web. IJCAI'07.
- Clark, P., and Harrison, P. 2008. Recognizing Textual Entailment with Logical Inference. In Proceedings of the 2008 Text Analysis Conference (TAC'08), Gaithersburg, Maryland.
- Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
- Harrison, P., and Maxwell, M. 1986. A New Implementation of GPSG, Proc. 6th Canadian Conf on AI (CSCSI'86), pp78--83.
- Havasi, C., Speer, R., and Alonso, J. 2007. ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge. Proceedings of Recent Advances in Natural Language Processing.
- Lenat, D. B., and Guha, R. V. 1990. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley.
- Lieberman, H., Liu, H., Singh, P., Barry, B. 2004. Beating some common sense into interactive applications. AI Magazine.
- Lin, D. 1998. Extracting Collocations from Text Corpora. Workshop on Computational Terminology. pp57--63.
- Lin, D., and Pantel, P. 2001. Discovery of Inference Rules for Question Answering. Natural Language Engineering 7 (4) pp343--360.
- Marcus, M., Santorini, B., Marcinkiewicz, M. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19 (2) pp313--330.
- Nelson, F., Kucera, H. 1982. Frequency analysis of English usage. Houghton Mifflin Company, Boston.
- Pantel, P., Bhagat, R., Coppola, B., Chklovski, T., Hovy, E. 2007. ISP: Learning Inferential Selectional Preferences. In Human Language Technologies, NAACL HLT 2007.
- Ratnaparkhi, A. 1998. Unsupervised Statistical Models for Prepositional Phrase Attachment. Proc. COLING-ACL'98.
- Resnik, P. 1997. Selectional preference and sense disambiguation. In Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How?, pp52--57.
- Schubert, L. 2002. "Can we derive general world knowledge from texts?", M. Marcus (ed.), Proc. of the 2nd Int. Conf. on Human Language Technology Research (HLT 2002).
- Schubert, L. and Tong, M. 2003. Extracting and evaluating general world knowledge from the Brown corpus, Proc. of the HLT/NAACL 2003 Workshop on Text Meaning.
- Szpektor, I., Dagan, I., Bar-Haim, R., Goldberger, J. 2008. Contextual Preferences. Proceedings of ACL 2008.
- Van Durme, B., Michalak, P., Schubert, L. 2009. Deriving Generalized Knowledge from Corpora using WordNet Abstraction. Proc. EACL'09.
- Van Durme, B., Schubert, L. 2008. Open Knowledge Extraction through Compositional Language Processing. Symposium on Semantics in Systems for Text Processing (STEP'08). Venice, Italy, September 22--24.
- Voorhees, E., and Harman, D. 1999. Overview of the seventh text retrieval conference. In Proceedings of the Seventh Text Retrieval Conference (TREC-7). NIST Special Publication.
Index Terms
- Large-scale extraction and use of knowledge from text