DOI: 10.1145/1597735.1597763
research-article

Large-scale extraction and use of knowledge from text

Published: 01 September 2009

ABSTRACT

Many AI tasks, in particular natural language processing, require a large amount of world knowledge to create expectations, assess plausibility, and guide disambiguation. However, acquiring this world knowledge remains a formidable challenge. Building on ideas by Schubert, we have developed a system called DART (Discovery and Aggregation of Relations in Text) that extracts simple, semi-formal statements of world knowledge (e.g., "airplanes can fly", "people can drive cars") from text by abstracting from a parser's output, and we have used it to create a database of 23 million propositions of this kind. An evaluation of the DART database on two language processing tasks (parsing and textual entailment) shows that it improves performance, and a human evaluation shows that over half of its facts are judged true or partially true, rising to 70% for facts seen with high frequency. The significance of this work is twofold: first, it has created a new, publicly available knowledge resource for language processing and other data interpretation tasks; second, it provides empirical evidence of the utility of this type of knowledge, going beyond Schubert et al.'s earlier evaluations, which were based solely on human inspection of the extracted knowledge.
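The abstract describes DART's pipeline only at a high level: abstract each parsed clause into a simple proposition, aggregate identical propositions across the corpus, and distinguish frequently seen facts from rare ones. The sketch below illustrates that general idea under loudly stated assumptions. The `Clause` record is a hypothetical stand-in for real parser output, "abstraction" here just means keeping head words and dropping modifiers, and the names `abstract`, `aggregate`, `render`, `frequent_facts`, and the `min_freq` threshold are illustrative inventions, not DART's actual interface or abstraction rules.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional, Tuple

# An abstracted proposition: (subject head, verb lemma, optional object head).
Prop = Tuple[str, str, Optional[str]]


class Clause(NamedTuple):
    """Hypothetical, simplified parser output for one clause (an assumption)."""
    subject: str                 # head noun of the subject, e.g. "airplanes"
    modifiers: Tuple[str, ...]   # determiners/adjectives that abstraction discards
    verb: str                    # verb lemma, e.g. "fly"
    obj: Optional[str] = None    # head noun of the object, if any


def abstract(clause: Clause) -> Prop:
    """Crude abstraction: keep lowercased head words, drop all modifiers."""
    obj = clause.obj.lower() if clause.obj else None
    return (clause.subject.lower(), clause.verb.lower(), obj)


def aggregate(clauses: Iterable[Clause]) -> Counter:
    """Count how often each abstracted proposition occurs across the corpus."""
    return Counter(abstract(c) for c in clauses)


def render(prop: Prop) -> str:
    """Render a proposition as a semi-formal 'X can V (Y)' statement."""
    subj, verb, obj = prop
    return f"{subj} can {verb}" + (f" {obj}" if obj else "")


def frequent_facts(counts: Counter, min_freq: int) -> List[str]:
    """Rendered propositions seen at least min_freq times, most frequent first."""
    return [render(p) for p, n in counts.most_common() if n >= min_freq]


# Usage with a tiny made-up corpus of pre-parsed clauses.
corpus = [
    Clause("airplanes", ("the", "large"), "fly"),
    Clause("Airplanes", (), "fly"),
    Clause("people", ("many",), "drive", "cars"),
    Clause("people", (), "drive", "cars"),
    Clause("people", (), "drive", "cars"),
    Clause("cats", ("the",), "eat", "homework"),
]
counts = aggregate(corpus)
print(frequent_facts(counts, min_freq=2))
# ['people can drive cars', 'airplanes can fly']
```

The frequency cutoff drops the one-off "cats can eat homework", mirroring in spirit the abstract's observation that high-frequency facts are more often judged true; the threshold value here is arbitrary.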

References

  1. Alshawi, H., and Carter, D. 1994. Training and Scaling Preference Functions for Disambiguation. Computational Linguistics 20 (4), pp. 635--648.
  2. Baker, C., Fillmore, C., and Lowe, J. 1998. The Berkeley FrameNet Project. In Proc. 36th ACL, pp. 86--90. CA: Kaufmann.
  3. Banko, M., Cafarella, M., Soderland, S., Broadhead, M., and Etzioni, O. 2007. Open Information Extraction from the Web. In Proc. IJCAI'07.
  4. Clark, P., and Harrison, P. 2008. Recognizing Textual Entailment with Logical Inference. In Proceedings of the 2008 Text Analysis Conference (TAC'08), Gaithersburg, Maryland.
  5. Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press.
  6. Harrison, P., and Maxwell, M. 1986. A New Implementation of GPSG. In Proc. 6th Canadian Conf. on AI (CSCSI'86), pp. 78--83.
  7. Havasi, C., Speer, R., and Alonso, J. 2007. ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge. In Proceedings of Recent Advances in Natural Language Processing.
  8. Lenat, D. B., and Guha, R. V. 1990. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley.
  9. Lieberman, H., Liu, H., Singh, P., and Barry, B. 2004. Beating some common sense into interactive applications. AI Magazine.
  10. Lin, D. 1998. Extracting Collocations from Text Corpora. In Workshop on Computational Terminology, pp. 57--63.
  11. Lin, D., and Pantel, P. 2001. Discovery of Inference Rules for Question Answering. Natural Language Engineering 7 (4), pp. 343--360.
  12. Marcus, M., Santorini, B., and Marcinkiewicz, M. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19 (2), pp. 313--330.
  13. Francis, W. N., and Kučera, H. 1982. Frequency Analysis of English Usage. Boston: Houghton Mifflin Company.
  14. Pantel, P., Bhagat, R., Coppola, B., Chklovski, T., and Hovy, E. 2007. ISP: Learning Inferential Selectional Preferences. In Human Language Technologies: NAACL HLT 2007.
  15. Ratnaparkhi, A. 1998. Unsupervised Statistical Models for Prepositional Phrase Attachment. In Proc. COLING-ACL'98.
  16. Resnik, P. 1997. Selectional preference and sense disambiguation. In Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How?, pp. 52--57.
  17. Schubert, L. 2002. Can we derive general world knowledge from texts? In M. Marcus (ed.), Proc. of the 2nd Int. Conf. on Human Language Technology Research (HLT 2002).
  18. Schubert, L., and Tong, M. 2003. Extracting and evaluating general world knowledge from the Brown corpus. In Proc. of the HLT/NAACL 2003 Workshop on Text Meaning.
  19. Szpektor, I., Dagan, I., Bar-Haim, R., and Goldberger, J. 2008. Contextual Preferences. In Proceedings of ACL 2008.
  20. Van Durme, B., Michalak, P., and Schubert, L. 2009. Deriving Generalized Knowledge from Corpora using WordNet Abstraction. In Proc. EACL'09.
  21. Van Durme, B., and Schubert, L. 2008. Open Knowledge Extraction through Compositional Language Processing. In Symposium on Semantics in Systems for Text Processing (STEP'08), Venice, Italy, September 22--24, 2008.
  22. Voorhees, E., and Harman, D. 1999. Overview of the Seventh Text Retrieval Conference. In Proceedings of the Seventh Text Retrieval Conference (TREC-7). NIST Special Publication.

      Published in

        ACM Conferences
        K-CAP '09: Proceedings of the fifth international conference on Knowledge capture
        September 2009
        222 pages
        ISBN:9781605586588
        DOI:10.1145/1597735
        General Chair: Yolanda Gil
        Program Chair: Natasha Noy

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 55 of 198 submissions, 28%
