DOI: 10.1145/2488388.2488420
research-article

ClausIE: clause-based open information extraction

Published: 13 May 2013

ABSTRACT

We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of "useful" pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and then identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, on both high-quality text and noisy text such as that found on the web.
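
The separation the abstract describes, between detecting a clause and choosing how to represent it as an extraction, can be illustrated with a toy sketch. This is not the authors' implementation: the constituents are labelled by hand instead of being derived from a dependency parse, every adverbial is treated as part of the clause pattern, and the names `Clause`, `clause_type`, `as_nary`, and `as_triple` are hypothetical. The pattern labels (SV, SVA, SVC, SVO, SVOO, SVOA, SVOC) are the basic clause patterns of traditional English grammar and are assumed here as the set of clause types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Clause:
    """Hand-labelled constituents of one clause (hypothetical structure)."""
    subject: str
    verb: str
    direct_object: Optional[str] = None
    indirect_object: Optional[str] = None
    complement: Optional[str] = None
    adverbials: Tuple[str, ...] = ()


def clause_type(c: Clause) -> str:
    """Name the clause pattern from which constituents are present.

    Simplification: any adverbial counts towards the pattern; no attempt
    is made to tell obligatory adverbials from optional ones.
    """
    if c.direct_object and c.indirect_object:
        return "SVOO"
    if c.direct_object and c.complement:
        return "SVOC"
    if c.direct_object and c.adverbials:
        return "SVOA"
    if c.direct_object:
        return "SVO"
    if c.complement:
        return "SVC"
    if c.adverbials:
        return "SVA"
    return "SV"


def _arguments(c: Clause) -> List[str]:
    """Collect the non-subject constituents that are present, in a fixed order."""
    parts = [c.indirect_object, c.direct_object, c.complement, *c.adverbials]
    return [p for p in parts if p]


def as_nary(c: Clause) -> Tuple[str, ...]:
    """Representation 1: one n-ary tuple with a slot per constituent."""
    return (c.subject, c.verb, *_arguments(c))


def as_triple(c: Clause) -> Tuple[str, str, str]:
    """Representation 2: a triple whose last slot joins all arguments."""
    return (c.subject, c.verb, " ".join(_arguments(c)))


if __name__ == "__main__":
    clause = Clause("Anna", "sent", direct_object="a letter", indirect_object="Ben")
    print(clause_type(clause))  # SVOO
    print(as_nary(clause))      # ('Anna', 'sent', 'Ben', 'a letter')
    print(as_triple(clause))    # ('Anna', 'sent', 'Ben a letter')
```

The same detected clause feeds both output functions, which is the point of the sketch: the representation can change without touching the detection step.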

Published in

WWW '13: Proceedings of the 22nd International Conference on World Wide Web
May 2013
1628 pages
ISBN: 9781450320351
DOI: 10.1145/2488388

Copyright © 2013 held by the International World Wide Web Conference Committee (IW3C2).

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

WWW '13 Paper Acceptance Rate: 125 of 831 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
