research-article

ClausIE: clause-based open information extraction

Authors:
Luciano Del Corro, Max-Planck-Institut für Informatik, Saarbrücken, Germany
Rainer Gemulla, Max-Planck-Institut für Informatik, Saarbrücken, Germany

WWW '13: Proceedings of the 22nd international conference on World Wide Web, May 2013, Pages 355–366, https://doi.org/10.1145/2488388.2488420

Published: 13 May 2013

ABSTRACT

We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of "useful" pieces of information expressed in a sentence from their representation in terms of extractions. Specifically, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect the clauses in an input sentence and then identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, on both high-quality text and noisy text as found on the web.
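
The two-stage idea described in the abstract (first determine the clause type from the grammatical function of its constituents, then derive extractions from it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the constituents are supplied by hand instead of being read off a dependency parse, the clause-type inventory (SV, SVA, SVC, SVO, SVOO, SVOA, SVOC) is assumed to be the standard seven-type scheme of Quirk et al.'s grammar, and the names `Clause`, `clause_type`, `propositions`, and `adverbial_required` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Clause:
    """Constituents of one clause, labeled by grammatical function (hand-supplied here)."""
    subject: str
    verb: str
    direct_object: Optional[str] = None
    indirect_object: Optional[str] = None
    complement: Optional[str] = None
    adverbials: Tuple[str, ...] = ()
    # Hypothetical flag: True when the verb needs its first adverbial to be
    # complete (e.g. "remained in Princeton"). A real system would have to
    # decide this from the parse and from lexica; here it is simply given.
    adverbial_required: bool = False


def clause_type(c: Clause) -> str:
    """Stage 1: name the clause type from which constituents are present."""
    needs_adverbial = c.adverbial_required and bool(c.adverbials)
    if c.direct_object and c.indirect_object:
        return "SVOO"
    if c.direct_object and c.complement:
        return "SVOC"
    if c.direct_object:
        return "SVOA" if needs_adverbial else "SVO"
    if c.complement:
        return "SVC"
    return "SVA" if needs_adverbial else "SV"


def propositions(c: Clause) -> List[Tuple[str, ...]]:
    """Stage 2: one extraction from the essential constituents, plus one
    further extraction per optional adverbial (an illustrative representation)."""
    base = [c.subject, c.verb]
    for part in (c.indirect_object, c.direct_object, c.complement):
        if part:
            base.append(part)
    optional = list(c.adverbials)
    if c.adverbial_required and optional:
        base.append(optional.pop(0))  # a required adverbial belongs to the core
    result = [tuple(base)]
    result.extend(tuple(base + [adv]) for adv in optional)
    return result


died = Clause("Albert Einstein", "died", adverbials=("in Princeton", "in 1955"))
print(clause_type(died))   # SV: both adverbials are optional
for p in propositions(died):
    print(p)
```

Because the type decision is kept apart from `propositions`, the output representation (triples, n-ary tuples, or something else) could be changed without touching clause detection, which is the separation the abstract emphasizes.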

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing

Published in
WWW '13: Proceedings of the 22nd international conference on World Wide Web
May 2013, 1628 pages
ISBN: 9781450320351
DOI: 10.1145/2488388
General Chairs: Daniel Schwabe (PUC-Rio, Brazil), Virgílio Almeida (UFMG, Brazil), Hartmut Glaser (CGI.br, Brazil)
Program Chairs: Ricardo Baeza-Yates (Yahoo! Labs, Spain & Chile), Sue Moon (KAIST, South Korea)
Copyright © 2013. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
open information extraction
relation extraction
Acceptance Rates
WWW '13 Paper Acceptance Rate: 125 of 831 submissions, 15%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%