ABSTRACT
We propose ClausIE, a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text. ClausIE fundamentally differs from previous approaches in that it separates the detection of "useful" pieces of information expressed in a sentence from their representation in terms of extractions. In more detail, ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and then identify the type of each clause according to the grammatical function of its constituents. Based on this information, ClausIE is able to generate high-precision extractions; the representation of these extractions can be flexibly customized to the underlying application. ClausIE is based on dependency parsing and a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data (whether labeled or unlabeled). Our experimental study on various real-world datasets suggests that ClausIE obtains higher recall and higher precision than existing approaches, both on high-quality text and on noisy text as found on the web.
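The separation between detecting a clause and choosing how to represent it can be illustrated with a small sketch. This is not the ClausIE implementation: it assumes the constituents of a clause have already been labeled by hand with their grammatical function (a real system would derive them from a dependency parse), it assumes the constituents appear in canonical order, and it uses the seven basic clause types from Quirk et al. (SV, SVA, SVC, SVO, SVOO, SVOA, SVOC). The names `Clause`, `clause_type`, `to_triple`, and `to_nary` are hypothetical.

```python
from dataclasses import dataclass

# Seven basic English clause types (Quirk et al., 1985), written as the
# sequence of constituent functions: S=subject, V=verb, O=object,
# C=complement, A=adverbial.
CLAUSE_TYPES = {"SV", "SVA", "SVC", "SVO", "SVOO", "SVOA", "SVOC"}


@dataclass
class Clause:
    # Each constituent is a (function, text) pair. Assumption: the labels
    # were assigned upstream, e.g. from a dependency parse.
    constituents: list


def clause_type(clause):
    """Detection step: derive the clause type from constituent functions."""
    pattern = "".join(func for func, _ in clause.constituents)
    if pattern not in CLAUSE_TYPES:
        raise ValueError(f"unsupported constituent pattern: {pattern}")
    return pattern


def to_triple(clause):
    """Representation A: (subject, relation, single merged argument)."""
    clause_type(clause)  # validate the pattern before generating output
    texts = [text for _, text in clause.constituents]
    return (texts[0], texts[1], " ".join(texts[2:]))


def to_nary(clause):
    """Representation B: one slot per constituent (n-ary extraction)."""
    clause_type(clause)  # validate the pattern before generating output
    return tuple(text for _, text in clause.constituents)


# Hand-labeled example clause (illustrative only).
example = Clause([
    ("S", "Bell"),
    ("V", "makes"),
    ("O", "electronic products"),
    ("A", "in Los Angeles"),
])

print(clause_type(example))  # clause type of the example
print(to_triple(example))    # triple-style representation
print(to_nary(example))      # n-ary representation
```

The same detected clause feeds both `to_triple` and `to_nary`, which mirrors the idea that the output representation can be chosen per application without changing the detection step.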
Index Terms
- ClausIE: clause-based open information extraction