Self-supervised relation extraction from the Web

Rozenfeld, Benjamin; Feldman, Ronen

doi:10.1007/s10115-007-0110-6

Regular Paper
Published: 20 November 2007

Volume 17, pages 17–33, (2008)

Knowledge and Information Systems

Benjamin Rozenfeld¹ &
Ronen Feldman¹

Abstract

Web extraction systems attempt to use the immense amount of unlabeled text on the Web to create large lists of entities and relations. Unlike traditional Information Extraction methods, Web extraction systems do not label every mention of the target entity or relation; instead, they focus on extracting as many different instances as possible while keeping the precision of the resulting list reasonably high. SRES is a self-supervised Web relation extraction system that learns powerful extraction patterns from unlabeled text, using short descriptions of the target relations and their attributes. SRES automatically generates the training data needed for its pattern-learning component. The performance of SRES is further enhanced by classifying its output instances using the properties of the instances and the patterns. The features we use for classification and the trained classification model are independent of the target relation, which we demonstrate in a series of experiments. We also compare the performance of SRES to that of the state-of-the-art KnowItAll system, and to that of KnowItAll's pattern-learning component, which learns a simpler pattern language than SRES.
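The general idea of learning extraction patterns from automatically generated training data can be illustrated with a toy sketch. This is not the SRES algorithm, whose pattern language and classifier are described only in the full paper; it is a generic seed-based bootstrapping loop in the spirit of earlier pattern learners. The seed pair, the example sentences, the capitalized-word entity regex, and the function names `learn_patterns` and `extract` are all assumptions made for illustration. The number of distinct patterns supporting an instance stands in, very loosely, for the kind of instance and pattern properties a classifier might use.

```python
import re
from collections import defaultdict

# Assumption: an "entity" is a single capitalized word; real systems use far richer recognizers.
ENTITY = r"([A-Z][A-Za-z]+)"

def learn_patterns(sentences, seeds):
    """Collect the text between two seed arguments as a candidate pattern."""
    patterns = set()
    for sentence in sentences:
        for first, second in seeds:
            i, j = sentence.find(first), sentence.find(second)
            # Both arguments must occur, with the first one before the second.
            if i != -1 and j != -1 and i + len(first) <= j:
                middle = sentence[i + len(first):j]
                if middle.strip():
                    patterns.add(middle)
    return patterns

def extract(sentences, patterns):
    """Apply each pattern to every sentence; record which patterns support each pair."""
    support = defaultdict(set)
    for pattern in patterns:
        regex = re.compile(ENTITY + re.escape(pattern) + ENTITY)
        for sentence in sentences:
            for first, second in regex.findall(sentence):
                support[(first, second)].add(pattern)
    return support

# Hypothetical toy corpus and a single seed instance of an acquisition relation.
sentences = [
    "Google acquired YouTube in 2006.",
    "Oracle acquired Sun in 2010.",
    "Google completed its purchase of YouTube.",
    "Adobe completed its purchase of Macromedia.",
]
seeds = [("Google", "YouTube")]

patterns = learn_patterns(sentences, seeds)
support = extract(sentences, patterns)

# Rank instances by how many distinct patterns extracted them.
for pair, pats in sorted(support.items(), key=lambda kv: -len(kv[1])):
    print(pair, len(pats))
```

In this sketch the seed sentences play the role of automatically labeled training data: no sentence is annotated by hand, and new pairs are found only because they occur in the same contexts as the seed.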

References

Agichtein E, Gravano L (2000) Snowball: extracting relations from large plain-text collections. In: Proceedings of the 5th ACM international conference on digital libraries (DL)
Brin S (1998) Extracting patterns and relations from the World Wide Web. In: WebDB workshop at 6th international conference on extending database technology, EDBT’98, Valencia
Chen J, Ji D et al (2005) Unsupervised feature selection for relation extraction. IJCNLP-05, Jeju Island
Ciravegna F (2001) Adaptive information extraction from text by rule induction and generalization. In: Proceedings of the 17th IJCAI, Seattle
Cowie J, Lehnert W (1996) Information extraction. Commun Assoc Comput Mach 39(1): 80–91
Downey D, Etzioni O et al (2004) Learning text patterns for web information extraction and assessment (extended version). Technical Report UW-CSE-04-05-01
Etzioni O, Cafarella M et al (2005) Unsupervised named-entity extraction from the Web: an experimental study. Artif Intell 165(1): 91–134
Feldman R, Rosenfeld B et al (2006) TEG—a hybrid approach to information extraction. Knowl Inf Syst 9(1): 1–18
Freitag D (1998) Machine learning for information extraction in informal domains. Computer Science Department, Carnegie Mellon University, Pittsburgh, p 188
Freitag D, McCallum AK (1999) Information extraction with HMMs and shrinkage. In: Proceedings of the AAAI-99 workshop on machine learning for information extraction
Genkin A, Lewis DD et al (2004) Large-scale Bayesian logistic regression for text categorization. DIMACS, New Brunswick, pp 1–41
Grishman R (1996) The role of syntax in information extraction. In: Advances in Text Processing: Tipster Program Phase II. Morgan Kaufmann
Grishman R (1997) Information extraction: techniques and challenges. SCIE: 10–27
Hasegawa T, Sekine S et al (2004) Discovering relations among named entities from large corpora. ACL 2004
Kushmerick N, Weld DS et al (1997) Wrapper induction for information extraction. IJCAI 97: 729–737
Li Z, Ng WK et al (2005) Web data extraction based on structural similarity. Knowl Inf Syst 8(4): 438–461
Miller G (1990) WordNet: an on-line lexical database. Int J Lexicogr 3(4): 235–312
Ravichandran D, Hovy E (2002) Learning surface text patterns for a question answering system. 40th ACL Conference
Riloff E (1993) Automatically constructing a dictionary for information extraction tasks. AAAI-93
Riloff E, Jones R (1999) Learning dictionaries for information extraction by multi-level bootstrapping. AAAI-99
Soderland S (1999) Learning information extraction rules for semi-structured and free text. Mach Learn 34(1–3): 233–272
Wong T-L, Lam W (2007) Learning to extract and summarize hot item features from multiple auction web sites. Knowl Inf Syst

Author information

Authors and Affiliations

Information Systems, HU School of Business Administration, Hebrew University, Jerusalem, Israel
Benjamin Rozenfeld & Ronen Feldman

Corresponding author

Correspondence to Ronen Feldman.

About this article

Cite this article

Rozenfeld, B., Feldman, R. Self-supervised relation extraction from the Web. Knowl Inf Syst 17, 17–33 (2008). https://doi.org/10.1007/s10115-007-0110-6

Received: 17 January 2007
Revised: 05 July 2007
Accepted: 08 September 2007
Published: 20 November 2007
Issue Date: October 2008
DOI: https://doi.org/10.1007/s10115-007-0110-6
