Learning information extraction patterns from tabular Web pages without manual labelling | IEEE Conference Publication | IEEE Xplore

Abstract:

We describe a domain-independent approach to automatically constructing information extraction patterns for semistructured Web pages. The approach was tested on three corpora containing a series of tabular Web sites from different domains and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of the training page.
Date of Conference: 13-17 October 2003
Date Added to IEEE Xplore: 27 October 2003
Print ISBN: 0-7695-1932-6
Conference Location: Halifax, NS, Canada
