Learning information extraction patterns from tabular Web pages without manual labelling

Abstract:

We describe a domain-independent approach to automatically constructing information extraction patterns for semistructured Web pages. The approach was tested on three corpora containing a series of tabular Web sites from different domains and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and does not require any manual labeling of the training page.

Published in: Proceedings IEEE/WIC International Conference on Web Intelligence (WI 2003)

Date of Conference: 13-17 October 2003

Date Added to IEEE Xplore: 27 October 2003

Print ISBN: 0-7695-1932-6

DOI: 10.1109/WI.2003.1241249

Conference Location: Halifax, NS, Canada