Abstract:
We describe a domain-independent approach to automatically constructing information extraction patterns for semistructured Web pages. The approach was tested on three corpora, each containing a series of tabular Web sites from a different domain, and achieved a success rate of at least 80%. A significant strength of the system is that it can infer extraction patterns from a single training page and requires no manual labeling of that page.
Date of Conference: 13-17 October 2003
Date Added to IEEE Xplore: 27 October 2003
Print ISBN: 0-7695-1932-6