Sub Node Extraction with Tree Based Wrappers

Raeymaekers, Stefan; Bruynooghe, Maurice

doi:10.3233/978-1-58603-891-5-137

Abstract

String-based as well as tree-based methods have been used to learn wrappers for extracting information from semi-structured documents (e.g., HTML documents). Previous work has shown that tree-based approaches perform better while needing fewer examples than string-based approaches. A disadvantage is that tree-based approaches can only extract complete text nodes, whereas string-based approaches can extract within text nodes. This paper proposes a hybrid approach that combines the advantages of both approaches and compares it experimentally with a string-based approach on some sub node extraction tasks.
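To make the distinction between extracting a complete text node and extracting within a text node concrete, the following is a minimal illustrative sketch. It is not the authors' method: the paper learns wrappers from examples, whereas here the tree path and the string pattern are written by hand. The sample document, the function names `select_text_nodes` and `extract_within`, the path, and the regular expression are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (an assumption, not from the paper).
DOC = (
    "<html><body><table>"
    "<tr><td>Price: 12.50 EUR</td></tr>"
    "<tr><td>Price: 7.00 EUR</td></tr>"
    "</table></body></html>"
)


def select_text_nodes(root, path):
    """Tree step: return the complete text of every element matched by `path`."""
    return [el.text for el in root.findall(path) if el.text]


def extract_within(text, pattern):
    """String step: return the first capture group of `pattern` in `text`, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


root = ET.fromstring(DOC)

# A tree-only wrapper stops here: it can return only whole text nodes.
nodes = select_text_nodes(root, "./body/table/tr/td")

# A hybrid wrapper then applies a string-level rule inside each selected node.
prices = [extract_within(text, r"Price:\s*([\d.]+)") for text in nodes]

print(nodes)
print(prices)
```

In this sketch the tree step narrows the search to the relevant nodes, so the string step only has to handle the short text inside each one.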

IOS Press Copyright 2024
