Abstract:
Extracting loosely structured data records (DRs) has wide applications in many domains, such as forum pattern recognition, blog data analysis, and book and news review analysis. Existing methods work well only for strongly structured DRs. In this paper, we address the problem of extracting loosely structured DRs by mining strict patterns. Our method uses both content features and tag-tree features to recognize loosely structured DRs, and we propose a new approach that extracts these DRs automatically. Our experimental study demonstrates that the method is both effective and robust in practice.
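The abstract does not specify how the two feature types are combined, so the following is only a minimal sketch of the general idea of scoring sibling subtrees with a tag-tree feature plus a content feature. It is not the authors' algorithm. Everything here is a hypothetical assumption: the helper names, the content cues (a date pattern and short versus long text), the weight `w`, and the `threshold`. Tag-tree similarity is approximated with `difflib.SequenceMatcher` over pre-order tag sequences rather than a true tree-matching measure.

```python
import re
from difflib import SequenceMatcher
from xml.etree import ElementTree as ET

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")  # hypothetical "date" content cue


def tag_sequence(node):
    # Pre-order list of tag names: a crude signature of the subtree's tag tree.
    return [el.tag for el in node.iter()]


def content_cues(node):
    # Hypothetical content features: which kinds of text the subtree carries.
    texts = [t.strip() for t in node.itertext() if t.strip()]
    cues = set()
    if any(DATE_RE.search(t) for t in texts):
        cues.add("date")
    if any(len(t) >= 40 for t in texts):
        cues.add("long_text")
    if any(len(t) < 40 for t in texts):
        cues.add("short_text")
    return cues


def similarity(a, b, w=0.5):
    # Weighted mix of tag-sequence similarity and Jaccard overlap of content cues.
    tag_sim = SequenceMatcher(None, tag_sequence(a), tag_sequence(b)).ratio()
    ca, cb = content_cues(a), content_cues(b)
    content_sim = len(ca & cb) / len(ca | cb) if ca | cb else 1.0
    return w * tag_sim + (1 - w) * content_sim


def group_records(parent, threshold=0.7):
    # Group adjacent children that resemble the first member of the current run;
    # runs of two or more similar siblings are treated as candidate records.
    groups = []
    for child in parent:
        if groups and similarity(groups[-1][0], child) >= threshold:
            groups[-1].append(child)
        else:
            groups.append([child])
    return [g for g in groups if len(g) >= 2]


SAMPLE_HTML = """<div>
<div class="post"><b>alice</b><span>2008-04-07</span><p>First post text that is fairly long, more than forty chars.</p></div>
<div class="post"><b>bob</b><span>2008-04-08</span><p>Reply text, with a quote.</p><p>And a second paragraph that is long enough to count.</p></div>
<div class="ad"><img src="x.png"/></div>
</div>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_HTML)
    for group in group_records(root):
        print([c.get("class") for c in group])  # prints ['post', 'post']
```

In the sample, the two posts differ in structure (the second has an extra paragraph), but their tag sequences are close and their content cues match, so they are grouped together. The advertisement block shares neither, so it is excluded. Combining a structural signal with a content signal is what lets such loosely structured siblings be grouped despite their differences.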
Date of Conference: 07-12 April 2008
Date Added to IEEE Xplore: 25 April 2008