Abstract.
Document image processing is a crucial process in office automation; it begins at the ‘OCR’ phase and faces further difficulties in document ‘analysis’ and ‘understanding’. This paper presents a hybrid and comprehensive approach to document structure analysis. It is hybrid in the sense that it makes use of layout (geometrical) as well as textual features of a given document. These features are the basis for potential conditions, which in turn are used to express fuzzy-matched rules of an underlying rule base. Rules can be formulated based on features observed within one specific layout object, but they can also express dependencies between different layout objects. In addition to its rule-driven analysis, which allows easy adaptation to specific domains with their specific logical objects, the system contains domain-independent markup algorithms for common objects (e.g., lists).
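To make the idea of fuzzy-matched rules over layout and textual features concrete, the following is a minimal sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the `LayoutObject` fields, the condition functions, the thresholds, the two example rules, and the use of the minimum as the fuzzy conjunction. The abstract does not specify any of these.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LayoutObject:
    """Hypothetical layout object: a text block with a little geometry."""
    text: str
    font_size: float  # in points
    y_top: float      # relative vertical position, 0.0 = top of page


# A condition maps a layout object to a degree of truth in [0, 1].
Condition = Callable[[LayoutObject], float]


def ramp(value: float, low: float, high: float) -> float:
    """Linear membership: 0 at or below `low`, 1 at or above `high`."""
    return max(0.0, min(1.0, (value - low) / (high - low)))


def large_font(obj: LayoutObject) -> float:
    """Layout condition (assumed thresholds of 10 pt and 18 pt)."""
    return ramp(obj.font_size, 10.0, 18.0)


def near_top(obj: LayoutObject) -> float:
    """Layout condition: fully true in the top 10% of the page."""
    return 1.0 - ramp(obj.y_top, 0.1, 0.4)


def contains_year(obj: LayoutObject) -> float:
    """Textual condition: crisp test for a four-digit year."""
    return 1.0 if re.search(r"\b(19|20)\d{2}\b", obj.text) else 0.0


# Illustrative rule base: each logical label is a conjunction of conditions.
RULES: Dict[str, List[Condition]] = {
    "title": [large_font, near_top],
    "date": [contains_year, near_top],
}


def match(obj: LayoutObject, conditions: List[Condition]) -> float:
    """Fuzzy AND, taken here as the minimum (one common choice, assumed)."""
    return min(cond(obj) for cond in conditions)


def label(obj: LayoutObject, threshold: float = 0.5) -> str:
    """Return the best-matching label, or 'unknown' below the threshold."""
    scores = {name: match(obj, conds) for name, conds in RULES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"
```

This sketch only covers rules over a single layout object. Rules that express dependencies between different layout objects would need conditions that take more than one object as input.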
Received June 19, 2000 / Revised November 8, 2000
Cite this article
Klink, S., Kieninger, T. Rule-based document structure understanding with a fuzzy combination of layout and textual features. IJDAR 4, 18–26 (2001). https://doi.org/10.1007/PL00013570