23 March 1994 Tabular document recognition
M. Armon Rahgozar, Zhigang Fan, Emil V. Rainero
Proceedings Volume 2181, Document Recognition; (1994) https://doi.org/10.1117/12.171096
Event: IS&T/SPIE 1994 International Symposium on Electronic Imaging: Science and Technology, 1994, San Jose, CA, United States
Abstract
In this paper, we propose an efficient algorithm for recognizing the grid structure within a tabular document. The algorithm has two parts: first, a row labeling algorithm groups similar rows into clusters; then, a column labeling algorithm identifies the column structure within each cluster. Each column structure is identified by a set of column separation intervals that are computed from the intervals representing the extent of the white space between consecutive word fragments. We formally describe a method for finding column separation intervals based on word fragment separation intervals. This method is based on constructing the closure of a set of line intervals under the operation of line intersection. The closure is maintained dynamically in a data structure that facilitates easy access to the elements within the closure. This technique is computationally less expensive than projection and search at the pixel level, since word fragment acquisition is already required for document recognition applications.
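The closure construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the data structure, so a plain set stands in for it, and every name here (`gaps_in_row`, `add_to_closure`, `shared_gaps`) is hypothetical. Word fragments are assumed to be given as horizontal extents `(start, end)` in pixel units, and the final selection rule (keep intervals that lie inside a gap of every row in a cluster) is an assumption made for illustration only.

```python
from typing import List, Optional, Set, Tuple

Interval = Tuple[int, int]  # horizontal extent (start, end), assumed pixel units


def gaps_in_row(fragments: List[Interval]) -> List[Interval]:
    """White-space intervals between consecutive word fragments of one row."""
    ordered = sorted(fragments)
    gaps = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        if right_start > left_end:  # skip touching or overlapping fragments
            gaps.append((left_end, right_start))
    return gaps


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two intervals, or None if they share no positive width."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None


def add_to_closure(closure: Set[Interval], new: Interval) -> None:
    """Insert `new` while keeping `closure` closed under intersection.

    If `closure` is already closed, adding `new` together with its
    intersections with every existing member is sufficient.
    """
    additions = {new}
    for member in closure:
        common = intersect(member, new)
        if common is not None:
            additions.add(common)
    closure |= additions


def shared_gaps(closure: Set[Interval], rows: List[List[Interval]]) -> List[Interval]:
    """Assumed selection rule: closure members contained in a gap of every row."""
    def inside(inner: Interval, outer: Interval) -> bool:
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    return sorted(
        c for c in closure
        if all(any(inside(c, gap) for gap in row) for row in rows)
    )


# Three rows of one cluster, each given as word fragment extents.
row_fragments = [
    [(0, 10), (20, 30), (45, 60)],
    [(0, 12), (18, 28), (40, 60)],
    [(0, 8), (22, 32), (44, 60)],
]
row_gaps = [gaps_in_row(row) for row in row_fragments]

closure: Set[Interval] = set()
for row in row_gaps:
    for gap in row:
        add_to_closure(closure, gap)

print(shared_gaps(closure, row_gaps))  # [(12, 18), (32, 40)]
```

The sketch works only on fragment extents, never on pixels, which is the source of the cost advantage the abstract claims over projection-based methods. Each insertion scans the whole set, so a structure with faster lookup would be needed to match the "easy access" property mentioned above.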
© (1994) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
M. Armon Rahgozar, Zhigang Fan, and Emil V. Rainero "Tabular document recognition", Proc. SPIE 2181, Document Recognition, (23 March 1994); https://doi.org/10.1117/12.171096
CITATIONS
Cited by 8 scholarly publications.
KEYWORDS
Detection and tracking algorithms
Raster graphics
Optical character recognition
Computing systems
Data storage
Structural design
Visualization