Improved document image segmentation algorithm using multiresolution morphology
24 January 2011
Proceedings Volume 7874, Document Recognition and Retrieval XVIII; 78740D (2011) https://doi.org/10.1117/12.873461
Event: IS&T/SPIE Electronic Imaging, 2011, San Francisco Airport, California, United States
Abstract
Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). In the case of poor segmentation, an OCR classification engine produces garbage characters due to the presence of non-text elements. This paper describes modifications to the text/non-text segmentation algorithm presented by Bloomberg [1], which is also available in his open-source Leptonica library [2]. The modifications result in significant improvements and achieve better segmentation accuracy than the original algorithm on the UW-III and UNLV datasets, the ICDAR 2009 page segmentation competition test images, and circuit diagram datasets.
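The abstract does not spell out the steps of multiresolution morphology. As a rough illustration of the general idea only, the sketch below builds a non-text mask from a binary page: threshold-reduce the image so that only dense (halftone-like) regions survive, open the reduced image to remove residue, expand the result back to full resolution as a seed, and reconstruct that seed inside the original foreground. The reduction levels `(4, 4)`, the 3x3 opening, and every function name here are hypothetical choices for this sketch; it is not Bloomberg's exact procedure, not the authors' modified version, and not the Leptonica API.

```python
from collections import deque
from typing import Iterator, List, Tuple

Bitmap = List[List[int]]  # 1 = foreground (ink), 0 = background


def threshold_reduce_2x(img: Bitmap, level: int) -> Bitmap:
    """2x threshold reduction: a 2x2 block maps to 1 if at least `level` (1..4) of its
    pixels are on. A trailing odd row or column is dropped."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            1 if sum(img[2 * y + dy][2 * x + dx] for dy in (0, 1) for dx in (0, 1)) >= level else 0
            for x in range(w)
        ]
        for y in range(h)
    ]


def _window(img: Bitmap, y: int, x: int, r: int) -> Iterator[int]:
    """Yield the in-bounds pixels of the (2r+1)x(2r+1) window centred at (y, x)."""
    h, w = len(img), len(img[0])
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            yield img[yy][xx]


def erode(img: Bitmap, r: int) -> Bitmap:
    """Square erosion; out-of-bounds pixels are ignored."""
    return [[int(all(_window(img, y, x, r))) for x in range(len(img[0]))] for y in range(len(img))]


def dilate(img: Bitmap, r: int) -> Bitmap:
    """Square dilation; out-of-bounds pixels are ignored."""
    return [[int(any(_window(img, y, x, r))) for x in range(len(img[0]))] for y in range(len(img))]


def expand_replicate(img: Bitmap, factor: int, h: int, w: int) -> Bitmap:
    """Replicate each pixel into a factor x factor block, cropped or zero-padded to h x w."""
    rh, rw = len(img), len(img[0])
    return [
        [img[y // factor][x // factor] if y // factor < rh and x // factor < rw else 0 for x in range(w)]
        for y in range(h)
    ]


def seed_fill(seed: Bitmap, mask: Bitmap) -> Bitmap:
    """Binary reconstruction by dilation: keep the 8-connected components of `mask`
    that contain at least one seed pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    queue: deque = deque()
    for y in range(h):
        for x in range(w):
            if seed[y][x] and mask[y][x]:
                out[y][x] = 1
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not out[ny][nx]:
                    out[ny][nx] = 1
                    queue.append((ny, nx))
    return out


def nontext_mask(img: Bitmap, levels: Tuple[int, ...] = (4, 4), r: int = 1) -> Bitmap:
    """Hypothetical pipeline; `levels` and `r` are illustrative values, not taken from
    the paper or from Leptonica. Assumes the image survives len(levels) halvings."""
    reduced = img
    for level in levels:  # each pass halves resolution; level 4 keeps only solid 2x2 blocks
        reduced = threshold_reduce_2x(reduced, level)
    seed = dilate(erode(reduced, r), r)  # opening removes thin residue at low resolution
    full_seed = expand_replicate(seed, 2 ** len(levels), len(img), len(img[0]))
    return seed_fill(full_seed, img)  # grow seeds back over the connected foreground
```

Text pixels can then be taken as the foreground minus this mask. Because reconstruction keeps whole connected components, any text that touches a non-text region in the original image is absorbed into the mask in this sketch.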
© (2011) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Syed Saqib Bukhari, Faisal Shafait, and Thomas M. Breuel "Improved document image segmentation algorithm using multiresolution morphology", Proc. SPIE 7874, Document Recognition and Retrieval XVIII, 78740D (24 January 2011); https://doi.org/10.1117/12.873461
CITATIONS
Cited by 48 scholarly publications.
KEYWORDS
Image segmentation

Image processing algorithms and systems

Halftones

Reconstruction algorithms

Image processing

Optical character recognition

Detection and tracking algorithms
