Extracting curved text lines using the chain composition and the expanded grouping method
28 January 2008
Proceedings Volume 6815, Document Recognition and Retrieval XV; 68150U (2008) https://doi.org/10.1117/12.766057
Event: Electronic Imaging, 2008, San Jose, California, United States
Abstract
In this paper, we present a method to extract text lines from poorly structured documents. The text lines may have different orientations and considerably curved shapes, and a single text line may contain a few wide inter-word gaps. Such text lines appear in posters, address blocks, and artistic documents. Our method extends traditional perceptual grouping, and we develop novel solutions to two of its problems: insufficient seed points and varying orientation within a single line. We assume that text lines consist of connected components, where each connected component is a set of black pixels belonging to one letter or to several touching letters. In our scheme, connected components closer together than an iteratively incremented threshold are combined into chains of connected components. Elongated chains are identified as the seed chains of lines. The seed chains are then extended to the left and to the right according to the local orientations, which are re-evaluated at each end of a chain as it is extended. This process finally constructs all of the text lines. The advantage of the proposed method over prior work on curved text line extraction is that it is not restricted to a specific language and it can extract text lines containing wide inter-word gaps. In our experiments, the proposed method performed well at extracting considerably curved text lines from logos and slogans, scoring 98% for straight-line extraction and 94% for curved-line extraction.
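The abstract only outlines the pipeline, so the following is a minimal Python sketch under our own assumptions, not the authors' implementation. We assume each connected component is reduced to a centroid and a height, the chaining threshold is a multiple of the pair's mean height that grows from `t0` to `t_max`, a chain counts as "elongated" when its end-to-end span exceeds `min_span` mean heights, and extension adds the nearest free component lying inside a cone around the local direction (estimated from the last few components and re-estimated after every step). All names and parameter values (`Component`, `compose_chains`, `extend_tail`, `max_gap`, `max_angle`, `min_span`) are hypothetical, and the sketch extends one component at a time rather than whole chains.

```python
import math
from dataclasses import dataclass

@dataclass
class Component:
    """Hypothetical stand-in for a connected component: centroid and height."""
    x: float
    y: float
    h: float

def dist(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)

def compose_chains(comps, t0=1.0, step=0.5, t_max=3.0):
    """Join components into path-shaped chains while raising a distance threshold
    t (in units of the pair's mean height, an assumption) from t0 to t_max."""
    n = len(comps)
    chain_of = list(range(n))             # chain id of each component
    chains = {i: [i] for i in range(n)}   # chain id -> ordered component indices
    t = t0
    while t <= t_max:
        pairs = sorted(
            (dist(comps[i], comps[j]), i, j)
            for i in range(n) for j in range(i + 1, n)
            if dist(comps[i], comps[j]) <= t * (comps[i].h + comps[j].h) / 2)
        for _, i, j in pairs:              # closest pairs are joined first
            ci, cj = chain_of[i], chain_of[j]
            a, b = chains[ci], chains[cj]
            # Skip pairs already chained, and join only chain ends so chains stay paths.
            if ci == cj or i not in (a[0], a[-1]) or j not in (b[0], b[-1]):
                continue
            if a[-1] != i:
                a.reverse()
            if b[0] != j:
                b.reverse()
            chains[ci] = a + b
            del chains[cj]
            for k in b:
                chain_of[k] = ci
        t += step
    return list(chains.values())

def local_direction(comps, line, k=3):
    """Unit vector at the tail of `line`, estimated from its last k components."""
    a, b = comps[line[max(0, len(line) - k)]], comps[line[-1]]
    norm = math.hypot(b.x - a.x, b.y - a.y) or 1.0
    return (b.x - a.x) / norm, (b.y - a.y) / norm

def extend_tail(comps, line, free, max_gap=4.0, max_angle=math.radians(30)):
    """Repeatedly append the nearest free component inside a cone around the
    local direction, re-estimating the direction after every step."""
    while True:
        ux, uy = local_direction(comps, line)
        tail, best = comps[line[-1]], None
        for c in free:
            dx, dy = comps[c].x - tail.x, comps[c].y - tail.y
            d = math.hypot(dx, dy)
            if 0 < d <= max_gap * tail.h and (dx * ux + dy * uy) / d >= math.cos(max_angle):
                if best is None or d < best[0]:
                    best = (d, c)
        if best is None:
            return
        line.append(best[1])
        free.discard(best[1])

def extract_lines(comps, min_span=3.0):
    chains = compose_chains(comps)
    def span(ch):  # end-to-end length in units of mean component height
        return dist(comps[ch[0]], comps[ch[-1]]) / (sum(comps[i].h for i in ch) / len(ch))
    # Elongated chains become seeds; the longest seeds claim components first.
    seeds = sorted((ch for ch in chains if span(ch) >= min_span), key=span, reverse=True)
    free = set(range(len(comps))) - {i for ch in seeds for i in ch}
    lines = []
    for seed in seeds:
        line = list(seed)
        extend_tail(comps, line, free)   # grow from one end
        line.reverse()
        extend_tail(comps, line, free)   # then from the other end
        line.reverse()
        lines.append(line)
    return lines

# Five letters, a wide gap, a two-letter word, and one stray component far away.
comps = [Component(float(x), 0.0, 1.0) for x in range(5)]
comps += [Component(7.5, 0.0, 1.0), Component(8.5, 0.0, 1.0), Component(0.0, 20.0, 1.0)]
print(extract_lines(comps))  # [[0, 1, 2, 3, 4, 5, 6]]
```

The usage lines illustrate why extension matters for wide inter-word gaps: the two-letter word sits 3.5 heights away, beyond the largest chaining threshold, so it forms its own short non-seed chain, yet the seed's local direction reaches it across the gap. The stray component falls outside every direction cone and stays unassigned.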
© (2008) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Nguyen Noi Bai, Kim Nam, and Youngjun Song "Extracting curved text lines using the chain composition and the expanded grouping method", Proc. SPIE 6815, Document Recognition and Retrieval XV, 68150U (28 January 2008); https://doi.org/10.1117/12.766057
CITATIONS
Cited by 3 scholarly publications and 1 patent.
KEYWORDS
Error analysis
Image processing
Databases
Fuzzy logic
Communication engineering
Document image analysis
Electronic imaging