Script-Independent Text Segmentation from Document Images

Parul Sahare, Jitendra V. Tembhurne, Mayur R. Parate, Tausif Diwan, Sanjay B. Dhok
Copyright: © 2022 | Volume: 13 | Issue: 1 | Pages: 21
ISSN: 1941-6237 | EISSN: 1941-6245 | EISBN13: 9781683180647 | DOI: 10.4018/IJACI.313967

MLA

Sahare, Parul, et al. "Script-Independent Text Segmentation from Document Images." IJACI vol.13, no.1 2022: pp.1-21. http://doi.org/10.4018/IJACI.313967

APA

Sahare, P., Tembhurne, J. V., Parate, M. R., Diwan, T., & Dhok, S. B. (2022). Script-Independent Text Segmentation from Document Images. International Journal of Ambient Computing and Intelligence (IJACI), 13(1), 1-21. http://doi.org/10.4018/IJACI.313967

Chicago

Sahare, Parul, et al. "Script-Independent Text Segmentation from Document Images," International Journal of Ambient Computing and Intelligence (IJACI) 13, no.1: 1-21. http://doi.org/10.4018/IJACI.313967

Abstract

Document image analysis is widely used for information retrieval, with applications that include optical character recognition (OCR), indexing of digital libraries, and web image processing. Text segmentation is one of the important steps in this field, and it becomes complicated for documents that contain unevenly spaced text and characters of varying font sizes. This paper presents script-independent algorithms for text-line segmentation and word segmentation. Text-line segmentation uses the fast marching method as a region-growing process that detects potential text-lines. Word segmentation uses the wavelet transform together with connected component (CC) labeling: an energy map is calculated from the wavelet transform to create text-blocks. Both algorithms are evaluated on several databases containing documents in different scripts, and the highest text-line and word segmentation accuracies obtained are 98.9% and 99.1%, respectively.
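
The word-segmentation idea in the abstract (a wavelet energy map that forms text-blocks, then connected component labeling) can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. It assumes a binary image stored as a list of rows (1 = ink), a single-level Haar transform, a fixed energy threshold, and a horizontal gap-filling step to merge characters into blocks. The function names and the `threshold` and `gap` parameters are hypothetical and chosen only for illustration.

```python
from collections import deque


def haar_detail_energy(img):
    """Single-level 2D Haar transform; returns the detail energy of each 2x2 block."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    energy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            dh = (a - b + d - e) / 2.0  # difference between left and right columns
            dv = (a + b - d - e) / 2.0  # difference between top and bottom rows
            dd = (a - b - d + e) / 2.0  # diagonal difference
            energy[r][c] = dh * dh + dv * dv + dd * dd
    return energy


def smear_rows(mask, gap):
    """Fill horizontal background runs of at most `gap` cells (assumed block-forming step)."""
    out = [row[:] for row in mask]
    for r, row in enumerate(mask):
        last = None  # column of the most recent foreground cell in this row
        for c, on in enumerate(row):
            if on:
                if last is not None and 0 < c - last - 1 <= gap:
                    for k in range(last + 1, c):
                        out[r][k] = True
                last = c
    return out


def label_boxes(mask):
    """8-connected component labeling by breadth-first search; returns bounding boxes."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def segment_words(img, threshold=0.5, gap=2):
    """Return word boxes as (top, left, bottom, right) in pixel coordinates, left to right."""
    energy = haar_detail_energy(img)
    mask = [[value > threshold for value in row] for row in energy]
    blocks = smear_rows(mask, gap)
    # The energy map is half resolution, so scale the boxes back to pixels.
    boxes = [(2 * t, 2 * l, 2 * b + 1, 2 * r + 1)
             for t, l, b, r in label_boxes(blocks)]
    return sorted(boxes, key=lambda box: box[1])
```

In this sketch the `gap` value decides whether neighbouring strokes merge into one text-block, so a fixed value would struggle with the uneven spacing and varying font sizes the abstract mentions; a practical system would need to adapt it to the document. For real images, established libraries for wavelet transforms and component labeling would be a more sensible starting point than these hand-written loops.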
