DOI:10.1145/2432553.2432555
research-article

Automatic localization and correction of line segmentation errors

Published: 16 December 2012

Abstract

Text line segmentation is a basic step in any OCR system, and its failure degrades the performance of OCR engines. This is especially true for Indian languages because of the nature of their scripts. Many segmentation algorithms have been proposed in the literature. Often these algorithms fail to adapt dynamically to a given page and thus tend to yield poor segmentation for specific regions or specific pages. In this work we design a text line segmentation post-processor that automatically localizes and corrects segmentation errors. The proposed post-processor, which works in a "learning by examples" framework, is not only independent of the segmentation algorithm but also robust to the diversity of scanned pages. We show over 5% improvement in text line segmentation on a large dataset of scanned pages for multiple Indian languages.
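
To make the idea of "localizing" line segmentation errors concrete, here is a minimal sketch. It is not the method of the paper: the abstract describes a learned, example-driven post-processor, whereas this sketch substitutes a hand-set rule that compares each line's height to the page median. The function name `flag_suspect_lines`, the `(top, bottom)` box format, and the `tolerance` value are all assumptions made for illustration.

```python
from statistics import median


def flag_suspect_lines(lines, tolerance=0.5):
    """Return (index, label) pairs for lines whose height is an outlier.

    lines: list of (top, bottom) pixel rows, one pair per segmented text line
           (hypothetical format, produced by any segmentation algorithm).
    tolerance: allowed relative deviation from the median height
               (assumed value, standing in for a learned decision rule).
    """
    heights = [bottom - top for top, bottom in lines]
    typical = median(heights)
    suspects = []
    for i, h in enumerate(heights):
        if h > typical * (1 + tolerance):
            # Much taller than usual: two lines may have been fused.
            suspects.append((i, "possible merge"))
        elif h < typical * (1 - tolerance):
            # Much shorter than usual: one line may have been cut apart.
            suspects.append((i, "possible split"))
    return suspects


# Five segmented lines; the third is unusually tall, the fifth unusually short.
page = [(10, 40), (50, 80), (90, 155), (165, 195), (205, 217)]
print(flag_suspect_lines(page))
# → [(2, 'possible merge'), (4, 'possible split')]
```

Because the rule only looks at the segmenter's output boxes, it works with any segmentation algorithm, which mirrors the algorithm-independence the abstract claims. A fixed threshold, however, does not adapt to page diversity in the way a model trained on examples is meant to.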

Published In

DAR '12: Proceeding of the workshop on Document Analysis and Recognition
December 2012
162 pages
ISBN:9781450317979
DOI:10.1145/2432553

Publisher

Association for Computing Machinery

New York, NY, United States
