Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents

Cohen, Rafi; Dinstein, Itshak; El-Sana, Jihad; Kedem, Klara

doi:10.1007/978-3-319-11758-4_38

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8814)

Included in the following conference series:

International Conference Image Analysis and Recognition

Abstract

Text line extraction is a vital prerequisite for various document processing tasks. This paper presents a novel approach to text line extraction based on Gaussian scale space and a dedicated binarization that utilizes the inherent structure of smoothed text document images. It enhances the text lines in the image using a multi-scale bank of anisotropic second-derivative-of-Gaussian filters at the average height of the text line. It then applies a binarization that is based on a component tree and is tailored towards line extraction. The final stage of the algorithm is based on an energy minimization framework for removing spurious text lines and assigning connected components to lines. We have tested our approach on various datasets written in different languages at a range of image qualities and achieved high detection rates, which outperform state-of-the-art algorithms. Our MATLAB code is publicly available (http://www.cs.bgu.ac.il/~rafico/LineExtraction.zip).
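As a rough illustration of the first stage only, the following Python sketch filters an image with elongated Gaussians that smooth along the horizontal direction and take a scale-normalised second derivative along the vertical direction, then keeps the strongest response over several scales. It is not the authors' MATLAB code. The function names, the elongation factor of 3, the 3-sigma kernel truncation, the zero-mean correction, and the choice of scales are all assumptions made for this example, and the sketch assumes roughly horizontal text lines given as an "ink" image (1.0 for text, 0.0 for background).

```python
import math


def gaussian_kernel(sigma):
    """Sampled 1-D Gaussian, truncated at 3*sigma and normalised to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    k = [math.exp(-(t * t) / (2 * sigma * sigma)) for t in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def neg_second_derivative_kernel(sigma):
    """Sampled -sigma^2 * g''(t), i.e. (1 - t^2/sigma^2) * g(t).

    The sigma^2 factor is the usual scale normalisation; the sign flip makes
    bright ridges (ink = 1) respond positively. Subtracting the mean is an
    assumption of this sketch so that flat regions give zero response.
    """
    radius = int(math.ceil(3 * sigma))
    norm = sigma * math.sqrt(2 * math.pi)
    k = [(1 - (t * t) / (sigma * sigma)) * math.exp(-(t * t) / (2 * sigma * sigma)) / norm
         for t in range(-radius, radius + 1)]
    mean = sum(k) / len(k)
    return [v - mean for v in k]


def convolve_axis(img, kernel, axis):
    """Filter along columns (axis=0) or rows (axis=1), replicating edge pixels.

    Both kernels used here are symmetric, so correlation equals convolution.
    """
    rows, cols = len(img), len(img[0])
    radius = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for i, w in enumerate(kernel):
                off = i - radius
                if axis == 0:
                    acc += w * img[min(max(y + off, 0), rows - 1)][x]
                else:
                    acc += w * img[y][min(max(x + off, 0), cols - 1)]
            out[y][x] = acc
    return out


def line_response(ink, sigmas_y, elongation=3.0):
    """Per-pixel maximum over scales of the anisotropic filter response."""
    rows, cols = len(ink), len(ink[0])
    best = [[-math.inf] * cols for _ in range(rows)]
    for sy in sigmas_y:
        # Wide smoothing along the line direction, second derivative across it.
        smoothed = convolve_axis(ink, gaussian_kernel(elongation * sy), axis=1)
        resp = convolve_axis(smoothed, neg_second_derivative_kernel(sy), axis=0)
        for y in range(rows):
            for x in range(cols):
                best[y][x] = max(best[y][x], resp[y][x])
    return best


# Synthetic page: two horizontal "text lines", each 4 pixels high.
ink = [[1.0 if 10 <= y <= 13 or 30 <= y <= 33 else 0.0 for _ in range(40)]
       for y in range(44)]
response = line_response(ink, sigmas_y=[2.0, 3.0, 4.0])
profile = [row[20] for row in response]  # response down the middle column
peak_row = max(range(len(profile)), key=profile.__getitem__)
```

On this synthetic page the response peaks inside one of the two bands and is lower in the gap between them, which is the behaviour the enhancement step relies on. The component-tree binarization and the energy minimization stage described in the abstract are not covered by this sketch, and a practical implementation would use an array library rather than nested lists.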

Author information

Authors and Affiliations

Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
Rafi Cohen, Jihad El-Sana & Klara Kedem
Department of Electrical and Computer Engineering, Ben-Gurion University, Beer-Sheva, Israel
Itshak Dinstein

Corresponding author

Correspondence to Rafi Cohen.

Editor information

Editors and Affiliations

Faculty of Engineering, University of Porto, Porto, Portugal
Aurélio Campilho
Dept. of Electrical and Computer Eng., University of Waterloo, Waterloo, Ontario, Canada
Mohamed Kamel

About this paper

Cite this paper

Cohen, R., Dinstein, I., El-Sana, J., Kedem, K. (2014). Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8814. Springer, Cham. https://doi.org/10.1007/978-3-319-11758-4_38

DOI: https://doi.org/10.1007/978-3-319-11758-4_38
Published: 10 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11757-7
Online ISBN: 978-3-319-11758-4
eBook Packages: Computer Science, Computer Science (R0)
