DOI:10.1145/2809544.2809561
research-article

Document Image Binarization using LSTM: A Sequence Learning Approach

Published: 22 August 2015

Abstract

We propose to address the problem of Document Image Binarization (DIB) using Long Short-Term Memory (LSTM), which is specialized in processing very long sequences. The image is treated as a 2D sequence of pixels, and accordingly a 2D LSTM is employed to classify each pixel as text or background. The proposed approach processes the information using local context and then propagates it globally in order to achieve better visual coherence. The method is robust against most document artifacts. We show that, with a very simple network, without any feature extraction, and with a limited amount of data, the proposed approach works reasonably well on the DIBCO 2013 dataset. Furthermore, a synthetic dataset is considered to measure the performance of the proposed approach with both binarization and OCR ground truth. Given enough training samples, the proposed approach significantly outperforms standard binarization approaches in both F-Measure and OCR accuracy.
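
The abstract reports results in terms of F-Measure. As a rough illustration of how a pixel-level F-Measure can be computed for a binarized image, here is a minimal sketch. It assumes text pixels are encoded as 1 and background pixels as 0, and that images are plain nested lists; the function name `f_measure` and the toy images are hypothetical and are not taken from the paper, whose exact evaluation protocol is not given here.

```python
def f_measure(predicted, ground_truth):
    """Pixel-level F-measure in percent, treating text pixels (1) as positives.

    Assumption: both images are equally sized nested lists of 0/1 values.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if p == 1 and g == 1:
                tp += 1  # text pixel correctly labelled as text
            elif p == 1 and g == 0:
                fp += 1  # background pixel wrongly labelled as text
            elif p == 0 and g == 1:
                fn += 1  # text pixel missed by the binarization
    if tp == 0:
        return 0.0  # no true positives: define the score as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy 3x4 example (hypothetical data, for illustration only).
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
predicted = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
score = f_measure(predicted, ground_truth)
print(f"F-measure: {score:.1f}")
```

In this toy case three text pixels are recovered, one background pixel is mislabelled as text, and one text pixel is missed, so precision and recall are both 3/4.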


    Published In

    HIP '15: Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing
    August 2015
    155 pages
    ISBN:9781450336024
    DOI:10.1145/2809544

    In-Cooperation

    • FamilySearch

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Document Image Binarization
    2. Long Short Term Memory
    3. Neural Network
    4. Optical Character Recognition

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    HIP '15

    Acceptance Rates

    Overall Acceptance Rate 52 of 90 submissions, 58%

    Cited By

    • (2024) Research and application of methods to prevent external force damage to underground cables based on the IoT. Proceedings of the 2024 International Conference on Power Electronics and Artificial Intelligence, pages 99-103. DOI:10.1145/3674225.3674244. Online publication date: 19-Jan-2024
    • (2024) Applicability of OCR Engines for Text Recognition in Vehicle Number Plates, Receipts and Handwriting. Journal of Circuits, Systems and Computers, 32:18. DOI:10.1142/S0218126623503218. Online publication date: 2-Feb-2024
    • (2024) Research on Document Image Binarization: A Survey. 2024 IEEE 7th International Conference on Electronic Information and Communication Technology (ICEICT), pages 457-462. DOI:10.1109/ICEICT61637.2024.10671186. Online publication date: 31-Jul-2024
    • (2023) Historical Text Image Enhancement Using Image Scaling and Generative Adversarial Networks. Sensors, 23:8 (4003). DOI:10.3390/s23084003. Online publication date: 14-Apr-2023
    • (2023) TransDocUNet: A Transformer-based UNet Architecture for Degraded Document Image Binarization. Proceedings of the Fourteenth Indian Conference on Computer Vision, Graphics and Image Processing, pages 1-9. DOI:10.1145/3627631.3627639. Online publication date: 15-Dec-2023
    • (2023) A hybrid CNN-Transformer model for Historical Document Image Binarization. Proceedings of the 7th International Workshop on Historical Document Imaging and Processing, pages 79-84. DOI:10.1145/3604951.3605508. Online publication date: 25-Aug-2023
    • (2023) A Novel Degraded Document Binarization Model through Vision Transformer Network. Information Fusion, 93, pages 159-173. DOI:10.1016/j.inffus.2022.12.011. Online publication date: May-2023
    • (2023) Scene text understanding: recapitulating the past decade. Artificial Intelligence Review, 56:12, pages 15301-15373. DOI:10.1007/s10462-023-10530-3. Online publication date: 18-Jun-2023
    • (2023) Document Image Binarization. Document Layout Analysis, pages 11-30. DOI:10.1007/978-981-99-4277-0_2. Online publication date: 1-Aug-2023
    • (2023) Test-Time Augmentation for Document Image Binarization. Pattern Recognition and Image Analysis, pages 158-169. DOI:10.1007/978-3-031-36616-1_13. Online publication date: 25-Jun-2023
