DOI: 10.1145/3151509.3151527

Training LSTM-RNN with Imperfect Transcription: Limitations and Outcomes

Published: 10 November 2017

Abstract

Bidirectional LSTM-RNNs have become one of the standard methods for sequence learning, especially in the context of OCR, due to their ability to process unsegmented data and their inherent statistical language modeling [5]. It has recently been shown that training LSTM-RNNs even with imperfect transcriptions can lead to improved transcription results [7, 14]. The statistical nature of the LSTM's inherent language modeling can compensate for some of the errors in the ground truth and learn the correct temporal relations. In this paper, we systematically explore the limits of the LSTM's language modeling ability by comparing the impact of imperfect transcriptions that contain various handcrafted error types and of real erroneous data created through segmentation and clustering. We show that training LSTM-RNNs with imperfect transcriptions can produce useful OCR models even if the ground truth error rate is as high as 20%. Furthermore, we show that the network can compensate almost perfectly for some handcrafted error types at error rates of up to 40%.
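The abstract does not say which handcrafted error types were used or how the ground truth error was measured. As a rough illustration only, the sketch below assumes one possible error type (uniform random character substitution) and measures the resulting ground truth error as a character error rate based on edit distance. The function names, the alphabet, and the sample line are hypothetical and are not taken from the paper.

```python
import random
import string


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corrupt_substitution(text: str, rate: float, alphabet: str,
                         rng: random.Random) -> str:
    """Assumed error type: replace round(rate * len(text)) randomly chosen
    positions with a different symbol drawn from `alphabet`."""
    chars = list(text)
    n_errors = round(rate * len(chars))
    for i in rng.sample(range(len(chars)), n_errors):
        candidates = [c for c in alphabet if c != chars[i]]
        chars[i] = rng.choice(candidates)
    return "".join(chars)


# Hypothetical clean ground truth line and a 20% substitution rate.
ALPHABET = string.ascii_lowercase + " "
clean = "the quick brown fox jumps over the lazy dog"
noisy = corrupt_substitution(clean, 0.20, ALPHABET, random.Random(0))
cer = char_error_rate(clean, noisy)
```

Because an edit distance can align two strings more cheaply than position-by-position comparison, the measured rate can fall slightly below the nominal injection rate; it never exceeds it for pure substitutions.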

References

[1]
[n. d.]. Kallimachos. ([n. d.]). http://kallimachos.de/kallimachos/index.php/Narragonien:Main
[2]
[n. d.]. OCRopus -- Open Source Document Analysis and OCR System. ([n. d.]). https://github.com/tmbdev/ocropy
[3]
Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. [n. d.]. Learning to Forget: Continual Prediction with LSTM. 12, 10 ([n. d.]), 2451--2471.
[4]
Alex Graves, Douglas Eck, Nicole Beringer, and Juergen Schmidhuber. [n. d.]. Biologically plausible speech recognition with LSTM neural nets. In International Workshop on Biologically Inspired Approaches to Advanced Information Technology (2004). 127--136.
[5]
Alex Graves, Santiago Fernández, Faustino J. Gomez, and Jürgen Schmidhuber. 2006. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In ICML'06.
[6]
Sepp Hochreiter and Jürgen Schmidhuber. [n. d.]. Long Short-Term Memory. ([n. d.]), 1735--1780.
[7]
Martin Jenckel, Saqib Syed Bukhari, and Andreas Dengel. 2016. anyOCR: A Sequence Learning Based OCR System for Unlabeled Historical Documents. In 23rd International Conference on Pattern Recognition (ICPR'16). Mexico.
[8]
David Rolnick, Andreas Veit, Serge J. Belongie, and Nir Shavit. 2017. Deep Learning is Robust to Massive Label Noise. CoRR abs/1705.10694 (2017). http://arxiv.org/abs/1705.10694
[9]
Uwe Springmann, Florian Fink, and Klaus U. Schulz. 2016. Automatic quality evaluation and (semi-) automatic improvement of mixed models for OCR on historical documents. CoRR abs/1606.05157 (2016). http://arxiv.org/abs/1606.05157
[10]
Sainbayar Sukhbaatar and Rob Fergus. 2014. Learning from Noisy Labels with Deep Neural Networks. CoRR abs/1406.2080 (2014). http://arxiv.org/abs/1406.2080
[11]
T. M. Breuel, A. Ul-Hasan, M. Al Azawi, and F. Shafait. 2013. High Performance OCR for Printed English and Fraktur using LSTM Networks. In ICDAR. Washington D.C., USA.
[12]
A. Ul-Hasan, S. S. Bukhari, and A. Dengel. 2016. Meaningless Text OCR Model for Medieval Scripts. In 2nd International Conference on Natural Sciences and Technology in Manuscript Analysis. Germany.
[13]
Adnan Ul-Hasan, Faisal Shafait, and Thomas Breuel. 2013. High-Performance OCR for Printed English and Fraktur using LSTM Networks. (08 2013).
[14]
Adnan Ul-Hasan, Saqib Syed Bukhari, and Andreas Dengel. 2016. OCRoRACT: A Sequence Learning OCR System Trained on Isolated Characters. In The 12th IAPR Workshop on Document Analysis Systems (DAS'16). Greece, 174--179.
[15]
M. R. Yousefi, M. R. Soheili, T. M. Breuel, and D. Stricker. 2015. A Comparison of 1D and 2D LSTM Architectures for Recognition of Handwritten Arabic. In DRR-XXI. USA.

    Published In

    HIP '17: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing
    November 2017
    129 pages
    ISBN:9781450353908
    DOI:10.1145/3151509
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • FamilySearch

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Historical Document Image Processing
    2. Imperfect Transcription
    3. LSTM
    4. OCR
    5. RNN

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    HIP2017

    Acceptance Rates

    HIP '17 Paper Acceptance Rate 19 of 33 submissions, 58%;
    Overall Acceptance Rate 52 of 90 submissions, 58%

    Cited By

    • (2024) An intrusion detection algorithm based on joint symmetric uncertainty and hyperparameter optimized fusion neural network. Expert Systems with Applications 244 (123014). DOI: 10.1016/j.eswa.2023.123014. Online publication date: Jun-2024
    • (2023) Network intrusion detection based on the temporal convolutional model. Computers & Security (103465). DOI: 10.1016/j.cose.2023.103465. Online publication date: Sep-2023
    • (2022) Predicting Discharges in Sewer Pipes Using an Integrated Long Short-Term Memory and Entropy A-TOPSIS Modeling Framework. Water 14:3 (300). DOI: 10.3390/w14030300. Online publication date: 19-Jan-2022
    • (2021) Unified Deep Learning approach for Efficient Intrusion Detection System using Integrated Spatial–Temporal Features. Knowledge-Based Systems 226 (107132). DOI: 10.1016/j.knosys.2021.107132. Online publication date: Aug-2021
    • (2019) C k LR Algorithm for Improvement of Data Prediction and Accuracy Based on Clustering Data. International Journal of Software Engineering and Knowledge Engineering 29:05 (631-652). DOI: 10.1142/S0218194019400011. Online publication date: 20-May-2019
    • (2018) Deep Word Association: A Flexible Chinese Word Association Method with Iterative Attention Mechanism. Pattern Recognition and Computer Vision (112-123). DOI: 10.1007/978-3-030-03338-5_10. Online publication date: 3-Nov-2018