Training LSTM-RNN with Imperfect Transcription: Limitations and Outcomes

Authors:

Martin Jenckel,

Syed Saqib Bukhari,

Andreas Dengel

HIP '17: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing

Pages 48 - 53

https://doi.org/10.1145/3151509.3151527

Published: 10 November 2017

Abstract

Bidirectional LSTM-RNNs have become one of the standard methods for sequence learning, especially in the context of OCR, owing to their ability to process unsegmented data and their inherent statistical language modeling [5]. It has recently been shown that training LSTM-RNNs even with imperfect transcriptions can lead to improved transcription results [7, 14]. The statistical nature of the LSTM's inherent language modeling can compensate for some of the errors in the ground truth, allowing the network to learn the correct temporal relations. In this paper we systematically explore the limits of the LSTM's language modeling ability by comparing the impact of imperfect transcriptions containing various handcrafted error types and of real erroneous data created through segmentation and clustering. We show that training LSTM-RNNs with imperfect transcriptions can produce useful OCR models even when the ground truth error rate is as high as 20%. Furthermore, we show that the network can compensate almost perfectly for some handcrafted error types at error rates of up to 40%.
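To make the notion of a ground truth error rate concrete, the following is a minimal sketch of how a transcription could be corrupted at a chosen rate and how the resulting character error rate could be measured. It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the handcrafted error types, so random character substitution is assumed here as one plausible example, and the names `corrupt`, `levenshtein`, and `char_error_rate` are hypothetical helpers.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def corrupt(text: str, rate: float, alphabet: str, rng: random.Random) -> str:
    """Assumed error type: with probability `rate`, replace a character
    by a different character drawn uniformly from `alphabet`."""
    out = []
    for ch in text:
        if rng.random() < rate:
            out.append(rng.choice([c for c in alphabet if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


# Hypothetical usage: corrupt a clean line at a 20% substitution rate.
rng = random.Random(0)
alphabet = "abcdefghijklmnopqrstuvwxyz "
clean = "the quick brown fox jumps over the lazy dog"
noisy = corrupt(clean, 0.2, alphabet, rng)
print(noisy)
print(char_error_rate(clean, noisy))
```

On a single short line the measured rate will fluctuate around the requested rate; it only approaches the target over a large amount of text.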

References

[1]

[n. d.]. Kallimachos. ([n. d.]). http://kallimachos.de/kallimachos/index.php/Narragonien:Main

[2]

[n. d.]. OCRopus -- Open Source Document Analysis and OCR System. ([n. d.]). https://github.com/tmbdev/ocropy

[3]

Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. [n. d.]. Learning to Forget: Continual Prediction with LSTM. 12, 10 ([n. d.]), 2451--2471.

[4]

Alex Graves, Douglas Eck, Nicole Beringer, and Jürgen Schmidhuber. [n. d.]. Biologically plausible speech recognition with LSTM neural nets. In International Workshop on Biologically Inspired Approaches to Advanced Information Technology (2004). 127--136.

[5]

Alex Graves, Santiago Fernández, Faustino J. Gomez, and Jürgen Schmidhuber. 2006. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In ICML'06.

[6]

Sepp Hochreiter and Jürgen Schmidhuber. [n. d.]. Long Short-Term Memory. ([n. d.]), 1735--1780.

[7]

Martin Jenckel, Saqib Syed Bukhari, and Andreas Dengel. 2016. anyOCR: A Sequence Learning Based OCR System for Unlabeled Historical Documents. In 23rd International Conference on Pattern Recognition (ICPR'16). Mexico.

[8]

David Rolnick, Andreas Veit, Serge J. Belongie, and Nir Shavit. 2017. Deep Learning is Robust to Massive Label Noise. CoRR abs/1705.10694 (2017). http://arxiv.org/abs/1705.10694

[9]

Uwe Springmann, Florian Fink, and Klaus U. Schulz. 2016. Automatic quality evaluation and (semi-) automatic improvement of mixed models for OCR on historical documents. CoRR abs/1606.05157 (2016). http://arxiv.org/abs/1606.05157

[10]

Sainbayar Sukhbaatar and Rob Fergus. 2014. Learning from Noisy Labels with Deep Neural Networks. CoRR abs/1406.2080 (2014). http://arxiv.org/abs/1406.2080

[11]

T. M. Breuel, A. Ul-Hasan, M. Al Azawi, and F. Shafait. 2013. High Performance OCR for Printed English and Fraktur using LSTM Networks. In ICDAR. Washington D.C., USA.

[12]

A. Ul-Hasan, S. S. Bukhari, and A. Dengel. 2016. Meaningless Text OCR Model for Medieval Scripts. In 2nd International Conference on Natural Sciences and Technology in Manuscript Analysis. Germany.

[13]

Adnan Ul-Hasan, Faisal Shafait, and Thomas Breuel. 2013. High-Performance OCR for Printed English and Fraktur using LSTM Networks. (08 2013).

[14]

Adnan Ul-Hasan, Saqib Syed Bukhari, and Andreas Dengel. 2016. OCRoRACT: A Sequence Learning OCR System Trained on Isolated Characters. In The 12th IAPR Workshop on Document Analysis Systems (DAS'16). Greece, 174--179.

[15]

M. R. Yousefi, M. R. Soheili, T. M. Breuel, and D. Stricker. 2015. A Comparison of 1D and 2D LSTM Architectures for Recognition of Handwritten Arabic. In DRR-XXI. USA.

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Optical character recognition

Published In

HIP '17: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing

November 2017

129 pages

ISBN:9781450353908

DOI:10.1145/3151509

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

FamilySearch: FamilySearch

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

HIP2017

HIP2017: The 4th International Workshop on Historical Document Imaging and Processing

November 10 - 11, 2017

Kyoto, Japan

Acceptance Rates

HIP '17 Paper Acceptance Rate 19 of 33 submissions, 58%;

Overall Acceptance Rate 52 of 90 submissions, 58%

Article Metrics

Total Citations: 6
Total Downloads: 297
Downloads (Last 12 months): 52
Downloads (Last 6 weeks): 1

Reflects downloads up to 28 Feb 2025

Cited By

Wang Q, Jiang H, Ren J, Liu H, Wang X, Zhang B (2024). An intrusion detection algorithm based on joint symmetric uncertainty and hyperparameter optimized fusion neural network. Expert Systems with Applications 244 (123014). Online publication date: Jun-2024.
https://doi.org/10.1016/j.eswa.2023.123014
Lopes I, Zou D, Abdulqadder I, Akbar S, Li Z, Ruambo F, Pereira W (2023). Network intrusion detection based on the temporal convolutional model. Computers & Security (103465). Online publication date: Sep-2023.
https://doi.org/10.1016/j.cose.2023.103465
Nguyen L, Tornyeviadzi H, Bui D, Seidu R (2022). Predicting Discharges in Sewer Pipes Using an Integrated Long Short-Term Memory and Entropy A-TOPSIS Modeling Framework. Water 14:3 (300). Online publication date: 19-Jan-2022.
https://doi.org/10.3390/w14030300
Rajesh Kanna P, Santhi P (2021). Unified Deep Learning approach for Efficient Intrusion Detection System using Integrated Spatial–Temporal Features. Knowledge-Based Systems 226 (107132). Online publication date: Aug-2021.
https://doi.org/10.1016/j.knosys.2021.107132
Jung S, Kim J (2019). C k LR Algorithm for Improvement of Data Prediction and Accuracy Based on Clustering Data. International Journal of Software Engineering and Knowledge Engineering 29:05 (631-652). Online publication date: 20-May-2019.
https://doi.org/10.1142/S0218194019400011
Huang Y, Xie Z, Liu M, Zhang S, Jin L (2018). Deep Word Association: A Flexible Chinese Word Association Method with Iterative Attention Mechanism. Pattern Recognition and Computer Vision (112-123). Online publication date: 3-Nov-2018.
https://doi.org/10.1007/978-3-030-03338-5_10
