Combining Self-training and Minimal Annotations for Handwritten Word Recognition

Wolf, Fabian; Fink, Gernot A.

doi:10.1007/978-3-031-21648-0_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13639)

Included in the following conference series:

International Conference on Frontiers in Handwriting Recognition

Abstract

Handwritten Text Recognition (HTR) relies on deep learning to achieve high performance. Its success is substantially driven by large annotated training datasets, which yield powerful recognition models. Performance suffers considerably when a model is applied to document collections with a distinctive style that is not well represented in the training data. Adapting a recognition model to a new data collection requires a tremendous annotation effort, which is often infeasible, for example for historic collections. To overcome this limitation, we propose a training scheme that combines multiple data sources. Synthetically generated samples are used to train an initial model, and self-training then exploits unlabeled samples. We further investigate how a small number of manually annotated samples can be integrated to achieve maximal performance with limited annotation effort. To this end, we add labeled samples at different stages of self-training and propose two criteria, namely confidence and diversity, for selecting the samples to annotate. In our experiments, we show that the proposed training scheme considerably closes the gap to fully supervised training on the designated training set with less than ten percent of the labeling demand.
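As a rough illustration of the kind of selection step the abstract describes, the following Python sketch splits model predictions into pseudo-labels and candidates for manual annotation. It is not the authors' method: the abstract does not define how confidence or diversity are measured, so the `Prediction` type, the `threshold` value, the choice to annotate the least confident samples first, and the use of distinct predicted transcriptions as a diversity proxy are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    sample_id: str
    text: str          # transcription predicted by the current model
    confidence: float  # hypothetical confidence score in [0, 1]


def split_pseudo_labels(preds, threshold=0.9):
    """Return (confident, uncertain); confident ones can serve as pseudo-labels."""
    confident = [p for p in preds if p.confidence >= threshold]
    uncertain = [p for p in preds if p.confidence < threshold]
    return confident, uncertain


def select_for_annotation(preds, budget):
    """Pick up to `budget` samples, least confident first.

    Assumed diversity proxy: at most one sample per predicted transcription.
    """
    chosen, seen_texts = [], set()
    for p in sorted(preds, key=lambda p: p.confidence):
        if len(chosen) >= budget:
            break
        if p.text in seen_texts:
            continue  # skip samples whose predicted word was already chosen
        chosen.append(p)
        seen_texts.add(p.text)
    return chosen


# Toy predictions standing in for the output of a model trained on synthetic data.
preds = [
    Prediction("a", "the", 0.98),
    Prediction("b", "tbe", 0.40),
    Prediction("c", "tbe", 0.35),
    Prediction("d", "house", 0.55),
    Prediction("e", "and", 0.95),
]
pseudo, uncertain = split_pseudo_labels(preds, threshold=0.9)
to_annotate = select_for_annotation(uncertain, budget=2)
```

In a self-training loop, the confident predictions would be added to the training set as pseudo-labels, the selected samples would be sent for manual annotation, and the model would be retrained before the next round.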



Author information

Authors and Affiliations

Department of Computer Science, TU Dortmund University, 44227, Dortmund, Germany
Fabian Wolf & Gernot A. Fink

Corresponding author

Correspondence to Fabian Wolf.

Editor information

Editors and Affiliations

Walmart Inc., Hoboken, NJ, USA
Utkarsh Porwal
Universitat Autònoma de Barcelona, Barcelona, Spain
Alicia Fornés
National University of Sciences and Technology (NUST), Islamabad, Pakistan
Faisal Shafait

About this paper

Cite this paper

Wolf, F., Fink, G.A. (2022). Combining Self-training and Minimal Annotations for Handwritten Word Recognition. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_21

DOI: https://doi.org/10.1007/978-3-031-21648-0_21
Published: 25 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21647-3
Online ISBN: 978-3-031-21648-0
eBook Packages: Computer Science, Computer Science (R0)
