Combining Self-training and Minimal Annotations for Handwritten Word Recognition

  • Conference paper
Frontiers in Handwriting Recognition (ICFHR 2022)

Abstract

Handwritten Text Recognition (HTR) relies on deep learning to achieve high performance. Its success is substantially driven by large annotated training datasets, which result in powerful recognition models. Performance suffers considerably when a model is applied to document collections with a distinctive style that is not well represented in the training data. Adapting a recognition model to a new data collection requires a tremendous annotation effort, which is often infeasible, for example for historic collections. To overcome this limitation, we propose a training scheme that combines multiple data sources. Synthetically generated samples are used to train an initial model. Self-training then makes it possible to exploit unlabeled samples. We further investigate how a small number of manually annotated samples can be integrated to achieve maximal performance with limited annotation effort. To this end, we add labeled samples at different stages of self-training and propose two criteria, namely confidence and diversity, for selecting the samples to annotate. In our experiments, we show that the proposed training scheme considerably closes the gap to fully supervised training on the designated training set while requiring less than ten percent of the labeling effort.
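
The abstract names confidence and diversity as selection criteria but does not define them. The following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that "confidence" means choosing the samples the current model is least sure about, and that "diversity" can be approximated by greedy farthest-point sampling over feature vectors. The function names, the per-sample confidence scores, and the feature vectors are all hypothetical.

```python
import math


def select_by_confidence(confidences, budget):
    """Return indices of the `budget` least confident predictions.

    Assumption: low confidence marks the samples most worth annotating.
    """
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return order[:budget]


def select_by_diversity(features, budget):
    """Greedy farthest-point sampling over feature vectors.

    Assumption: Euclidean distance in some feature space is a usable
    proxy for visual diversity of word images.
    """
    if not features or budget <= 0:
        return []
    chosen = [0]  # arbitrary starting sample
    # Distance from every sample to its nearest already-chosen sample.
    nearest = [math.dist(f, features[0]) for f in features]
    while len(chosen) < min(budget, len(features)):
        nxt = max(range(len(features)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [
            min(d, math.dist(f, features[nxt]))
            for d, f in zip(nearest, features)
        ]
    return chosen


if __name__ == "__main__":
    # Toy values standing in for model outputs on unlabeled word images.
    confidences = [0.9, 0.2, 0.5, 0.1]
    features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0)]
    print(select_by_confidence(confidences, budget=2))
    print(select_by_diversity(features, budget=3))
```

In such a setup, the selected indices would be handed to an annotator and the resulting labels added to the training set at some stage of self-training. Which stage works best is the question the paper investigates.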

Author information

Corresponding author

Correspondence to Fabian Wolf.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wolf, F., Fink, G.A. (2022). Combining Self-training and Minimal Annotations for Handwritten Word Recognition. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_21

  • DOI: https://doi.org/10.1007/978-3-031-21648-0_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21647-3

  • Online ISBN: 978-3-031-21648-0

  • eBook Packages: Computer Science, Computer Science (R0)
