Abstract
Handwritten Text Recognition (HTR) relies on deep learning to achieve high performance. Its success is substantially driven by large annotated training datasets, which yield powerful recognition models. Performance suffers considerably when a model is applied to document collections with a distinctive style that is not well represented in the training data. Adapting a recognition model to a new data collection requires a tremendous annotation effort, which is often out of reach, for example for historical collections. To overcome this limitation, we propose a training scheme that combines multiple data sources. Synthetically generated samples are used to train an initial model, and self-training makes it possible to exploit unlabeled samples. We further investigate how a small number of manually annotated samples can be integrated to achieve maximal performance with limited annotation effort. To this end, we add labeled samples at different stages of self-training and propose two criteria, namely confidence and diversity, for selecting the samples to annotate. In our experiments, we show that the proposed training scheme considerably closes the gap to fully supervised training on the designated training set while requiring less than ten percent of the labeling effort.
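As a rough illustration of how confidence and diversity could guide the choice of samples to annotate, consider the minimal sketch below. It is not the authors' implementation. The `Prediction` record, both function names, the "least confident first" rule, the "one sample per predicted string" rule, and the threshold value are all assumptions made for this example; the abstract does not specify how either criterion is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical model output for one unlabeled word image."""
    sample_id: str
    text: str          # predicted transcription
    confidence: float  # score in [0, 1]; how it is computed is an assumption


def select_for_annotation(predictions, budget):
    """Pick up to `budget` samples for manual annotation.

    Assumed confidence criterion: least confident predictions first.
    Assumed diversity criterion: at most one sample per predicted string.
    """
    chosen, seen_texts = [], set()
    for pred in sorted(predictions, key=lambda p: p.confidence):
        if len(chosen) >= budget:
            break
        if pred.text in seen_texts:
            continue  # skip near-redundant samples to keep the selection diverse
        chosen.append(pred.sample_id)
        seen_texts.add(pred.text)
    return chosen


def pseudo_label(predictions, annotated_ids, threshold):
    """Keep confident predictions on the remaining samples as pseudo-labels."""
    annotated = set(annotated_ids)
    return {
        p.sample_id: p.text
        for p in predictions
        if p.sample_id not in annotated and p.confidence >= threshold
    }


# Toy predictions from a hypothetical initial model trained on synthetic data.
predictions = [
    Prediction("a", "the", 0.95),
    Prediction("b", "tbe", 0.30),
    Prediction("c", "tbe", 0.35),
    Prediction("d", "house", 0.40),
    Prediction("e", "and", 0.80),
]
to_annotate = select_for_annotation(predictions, budget=2)
pseudo = pseudo_label(predictions, to_annotate, threshold=0.75)
```

In this toy run, samples "b" and "d" are sent for annotation, while "a" and "e" are kept as pseudo-labels for the next self-training round. Sample "c" is used for neither: it repeats an already selected prediction and falls below the assumed threshold.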
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wolf, F., Fink, G.A. (2022). Combining Self-training and Minimal Annotations for Handwritten Word Recognition. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_21
DOI: https://doi.org/10.1007/978-3-031-21648-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21647-3
Online ISBN: 978-3-031-21648-0
eBook Packages: Computer Science (R0)