Abstract
In this paper, we propose a straightforward and effective Font-based Synthetic Text Generator (FbSTG) that alleviates the need for annotated data in Cyrillic handwritten text recognition and beyond. Unlike standard GAN-based methods, FbSTG does not have to be trained to learn new characters and styles; all it needs is fonts, text, and sampled page backgrounds. To demonstrate the benefits of the proposed method, we train and test two different OCR systems (Tesseract and TrOCR) on the Handwritten Kazakh and Russian (HKR) dataset, both with and without synthetic data. In addition, we evaluate both systems on a private NKVD dataset of historical documents from Ukraine containing a high proportion of out-of-vocabulary (OoV) words, which makes it an extremely challenging task for current state-of-the-art methods. Adding the synthetic data significantly decreases the character error rate (CER) and word error rate (WER) of the TrOCR-Base-384 model on both datasets. More precisely, the relative CER/WER is reduced (i) on HKR-Test1 with OoV samples by around \(20\%\)/\(10\%\), and (ii) on the NKVD dataset by \(24\%\)/\(8\%\). The FbSTG code is available at: https://github.com/mhlzcu/doc_gen.
This research was supported by the Ministry of Culture of the Czech Republic, project No. DG20P02OVV018. The work described herein has also been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2023062 LINDAT/CLARIAH-CZ. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated.
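The core mechanism is deliberately simple: a text string is rendered with a chosen (handwriting-style) font onto a crop of a sampled page background, so new scripts or styles only require new fonts rather than generator training. The snippet below is a minimal Python sketch of this idea using Pillow; it is not the FbSTG implementation from the repository above, and the font and background file paths are placeholders.

```python
# Minimal sketch of font-based synthetic text-line generation, in the spirit of
# FbSTG (not the authors' code; see https://github.com/mhlzcu/doc_gen for that).
# Assumes Pillow is installed; the font and background paths are placeholders.
import random
from PIL import Image, ImageDraw, ImageFont

def render_line(text, font_path, background_path, font_size=48, margin=10):
    """Render `text` with the given font onto a random crop of a page background."""
    font = ImageFont.truetype(font_path, font_size)

    # Measure the rendered text to decide how large the background crop must be.
    probe = ImageDraw.Draw(Image.new("RGB", (1, 1)))
    left, top, right, bottom = probe.textbbox((0, 0), text, font=font)
    width = right - left + 2 * margin
    height = bottom - top + 2 * margin

    # Sample a random crop of a scanned, text-free page background.
    background = Image.open(background_path).convert("RGB")
    if background.width < width or background.height < height:
        background = background.resize((max(width, background.width),
                                        max(height, background.height)))
    x = random.randint(0, background.width - width)
    y = random.randint(0, background.height - height)
    canvas = background.crop((x, y, x + width, y + height))

    # Draw the text in a dark, slightly randomized "ink" colour.
    ink = tuple(random.randint(0, 60) for _ in range(3))
    ImageDraw.Draw(canvas).text((margin - left, margin - top), text, font=font, fill=ink)
    return canvas

if __name__ == "__main__":
    line = render_line("Пример синтетической строки",
                       "fonts/cyrillic_handwriting.ttf",
                       "backgrounds/blank_page.png")
    line.save("synthetic_line.png")
```

A full pipeline would typically also randomize fonts per line, apply geometric and noise distortions, and compose lines into whole pages, but sampling a font, a text string, and a background as shown is the step that removes the need for any generator training.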