Advancing Handwritten Text Detection by Synthetic Text

Muth, Markus; Peer, Marco; Kleber, Florian; Sablatnig, Robert

doi:10.1007/978-3-031-78495-8_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15319)

Included in the following conference series:

International Conference on Pattern Recognition

Abstract

This paper examines whether synthetic handwritten text can improve machine learning models for handwritten text detection. We generate synthetic handwritten text with an existing model based on a style-conditioned GAN and add it to scanned documents to mimic handwritten annotations. Object detection models (YOLOv5 and YOLOv8) are trained on the synthetic data as a baseline to distinguish handwritten text from the remaining content. We study labels of different granularity (word, line, and paragraph level) and different model sizes in our evaluation. Applied to real data, the models reach a mAP@50 of 0.88 and a pixel-level F1@50 of 0.96 on the CVL dataset, and a mAP@50 of 0.72 and an F1@50 of 0.89 on SCAN, a custom dataset created by adding real handwritten annotations to a scientific paper.
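
As a rough illustration of what the "@50" in these metrics refers to, the sketch below counts a predicted box as a true positive when its intersection over union (IoU) with a ground-truth box is at least 0.5. It is a minimal example under stated assumptions, not the evaluation protocol of the paper: the box format, the greedy one-to-one matching, and the function names `iou` and `count_matches` are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(
    preds: List[Box], truths: List[Box], threshold: float = 0.5
) -> Tuple[int, int, int]:
    """Greedy one-to-one matching of predictions to ground truth.

    Assumes `preds` is already sorted by descending confidence.
    Returns (true positives, false positives, false negatives).
    """
    unmatched = list(truths)
    tp = 0
    for p in preds:
        # Pick the still-unmatched ground-truth box with the highest IoU.
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= threshold:
            tp += 1
            unmatched.remove(best)
    fp = len(preds) - tp
    fn = len(unmatched)
    return tp, fp, fn
```

Precision and recall follow from these counts as tp / (tp + fp) and tp / (tp + fn). Averaging precision over recall levels and classes gives mAP@50 in the usual object-detection sense.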

Notes

1. The value of $\pm 1^{\circ}$ was chosen empirically to balance clearly visible line rotations against extreme line spacings in paragraphs with long lines.
2. Each document has a randomly chosen number of 1 to 12 paragraphs (these bounds were chosen empirically). The algorithm stops after 1000 iterations (also chosen empirically), so a document may contain fewer paragraphs than this randomly chosen upper limit.
3. https://datatorch.io, last accessed on 2023-07-16.
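
The parameters in notes 1 and 2 can be read as the bounds of a randomized placement loop. The sketch below is one possible interpretation, not the authors' algorithm: it assumes rejection sampling of non-overlapping rectangles, samples one rotation per paragraph (the note refers to line rotations), and uses hypothetical size ranges. Only the constants 12, 1000, and 1 degree come from the notes; the names `overlaps` and `place_paragraphs` are made up for this example.

```python
import random
from typing import List, Tuple

# Assumed rectangle format for this sketch: (x, y, width, height) in pixels
Rect = Tuple[int, int, int, int]

MAX_PARAGRAPHS = 12     # upper bound on paragraphs per document (note 2)
MAX_ITERATIONS = 1000   # iteration cap (note 2)
MAX_ROTATION_DEG = 1.0  # rotation bound of +/- 1 degree (note 1)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles share any interior area."""
    return (
        a[0] < b[0] + b[2]
        and b[0] < a[0] + a[2]
        and a[1] < b[1] + b[3]
        and b[1] < a[1] + a[3]
    )


def place_paragraphs(
    page_w: int, page_h: int, rng: random.Random
) -> List[Tuple[Rect, float]]:
    """Try to place a random number of non-overlapping paragraph boxes.

    Returns (rectangle, rotation in degrees) pairs. Fewer boxes than the
    sampled target are returned if the iteration cap is reached first.
    """
    target = rng.randint(1, MAX_PARAGRAPHS)
    placed: List[Tuple[Rect, float]] = []
    for _ in range(MAX_ITERATIONS):
        if len(placed) == target:
            break
        # Hypothetical size ranges; the notes do not specify them.
        w = rng.randint(page_w // 8, page_w // 2)
        h = rng.randint(page_h // 20, page_h // 5)
        x = rng.randint(0, page_w - w)
        y = rng.randint(0, page_h - h)
        candidate = (x, y, w, h)
        if any(overlaps(candidate, rect) for rect, _ in placed):
            continue  # rejected; this attempt still counts as an iteration
        angle = rng.uniform(-MAX_ROTATION_DEG, MAX_ROTATION_DEG)
        placed.append((candidate, angle))
    return placed
```

Calling `place_paragraphs(2480, 3508, random.Random(0))` returns between 1 and 12 boxes for an A4 page at 300 dpi. Because rejected candidates also use up iterations, a crowded page can end with fewer paragraphs than the sampled target, which matches the behaviour described in note 2.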

Author information

Authors and Affiliations

Computer Vision Lab, Institute of Visual Computing and Human-Centered Technology, TU Wien, 1040, Wien, Austria
Markus Muth, Marco Peer, Florian Kleber & Robert Sablatnig

Corresponding author

Correspondence to Florian Kleber.

Editor information

Editors and Affiliations

University of Salford, Salford, Lancashire, UK
Apostolos Antonacopoulos
IIT Bombay, Powai, Mumbai, Maharashtra, India
Subhasis Chaudhuri
Johns Hopkins University, Baltimore, MD, USA
Rama Chellappa
Chinese Academy of Sciences, Beijing, China
Cheng-Lin Liu
IIT Kharagpur, Kharagpur, West Bengal, India
Saumik Bhattacharya
ISI Kolkata, Kolkata, West Bengal, India
Umapada Pal

About this paper

Cite this paper

Muth, M., Peer, M., Kleber, F., Sablatnig, R. (2025). Advancing Handwritten Text Detection by Synthetic Text. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15319. Springer, Cham. https://doi.org/10.1007/978-3-031-78495-8_8

DOI: https://doi.org/10.1007/978-3-031-78495-8_8
Published: 04 December 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78494-1
Online ISBN: 978-3-031-78495-8
eBook Packages: Computer Science, Computer Science (R0)
