Abstract
This study evaluates the recently proposed Document Attention Network (DAN) for extracting key-value information from Uruguayan birth certificates handwritten in Spanish. We investigate two annotation strategies, normalized and diplomatic, for automatically transcribing handwritten documents, fine-tuning DAN with minimal training data and annotation effort. Experiments were conducted on two datasets containing the same images (201 scans of birth certificates written by more than 15 different writers) but annotated with different methods. Our findings indicate that normalized annotation is more effective for fields that can be standardized, such as dates and places of birth, whereas diplomatic annotation performs much better for fields containing names and surnames, which cannot be standardized.
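To make the distinction between the two annotation strategies concrete, the following minimal sketch contrasts a diplomatic label (the text kept as written) with a normalized label (the text mapped to a canonical form) for a date field. The ISO `YYYY-MM-DD` target, the `D de MONTH de YYYY` input pattern, and the names `normalize_date` and `annotate_date` are assumptions chosen for illustration; they are not the annotation scheme used in the paper.

```python
import re
from typing import Optional

# Hypothetical month lookup for illustration only.
MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4, "mayo": 5, "junio": 6,
    "julio": 7, "agosto": 8, "setiembre": 9, "septiembre": 9, "octubre": 10,
    "noviembre": 11, "diciembre": 12,
}

# Assumed input pattern: "<day> de <month name> de <4-digit year>".
DATE_PATTERN = re.compile(
    r"\s*(\d{1,2})\s+de\s+(\w+)\s+de\s+(\d{4})\s*", flags=re.IGNORECASE
)


def normalize_date(written: str) -> Optional[str]:
    """Return an ISO-style date string, or None if the text does not fit the pattern."""
    match = DATE_PATTERN.fullmatch(written)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    return f"{int(year):04d}-{month:02d}-{int(day):02d}"


def annotate_date(written: str) -> dict:
    """Build both label variants for one date field."""
    return {
        "diplomatic": written,                 # exactly as it appears on the page
        "normalized": normalize_date(written)  # canonical form, None if unknown
    }


if __name__ == "__main__":
    print(annotate_date("12 de Marzo de 1935"))
```

A name field has no comparable canonical form, so in a sketch like this its diplomatic and normalized labels would simply be the same string.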
Acknowledgments
The research behind the results presented in this publication was partly supported by the Agencia Nacional de Investigación e Innovación (ANII) and the France 2030 CollabNext project.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bottaioli, N. et al. (2024). Normalized vs Diplomatic Annotation: A Case Study of Automatic Information Extraction from Handwritten Uruguayan Birth Certificates. In: Mouchère, H., Zhu, A. (eds) Document Analysis and Recognition – ICDAR 2024 Workshops. ICDAR 2024. Lecture Notes in Computer Science, vol 14935. Springer, Cham. https://doi.org/10.1007/978-3-031-70645-5_4
DOI: https://doi.org/10.1007/978-3-031-70645-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70644-8
Online ISBN: 978-3-031-70645-5
eBook Packages: Computer Science (R0)