Writer Identification in Historical Handwritten Documents: A Latin Dataset and a Benchmark

Fagioli, Alessio; Avola, Danilo; Cinque, Luigi; Colombi, Emanuela; Foresti, Gian Luca

doi:10.1007/978-3-031-51026-7_39

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14366)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

Writer identification refers to the process of determining or attributing the authorship of a document to a specific individual through the analysis of various elements such as writing style, linguistic characteristics, and other textual features. This is a relevant task in heterogeneous fields such as cybersecurity, forensics, or linguistics and becomes particularly challenging when considering historical documents. In fact, the latter might present deterioration due to time, often lack signatures, and could be authored by multiple people. Complicating matters further, scribes were trained to mimic handwriting meticulously when copying manuscripts, making author identification of such documents even more difficult. In this context, this paper introduces a curated collection of Latin documents from the Genesis and Gospel of Matthew specifically gathered for the purpose of exploring the writer identification task. In particular, the dataset comprises over 400 pages, written by nine distinct persons. The primary objective is to explore the efficacy of state-of-the-art deep learning architectures in accurately ascribing historical texts to their rightful authors. To this end, this paper conducts extensive experiments, utilizing varying training set sizes and employing diverse pre-processing techniques to assess the performance and capabilities of these renowned models on the writer identification task while also providing the community with a baseline on the introduced collection.
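The "varying training set sizes" protocol mentioned in the abstract can be illustrated with a per-writer (stratified) split. The following is only a minimal sketch under stated assumptions: the function name `stratified_split`, the training fractions, and the toy labels are hypothetical and are not taken from the paper, whose actual splitting procedure is not described in the abstract.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Split page indices into train/test, sampling per writer.

    `labels[i]` is the writer of page i. Hypothetical helper for
    illustration only, not the authors' procedure.
    """
    rng = random.Random(seed)
    by_writer = defaultdict(list)
    for idx, writer in enumerate(labels):
        by_writer[writer].append(idx)

    train, test = [], []
    for writer in sorted(by_writer):
        pages = by_writer[writer][:]
        if len(pages) < 2:
            raise ValueError(f"writer {writer!r} needs at least 2 pages")
        rng.shuffle(pages)
        # Keep at least one page per writer on each side of the split.
        n_train = min(len(pages) - 1, max(1, round(train_fraction * len(pages))))
        train.extend(pages[:n_train])
        test.extend(pages[n_train:])
    return sorted(train), sorted(test)


# Toy labels (assumption): 9 writers with 10 pages each, not the real dataset.
labels = [writer for writer in range(9) for _ in range(10)]
for fraction in (0.2, 0.5, 0.8):
    train_idx, test_idx = stratified_split(labels, fraction)
    print(fraction, len(train_idx), len(test_idx))
```

Splitting per writer keeps every scribe represented in both subsets, so accuracy at small training fractions reflects limited data rather than a writer missing from training.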

Acknowledgements

This work was supported by the “A Brain Computer Interface (BCI) based System for Transferring Human Emotions inside Unmanned Aerial Vehicles (UAVs)” Sapienza Research Projects (Protocol number: RM1221816C1CF63B); the Departmental Strategic Plan (DSP) of the University of Udine - Interdepartmental Project on Artificial Intelligence (2020–25); the “A proactive counter-UAV system to protect army tanks and patrols in urban areas (PROACTIVE COUNTER-UAV)” project of the Italian Ministry of Defence (Number 2066/16.12.2019); and the MICS (Made in Italy - Circular and Sustainable) Extended Partnership, with funding received from Next-Generation EU (Italian PNRR - M4 C2, Invest 1.3 - D.D. 1551.11-10-2022, PE00000004), CUP MICS B53C22004130001.

Author information

Authors and Affiliations

Department of Computer Science, Sapienza University, Via Salaria 113, 00198, Rome, Italy
Alessio Fagioli, Danilo Avola & Luigi Cinque
Department of Humanist Studies and Cultural Heritage, University of Udine, Vicolo Florio, 2/b, 33100, Udine, Italy
Emanuela Colombi
Department of Mathematics, Computer Science and Physics, University of Udine, Via delle Scienze 206, 33100, Udine, Italy
Gian Luca Foresti

Corresponding author

Correspondence to Alessio Fagioli.

Editor information

Editors and Affiliations

University of Udine, Udine, Italy
Gian Luca Foresti
University of Udine, Udine, Italy
Andrea Fusiello
University of York, York, UK
Edwin Hancock

Cite this paper

Fagioli, A., Avola, D., Cinque, L., Colombi, E., Foresti, G.L. (2024). Writer Identification in Historical Handwritten Documents: A Latin Dataset and a Benchmark. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14366. Springer, Cham. https://doi.org/10.1007/978-3-031-51026-7_39

DOI: https://doi.org/10.1007/978-3-031-51026-7_39
Published: 21 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51025-0
Online ISBN: 978-3-031-51026-7
eBook Packages: Computer Science, Computer Science (R0)
