research-article

An Evaluation of Handwritten Text Recognition Methods for Historical Ciphered Manuscripts

Authors:
Mohamed Ali Souibgui (Computer Vision Center, Spain; ORCID 0000-0003-0100-9392)
Pau Torras (Computer Vision Center, Spain and Computer Science Department, Universitat Autònoma de Barcelona, Spain; ORCID 0000-0003-0327-9046)
Jialuo Chen (Computer Vision Center, Spain; ORCID 0000-0002-7808-6567)
Alicia Fornés (Computer Vision Center, Spain and Computer Science Department, Universitat Autònoma de Barcelona, Spain; ORCID 0000-0002-9692-5336)

HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing, August 2023, Pages 7–12
https://doi.org/10.1145/3604951.3605509

Published: 25 August 2023

ABSTRACT

This paper investigates the effectiveness of different deep learning handwritten text recognition (HTR) families, including LSTM, Seq2Seq, and transformer-based approaches with self-supervised pretraining, in recognizing ciphered manuscripts from different historical periods and cultures. The goal is to identify the most suitable method or training technique for recognizing ciphered manuscripts and to provide insight into the challenges and opportunities in this field. We evaluate these models on several datasets of ciphered manuscripts and discuss the results. This study contributes to the development of more accurate and efficient methods for recognizing historical manuscripts, in support of the preservation and dissemination of our cultural heritage.

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Optical character recognition
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
    2. Learning settings
      1. Semi-supervised learning settings

Published in

HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing
August 2023
117 pages
ISBN: 9798400708411
DOI: 10.1145/3604951

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Handwritten Text Recognition
Historical Ciphered Manuscripts
LSTM
Seq2Seq
Transformer networks
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 52 of 90 submissions, 58%