An Innovative Character Recognition for Ancient Book and Archival Materials: A Segmentation and Self-learning Based Approach

Barbuti, Nicola; Caldarola, Tommaso

doi:10.1007/978-3-642-35834-0_26

Part of the book series: Communications in Computer and Information Science (CCIS, volume 354)

Included in the following conference series:

Italian Research Conference on Digital Libraries

Abstract

The paper presents a method and an apparatus for recognizing the text in a set of digital images of pages from ancient manuscripts or printed books. The method comprises the following macro steps: identifying the regions that contain words in a subset of the images and connecting them in sequence; structuring a thesaurus of the fonts used in those regions; and performing character recognition on one or more images of the set, associating a first efficiency value with this recognition. The prototype is patent pending (National Pat. Pend. n. BA2011A000038 – Intern. Pat. Pend. n. I116-PCT).

Author information

Authors and Affiliations

Department of Classical and Late Antiquity Studies, University of Bari Aldo Moro, Italy
Nicola Barbuti
D.A.BI.MUS. L.L.C., Spin Off of University of Bari Aldo Moro, Italy
Tommaso Caldarola

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo, 6/a, 35131, Padua, Italy
Maristella Agosti, Nicola Ferro
Department of Computer Science, University of Bari, Via E. Orabona, 4, 70126, Bari, Italy
Floriana Esposito, Stefano Ferilli

About this paper

Cite this paper

Barbuti, N., Caldarola, T. (2013). An Innovative Character Recognition for Ancient Book and Archival Materials: A Segmentation and Self-learning Based Approach. In: Agosti, M., Esposito, F., Ferilli, S., Ferro, N. (eds) Digital Libraries and Archives. IRCDL 2012. Communications in Computer and Information Science, vol 354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35834-0_26

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35833-3
Online ISBN: 978-3-642-35834-0
eBook Packages: Computer Science; Computer Science (R0)