An Innovative Character Recognition for Ancient Book and Archival Materials: A Segmentation and Self-learning Based Approach

  • Conference paper
Digital Libraries and Archives (IRCDL 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 354)

Abstract

This paper presents a method and an apparatus for recognizing the text in a set of digital images of pages from ancient manuscripts or printed books. The method comprises the following macro steps: identifying the regions that contain words in a subset of the images and connecting them in sequence; building a thesaurus of the fonts used in those regions; and performing character recognition on one or more images in the set, associating a first efficiency value with that recognition. The prototype is patent pending (National Pat. Pend. n. BA2011A000038; Intern. Pat. Pend. n. I116-PCT).
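The macro steps named in the abstract can be pictured with a minimal sketch. It is not the patented method, whose details the abstract does not give. Every name here (`FontThesaurus`, `WordRegion`, `connect_in_sequence`, `recognize`, the `threshold` parameter) is hypothetical, glyphs are assumed to be already segmented into small bitmaps, reading order is assumed to be top to bottom and then left to right, and the "efficiency value" is assumed to be the share of glyphs matched with sufficient similarity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumption: one segmented character, as rows of '#' (ink) and '.' (background).
Glyph = Tuple[str, ...]


def similarity(a: Glyph, b: Glyph) -> float:
    """Fraction of pixels on which two equally sized glyphs agree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


@dataclass
class FontThesaurus:
    """Hypothetical thesaurus: known glyph shapes mapped to characters."""
    entries: Dict[Glyph, str] = field(default_factory=dict)

    def learn(self, glyph: Glyph, char: str) -> None:
        self.entries[glyph] = char

    def match(self, glyph: Glyph) -> Tuple[Optional[str], float]:
        """Return the closest known character and its similarity score."""
        best_char, best_score = None, 0.0
        for known, char in self.entries.items():
            score = similarity(glyph, known)
            if score > best_score:
                best_char, best_score = char, score
        return best_char, best_score


@dataclass
class WordRegion:
    """Hypothetical word region: position on the page plus its glyphs."""
    top: int
    left: int
    glyphs: List[Glyph]


def connect_in_sequence(regions: List[WordRegion]) -> List[WordRegion]:
    """Naive reading order (an assumption): top to bottom, then left to right."""
    return sorted(regions, key=lambda r: (r.top, r.left))


def recognize(regions: List[WordRegion], thesaurus: FontThesaurus,
              threshold: float = 0.8) -> Tuple[str, float]:
    """Transcribe the regions and return (text, efficiency).

    Efficiency is assumed here to be the fraction of glyphs whose best
    match reaches the threshold; unmatched glyphs are written as '?'.
    """
    words = []
    accepted = total = 0
    for region in connect_in_sequence(regions):
        chars = []
        for glyph in region.glyphs:
            char, score = thesaurus.match(glyph)
            total += 1
            if char is not None and score >= threshold:
                chars.append(char)
                accepted += 1
            else:
                chars.append("?")
        words.append("".join(chars))
    efficiency = accepted / total if total else 0.0
    return " ".join(words), efficiency
```

In a self-learning setting, the glyphs left as `?` would be the natural candidates to pass back to `learn` once an operator has labelled them, so that a later run over the same set scores a higher efficiency. That loop is an interpretation of the title, not something the abstract spells out.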

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barbuti, N., Caldarola, T. (2013). An Innovative Character Recognition for Ancient Book and Archival Materials: A Segmentation and Self-learning Based Approach. In: Agosti, M., Esposito, F., Ferilli, S., Ferro, N. (eds) Digital Libraries and Archives. IRCDL 2012. Communications in Computer and Information Science, vol 354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35834-0_26

  • DOI: https://doi.org/10.1007/978-3-642-35834-0_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35833-3

  • Online ISBN: 978-3-642-35834-0

  • eBook Packages: Computer Science (R0)
