
A Database for Arabic Printed Character Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5112)

Abstract

Electronic Document Management (EDM) technology is being widely adopted because it enables efficient routing and retrieval of documents. Optical Character Recognition (OCR) is an important front end for such technology. Excellent OCR now exists for Latin-based languages, but few systems read Arabic, which limits the penetration of EDM into Arabic-speaking countries. Developing an OCR system for Arabic requires a database of Arabic words. Such a database has many uses beyond training and testing a recognition system. This paper provides a comprehensive study and analysis of Arabic words and explains how such a database was constructed. Unlike earlier studies, it describes a database built from a large collection of Arabic words (6 million). It also considers connected segments, or Pieces of Arabic Words (PAWs), as well as Naked Pieces of Arabic Words (NPAWs), which are PAWs without diacritics. Background information on the Arabic language is also presented.
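To make the notion of a PAW concrete, the following is a minimal Python sketch of how a word might be split into connected segments. It is not the authors' procedure: the function name `split_into_paws`, the simplified set of non-connecting letters, and the assumption that the input carries no vowel marks are all illustrative choices. It does not attempt the reduction to NPAWs, since the abstract does not specify that mapping.

```python
from typing import List

# Assumed, simplified set of letters that do not connect to the following
# letter: alef and its hamza/madda forms, dal, thal, ra, zain, waw,
# waw with hamza, teh marbuta, and the standalone hamza.
NON_FORWARD_JOINING = set(
    "\u0627\u0623\u0625\u0622\u062F\u0630\u0631\u0632\u0648\u0624\u0629\u0621"
)
# The standalone hamza also does not connect to the preceding letter.
NON_BACKWARD_JOINING = {"\u0621"}


def split_into_paws(word: str) -> List[str]:
    """Split an undiacritized Arabic word into connected segments (PAWs)."""
    paws: List[str] = []
    current = ""
    for ch in word:
        # Close the running segment if this letter cannot attach to it.
        if current and ch in NON_BACKWARD_JOINING:
            paws.append(current)
            current = ""
        current += ch
        # Close the segment if this letter cannot attach to the next one.
        if ch in NON_FORWARD_JOINING:
            paws.append(current)
            current = ""
    if current:
        paws.append(current)
    return paws


if __name__ == "__main__":
    # kaf, teh, alef, beh: the alef breaks the word into two segments.
    print(split_into_paws("\u0643\u062A\u0627\u0628"))
```

Under these assumptions, a word made up entirely of non-connecting letters yields one PAW per letter, while a word whose only non-connecting letter is its last one stays a single PAW.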



Editor information

Aurélio Campilho, Mohamed Kamel

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

AbdelRaouf, A., Higgins, C.A., Khalil, M. (2008). A Database for Arabic Printed Character Recognition. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2008. Lecture Notes in Computer Science, vol 5112. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69812-8_56

  • DOI: https://doi.org/10.1007/978-3-540-69812-8_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69811-1

  • Online ISBN: 978-3-540-69812-8

  • eBook Packages: Computer Science, Computer Science (R0)
