
Comprehensive synthetic Arabic database for on/off-line script recognition research

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Developing and maintaining large, comprehensive databases for script recognition that include different shapes for each word in the lexicon is expensive and difficult. In this paper, we present an efficient system that automatically generates prototypes for each word in a lexicon using multiple appearances of each letter. Large sets of different shapes are created for each letter in each position. These sets are then used to generate valid shapes for each word-part. The number of valid permutations for each word is so large that it makes training and searching impractical for tasks such as script recognition and word spotting. We apply dimensionality reduction and clustering techniques to maintain a compact representation of these databases without affecting their ability to represent the wide variety of handwriting styles. In addition, a database for off-line script recognition is generated from the on-line strokes using a standard dilation technique, with special care taken to resemble the pen’s path. We also examined and used several layout techniques for producing words from the generated word-parts. Our experimental results show that the proposed system can automatically generate large databases whose quality is at least as good as that of manually generated ones.
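The abstract does not spell out how word-part shapes are combined or reduced, so the following is only a minimal sketch under stated assumptions, not the authors' method. It illustrates why the permutation count explodes: per-position shape sets combine by Cartesian product, so a word-part of 4 letters with 10 shapes each already yields 10^4 = 10,000 candidates. It then reduces the candidates with PCA and k-means, which are stand-ins chosen here because the abstract names only "dimensionality reduction and clustering". The helper names (`join_letters`, `prototypes`, etc.), the naive letter-joining rule, and the 32-point arc-length resampling are all hypothetical.

```python
import itertools
import math
import numpy as np

# A stroke is an (n, 2) array of pen points (x, y).
# Hypothetical input: shapes[i] is a list of alternative strokes for the
# i-th letter of one word-part.

def count_permutations(shapes):
    """Number of candidate word-part shapes: product of per-letter alternatives."""
    return math.prod(len(alts) for alts in shapes)

def join_letters(strokes):
    """Naive joining (assumption): translate each letter so it starts where
    the previous one ended, then stack all points into one stroke."""
    out = [strokes[0]]
    for s in strokes[1:]:
        out.append(s + (out[-1][-1] - s[0]))
    return np.vstack(out)

def resample(stroke, n=32):
    """Resample a stroke to n points equally spaced along its arc length."""
    seg = np.linalg.norm(np.diff(stroke, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    if t[-1] == 0:
        return np.repeat(stroke[:1], n, axis=0)
    u = np.linspace(0.0, t[-1], n)
    return np.column_stack([np.interp(u, t, stroke[:, 0]),
                            np.interp(u, t, stroke[:, 1])])

def features(stroke, n=32):
    """Translation- and scale-normalized, flattened resampled points."""
    p = resample(stroke, n)
    p = p - p.mean(axis=0)
    scale = np.abs(p).max()
    return (p / scale if scale > 0 else p).ravel()

def pca(X, k):
    """Project rows of X onto their top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def kmeans(Z, k, iters=50, seed=0):
    """Plain Lloyd's k-means, initialized with k distinct random rows."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def prototypes(shapes, k, dims=8):
    """Enumerate all letter-shape combinations, then keep, for each cluster,
    the real candidate nearest its centroid."""
    candidates = [join_letters(list(c)) for c in itertools.product(*shapes)]
    X = np.array([features(c) for c in candidates])
    Z = pca(X, min(dims, *X.shape))
    labels, centers = kmeans(Z, k)
    keep = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size:
            best = idx[np.argmin(((Z[idx] - centers[j]) ** 2).sum(axis=1))]
            keep.append(candidates[best])
    return keep

# Toy usage: 3 letters, 4 random-walk shapes of 10 points each.
rng = np.random.default_rng(1)
shapes = [[rng.normal(size=(10, 2)).cumsum(axis=0) for _ in range(4)]
          for _ in range(3)]
print(count_permutations(shapes))  # 4 ** 3 = 64 candidates
protos = prototypes(shapes, k=5)
print(len(protos))                 # at most 5 prototypes
```

Picking the member nearest each centroid, rather than the centroid itself, keeps every prototype a genuine combination of real letter shapes instead of an averaged, possibly unwritable stroke.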



Author information


Correspondence to Raid M. Saabni.


About this article

Cite this article

Saabni, R.M., El-Sana, J.A. Comprehensive synthetic Arabic database for on/off-line script recognition research. IJDAR 16, 285–294 (2013). https://doi.org/10.1007/s10032-012-0189-5


  • DOI: https://doi.org/10.1007/s10032-012-0189-5
