Abstract
This paper addresses the recognition of offline Farsi/Arabic handwriting. The overall appearance of each subword in Farsi/Arabic script is described by its shape contour, which provides a rich set of discriminative characteristics. Our approach is writer-dependent; that is, the system is trained to recognize the subwords written by a particular writer. A fast contour alignment is the core of the proposed algorithm, with the alignment performed on the basis of a handful of feature points. We also propose an efficient lexicon reduction algorithm based on the characteristic loci feature, which works directly on the binary images of subwords. Fast and precise alignment, together with efficient lexicon reduction and appropriate similarity matching, yields a high recognition rate while keeping the speed high. Our experiment on the IBN SINA database shows that the correct classification rate can reach 91.08 %. This figure is achieved by subword shape matching alone, without dots and diacritics and without any statistical language model.
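The abstract does not spell out how the lexicon reduction step works. As a rough illustration only, the following is a minimal Python sketch of the classic characteristic loci feature, followed by a simple top-k lexicon filter. It is not the authors' implementation, and it rests on several assumptions:

- a binary subword image is given as a list of rows, with 1 for ink and 0 for background;
- for every background pixel, the ink crossings are counted along four rays (up, down, left, right), and each count is capped at 2, which gives 3^4 = 81 possible codes;
- the code histogram is normalized and compared with an L1 distance.

The helper names (`characteristic_loci`, `reduce_lexicon`), the cap, the distance measure, and `k` are all hypothetical choices made for this sketch.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Assumption: a binary subword image is a list of rows, 1 = ink, 0 = background.
Image = Sequence[Sequence[int]]


def _crossings(ray: Sequence[int], cap: int = 2) -> int:
    """Count background-to-ink transitions along a ray, capped at `cap`.

    The ray starts just after a background pixel, so `prev` starts at 0.
    """
    count, prev = 0, 0
    for v in ray:
        if v and not prev:
            count += 1
        prev = v
    return min(count, cap)


def characteristic_loci(img: Image) -> Counter:
    """Histogram of 4-direction loci codes over all background pixels."""
    h, w = len(img), len(img[0])
    hist: Counter = Counter()
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                continue  # codes are computed for background pixels only
            up = _crossings([img[i][c] for i in range(r - 1, -1, -1)])
            down = _crossings([img[i][c] for i in range(r + 1, h)])
            left = _crossings([img[r][j] for j in range(c - 1, -1, -1)])
            right = _crossings([img[r][j] for j in range(c + 1, w)])
            # Each count is in {0, 1, 2}, so the code is a base-3 number in [0, 80].
            hist[((up * 3 + down) * 3 + left) * 3 + right] += 1
    return hist


def l1_distance(h1: Counter, h2: Counter) -> float:
    """L1 distance between histograms normalized to relative frequencies."""
    n1 = sum(h1.values()) or 1
    n2 = sum(h2.values()) or 1
    return sum(abs(h1[k] / n1 - h2[k] / n2) for k in set(h1) | set(h2))


def reduce_lexicon(query: Image, lexicon: Dict[str, Counter], k: int) -> List[str]:
    """Keep the k lexicon entries whose loci histograms are closest to the query."""
    q = characteristic_loci(query)
    return sorted(lexicon, key=lambda label: l1_distance(q, lexicon[label]))[:k]


if __name__ == "__main__":
    ring = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
    bar = [[0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 1, 0, 0],
           [0, 0, 0, 0, 0]]
    # In practice the lexicon histograms would be precomputed from training samples.
    lexicon = {"bar": characteristic_loci(bar), "ring": characteristic_loci(ring)}
    print(reduce_lexicon(ring, lexicon, k=1))  # prints ['ring']
```

The lexicon histograms can be computed once, offline. Filtering then costs one histogram comparison per lexicon entry. In a pipeline like the one the abstract outlines, only the surviving k candidates would presumably go on to the more expensive contour alignment and similarity matching.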
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments and constructive suggestions, which helped them improve the content and presentation of the paper.
Cite this article
Fouladi, K., Araabi, B.N. & Kabir, E. A fast and accurate contour-based method for writer-dependent offline handwritten Farsi/Arabic subwords recognition. IJDAR 17, 181–203 (2014). https://doi.org/10.1007/s10032-013-0210-7