Abstract
Handwritten Urdu recognition has been explored very little to date, owing to the lack of a standard handwritten Urdu dataset, the large variation in writing styles among Urdu writers, the irregular positioning of diacritics associated with ligatures, the similar shapes of some handwritten Urdu characters, and the lack of an efficient learning and training technique. A few researchers have proposed handwritten Urdu datasets, of which only the Urdu Nastaliq Handwritten Dataset (UNHD) is publicly available. UNHD contains ligatures of only up to five characters and does not cover the entire Urdu ligature corpus. Hence, we present a novel, comprehensive handwritten Urdu dataset, the Urdu Handwritten Ligature Dataset (UHLD), which consists of ligatures of up to seven characters and covers most of the ligature corpus of the Urdu language. UHLD was written by writers of both genders, with no constraints on writer age, paper color, paper type (blank or ruled), ink color, or pen type. We propose an unconstrained handwritten Urdu recognition system that can recognize handwritten Urdu ligatures of up to six characters. We also propose a new robust algorithm that divides a complete ligature into primary and secondary components with 98% accuracy on a large Urdu dataset. Our holistic handwritten Urdu recognition system recognizes the primary and secondary components of a word/ligature independently. The proposed recognition technique is transformation invariant and computationally efficient, and achieves recognition rates of 97% on UHLD and 93% on UNHD.
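To make the notion of primary and secondary components concrete, the following is a minimal illustrative sketch and not the algorithm proposed in this paper. It assumes a binarized ligature image given as a list of rows of 0/1 values, uses 8-connected component labeling, and applies a naive heuristic in which the largest connected component is taken as the primary component (the main body) and all remaining components are treated as secondary components (dots and diacritics). The function names `connected_components` and `split_primary_secondary` are hypothetical.

```python
from collections import deque


def connected_components(grid):
    """Return the 8-connected components of ink pixels (value 1) in a binary grid.

    Each component is a list of (row, col) coordinates.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and grid[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def split_primary_secondary(grid):
    """Naive split (assumption): largest component is primary, the rest are secondary."""
    components = connected_components(grid)
    if not components:
        return [], []
    primary = max(components, key=len)
    secondary = [comp for comp in components if comp is not primary]
    return primary, secondary


# Toy 5x6 "ligature": one stroke with a dot above and a dot below.
ligature = [
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
]
primary, secondary = split_primary_secondary(ligature)
print(len(primary), "primary pixels;", len(secondary), "secondary components")
```

A size-based heuristic of this kind would likely fail when a diacritic touches the main body or when the main body is broken by noise, which is part of what makes the separation step difficult on unconstrained handwriting.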
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
We thank Dr. Saad Bin Ahmed for providing the UNHD database.
Author information
Contributions
AFG wrote the entire manuscript text and prepared the figures and tables. FKL reviewed the manuscript, suggested changes, and checked the manuscript for plagiarism. All authors reviewed the manuscript before submission.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Ganai, A.F., Khursheed, F. A novel holistic unconstrained handwritten urdu recognition system using convolutional neural networks. IJDAR 25, 351–371 (2022). https://doi.org/10.1007/s10032-022-00414-7
DOI: https://doi.org/10.1007/s10032-022-00414-7