Computationally efficient recognition of unconstrained handwritten Urdu script using BERT with vision transformers

  • Original Article
Neural Computing and Applications

Abstract

Handwritten Urdu text recognition is a challenging area of pattern recognition that has gained importance with the rapid emergence of camera-based applications on portable devices, which process large numbers of images every day. The main challenges in handwritten Urdu recognition are writer-dependent variation, irregular positioning of the diacritics associated with a character, the context sensitivity of characters, and the cursive nature of Urdu script. These challenges also make it difficult to build a large, generalized handwritten Urdu dataset. State-of-the-art methods for handwritten Urdu text recognition mostly take implicit approaches, which are error-prone and do not yield high recognition rates. The holistic approach has been the least explored to date, and existing holistic approaches are complex and time-consuming because they mostly rely on convolutional/recurrent neural networks or statistical methods. Hence, this research proposes a novel and efficient vision transformer-based methodology using the BERT architecture for the recognition of handwritten Urdu text. The proposed approach uses convolutional feature maps as word embeddings in the transformer, which makes full use of the vision transformer's attention mechanism to focus on a particular connected component (ligature) in handwritten Urdu text. To cover the entire Urdu corpus, we pre-trained on several benchmark handwritten Urdu datasets, such as UNHD and NUST-UHWR, and tested on unconstrained handwritten Urdu text. Compared with state-of-the-art techniques, the experimental evaluation of the proposed approach reports better results on performance measures such as Ligature Error Rate (LER), precision, sensitivity, specificity, F1-score, and accuracy.
The main strengths of the proposed approach are (i) a significant reduction in the time needed to train on a large handwritten Urdu dataset, (ii) minimal computational complexity, since there is no overhead of diacritic separation and re-association as in most state-of-the-art techniques, and (iii) a new state-of-the-art LER of up to 3% on unconstrained handwritten Urdu text.
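
The evaluation measures named in the abstract can be illustrated with a minimal sketch. This preview does not state how the authors compute LER, so the code below assumes the usual edit-distance definition used for character and word error rates, applied to sequences of ligature labels. The function names (`edit_distance`, `ligature_error_rate`, `binary_metrics`) and the placeholder labels are hypothetical and do not come from the article.

```python
from typing import Dict, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of ligature labels."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference ligatures
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ligature_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must contain at least one ligature")
    return edit_distance(ref, hyp) / len(ref)


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix measures for a single class."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Placeholder labels stand in for ligature classes (hypothetical data).
reference = ["lig_a", "lig_b", "lig_c", "lig_d"]
predicted = ["lig_a", "lig_x", "lig_d"]
ler = ligature_error_rate(reference, predicted)
metrics = binary_metrics(tp=8, fp=2, tn=9, fn=1)
```

In this toy example one ligature is substituted and one is dropped, so two edits against four reference ligatures give an LER of 0.5. For a multi-class ligature classifier, the confusion-matrix measures would typically be computed per class and then averaged; which averaging the authors use is not stated here.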

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

We thank Dr. Saad Bin Ahmed for providing the UNHD database. We also thank Dr. Zia ul Sehr for providing the NUST-UHWR database.

Author information

Corresponding author

Correspondence to Aejaz Farooq Ganai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Ganai, A.F., Khursheed, F. Computationally efficient recognition of unconstrained handwritten Urdu script using BERT with vision transformers. Neural Comput & Applic 35, 24161–24177 (2023). https://doi.org/10.1007/s00521-023-08976-1

  • DOI: https://doi.org/10.1007/s00521-023-08976-1
