Abstract
Detecting and correcting misspelled words in written text is of great importance in many natural language processing applications. Errors can be broadly classified into two groups: spelling errors and contextual errors. Spelling errors occur when the misspelled words do not exist in the dictionary and are meaningless, while contextual errors occur when the words do exist in the dictionary but are not used as the writer intended. This paper presents an “Urdu Spell Checker” that detects incorrectly spelled words using the widely used lexicon lookup approach and provides a list of correctly spelled candidate words by applying the edit distance technique, which covers all types of spelling errors. To identify the best candidate word, this paper proposes a hybrid model that ranks the words in the candidate word list. Multiple ranking techniques, such as Soundex, Shapex, LCS and N-gram, are used standalone as well as in combination to determine the best technique in terms of F1 score. A dictionary containing 48,551 words is developed from the UMC corpus and an Urdu newspaper corpus. Our hybrid model achieves an F1 score of 94.02% when considering the top five suggested words and an F1 score of 88.29% when considering only the top suggested word.
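The detect, generate and rank pipeline summarized above can be illustrated with a minimal Python sketch. It is not the hybrid model proposed in the paper: it ranks candidates with LCS alone, assumes the Damerau variant of edit distance (insertion, deletion, substitution and adjacent transposition), and uses a tiny hypothetical lexicon of Latin-script transliterations in place of the 48,551-word dictionary. All function names and parameters (`edit_distance`, `lcs_length`, `suggest`, `max_distance`, `top_k`) are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def suggest(word: str, lexicon: set, max_distance: int = 1, top_k: int = 5) -> list:
    """Return up to top_k ranked suggestions, or [] if the word is in the lexicon."""
    if word in lexicon:  # lexicon lookup: the word is treated as correctly spelled
        return []
    candidates = [w for w in lexicon if edit_distance(word, w) <= max_distance]
    # Rank by longer common subsequence first; break ties alphabetically.
    return sorted(candidates, key=lambda w: (-lcs_length(word, w), w))[:top_k]


# Hypothetical toy lexicon (transliterated words, for illustration only).
LEXICON = {"kalam", "kam", "qalam", "kitab", "katib"}

print(suggest("kalm", LEXICON))
print(suggest("kalm", LEXICON, max_distance=2))
```

Because the functions operate on Python strings, the same sketch would apply unchanged to Urdu-script words. A combined ranker in the spirit of the abstract would replace the single LCS sort key with a score that also draws on Soundex, Shapex and N-gram evidence; how those scores are combined is not specified here.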

Funding
Not applicable.
Author information
Contributions
Romila Aziz: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Resources, Software, Writing (Original Draft). Muhammad Waqas Anwar: Visualization, Supervision, Project Administration, Funding Acquisition, Writing (Review and Editing), Investigation, Validation. Muhammad Hasan Jamal: Writing (Review and Editing), Investigation, Validation. Usama Ijaz Bajwa: Writing (Review and Editing), Validation.
Ethics declarations
Conflict of interest
We have no financial or personal relationships with other people or organizations.
Availability of Data and Material
Not applicable.
Code Availability
Not applicable.
About this article
Cite this article
Aziz, R., Anwar, M.W., Jamal, M.H. et al. A hybrid model for spelling error detection and correction for Urdu language. Neural Comput & Applic 33, 14707–14721 (2021). https://doi.org/10.1007/s00521-021-06110-7
DOI: https://doi.org/10.1007/s00521-021-06110-7