Abstract
Many software applications have been developed to check the spelling of words. Spell checking for English is well advanced, whereas other languages, Urdu in particular, lag behind in such technologies. We develop an "Urdu Spell Checker" that detects misspelled words and offers a list of correctly spelled alternatives. The spell checker relies on a predefined lexicon, or corpus, of correctly spelled words to decide whether an entered word is correct. If the input word matches a word in the corpus, it is considered correct; otherwise it is treated as misspelled. Several error correction techniques are applied both individually and in combination to determine which set of methods produces the best output. Among the techniques tested, Jaro distance combined with Soundex, Shapex, and n-grams gives the best results: 80.0% precision, 44.87% recall, and 57.37% F-measure.
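The detection and ranking steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the lexicon lookup and a plain Jaro similarity ranking, and leaves out the Soundex, Shapex, and n-gram components. The names `jaro`, `suggest`, and `LEXICON`, and the four-word toy lexicon, are assumptions made for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


# Hypothetical toy lexicon; a real checker would load a large Urdu word list.
LEXICON = ["کتاب", "کتب", "کام", "قلم"]


def suggest(word: str, lexicon: list[str], limit: int = 3) -> list[str]:
    """Return [] for a word found in the lexicon, else ranked candidates."""
    if word in lexicon:
        return []  # exact match: the word is treated as correctly spelled
    ranked = sorted(lexicon, key=lambda w: jaro(word, w), reverse=True)
    return ranked[:limit]
```

Calling `suggest("کتاپ", LEXICON)` ranks the lexicon entries by their Jaro similarity to the misspelled input, while a word already present in the lexicon yields an empty list. Ranking the whole lexicon is acceptable for a toy example; a large word list would need candidate filtering first.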
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Aziz, R., Anwar, M.W. (2020). Urdu Spell Checker: A Scarce Resource Language. In: Bajwa, I., Sibalija, T., Jawawi, D. (eds) Intelligent Technologies and Applications. INTAP 2019. Communications in Computer and Information Science, vol 1198. Springer, Singapore. https://doi.org/10.1007/978-981-15-5232-8_40
DOI: https://doi.org/10.1007/978-981-15-5232-8_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5231-1
Online ISBN: 978-981-15-5232-8
eBook Packages: Computer Science, Computer Science (R0)