Urdu Spell Checker: A Scarce Resource Language

  • Conference paper
Intelligent Technologies and Applications (INTAP 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1198)

Abstract

In the digital world, many software applications have been developed to check the spelling of words. English is far ahead in the development of spell-checking applications, whilst other languages, Urdu in particular, lag behind. We develop an “Urdu Spell Checker” that detects incorrectly spelled words and provides a list of candidate corrections. The spell checker relies on a predefined lexicon, or corpus, of correctly spelled words to decide whether an entered word is correct: if the input word matches a word in the corpus it is considered correct; otherwise it is considered misspelled. Multiple error-correction techniques are applied both individually and in combination to determine which set of methods gives the best output. Jaro distance combined with soundex, shapex and n-gram gives the best results: 80.0% precision, 44.87% recall and 57.37% F-measure.
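The detection and ranking steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-word `LEXICON`, the function names, the `top_k` and `threshold` parameters, and the choice to rank by textbook Jaro similarity alone (without the soundex, shapex, and n-gram components) are all assumptions made for the example.

```python
from typing import Iterable, List

# Hypothetical toy lexicon; a real checker would load a large Urdu word list.
LEXICON = {"کتاب", "کتب", "قلم"}


def jaro(s: str, t: str) -> float:
    """Textbook Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s == t:
        return 1.0
    ls, lt = len(s), len(t)
    if ls == 0 or lt == 0:
        return 0.0
    # Characters count as matching only if they are at most `window` apart.
    window = max(max(ls, lt) // 2 - 1, 0)
    s_matched = [False] * ls
    t_matched = [False] * lt
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(lt, i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in s and t.
    out_of_order = 0
    k = 0
    for i in range(ls):
        if s_matched[i]:
            while not t_matched[k]:
                k += 1
            if s[i] != t[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / ls + matches / lt
            + (matches - transpositions) / matches) / 3


def is_correct(word: str, lexicon: Iterable[str]) -> bool:
    """Detection step: a word is correct if it appears in the lexicon."""
    return word in set(lexicon)


def suggest(word: str, lexicon: Iterable[str],
            top_k: int = 3, threshold: float = 0.7) -> List[str]:
    """Correction step: lexicon words ranked by Jaro similarity to `word`."""
    scored = [(jaro(word, cand), cand) for cand in lexicon]
    scored = [pair for pair in scored if pair[0] >= threshold]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for _, cand in scored[:top_k]]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With this toy lexicon, the misspelling `"کتات"` is flagged by `is_correct` and `suggest("کتات", LEXICON)` ranks `"کتاب"` first. As a side note, `f_measure(80.0, 44.87)` evaluates to roughly 57.49, slightly above the reported 57.37%; the abstract does not say how the figures were rounded or averaged, so the source of that small gap is unknown.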

Notes

  1. https://en.wikipedia.org/wiki/History_of_Hindustani

  2. https://www.wdl.org/en/item/9700/

  3. https://en.wikipedia.org/wiki/Levenshtein_distance

  4. https://blogs.sap.com/2013/12/04/jaro-winkler-distance-algorithm/

  5. https://www.jewishgen.org/InfoFiles/Soundex.html#NARA

Author information

Corresponding author

Correspondence to Romila Aziz.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Aziz, R., Anwar, M.W. (2020). Urdu Spell Checker: A Scarce Resource Language. In: Bajwa, I., Sibalija, T., Jawawi, D. (eds) Intelligent Technologies and Applications. INTAP 2019. Communications in Computer and Information Science, vol 1198. Springer, Singapore. https://doi.org/10.1007/978-981-15-5232-8_40

  • DOI: https://doi.org/10.1007/978-981-15-5232-8_40

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-5231-1

  • Online ISBN: 978-981-15-5232-8

  • eBook Packages: Computer Science (R0)
