HINDIA: a deep-learning-based model for spell-checking of Hindi language

  • Original Article
  • Journal: Neural Computing and Applications

Abstract

A spelling error is a mistake made while typing a text document. Applications such as search engines, information retrieval systems, and email rely on user typing, so a good spell-checker is essential for correcting misspellings. Spell-checkers for Western languages such as English are powerful and handle a wide range of spelling errors, whereas those available for Indian languages such as Hindi, Urdu, Bengali, Kannada, and Assamese are very basic and are built with traditional statistical and rule-based methods. This article presents HINDIA, a novel model for handling spelling errors in Hindi, one of the most widely spoken languages in India. It uses deep learning for spelling error detection and correction. The proposed model works in two phases: in the first phase it identifies the erroneous words in the input sample, and in the second phase it replaces them with the most probable correct words. HINDIA is built on an attention-based encoder–decoder bidirectional recurrent neural network (BiRNN) with long short-term memory (LSTM) cells. Several modifications were made to the BiRNN, and the network was fine-tuned to process Hindi spelling errors. The model is trained and tested on the publicly available 'monolingual corpus' developed by IIT Bombay. Its performance is evaluated in two scenarios. In the first, where the test dataset is generated using a split function, HINDIA achieves precision 0.86, recall 0.72, f-measure 0.78, and accuracy 0.80. In the second, where the test dataset is generated manually, it achieves precision 0.81, recall 0.72, f-measure 0.76, and accuracy 0.74. HINDIA outperforms the deep-learning-based Malayalam spell-checker and some other deep-learning-based correction models in the literature.
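
As a quick consistency check on the reported scores, the f-measure can be recomputed from the stated precision and recall. The sketch below assumes that "f-measure" means the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` and the scenario labels are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f-measure) as given in the abstract.
reported = {
    "split-generated test set": (0.86, 0.72, 0.78),
    "manually generated test set": (0.81, 0.72, 0.76),
}

for scenario, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    # Compare the recomputed value, rounded to two decimals, with the reported one.
    print(f"{scenario}: recomputed {f:.4f}, reported {f_reported:.2f}")
```

Under this assumption, both recomputed values round to the figures given in the abstract.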

Acknowledgement

The authors thank the reviewers for their insightful comments. The authors also thank the Ministry of Electronics and IT, Government of India, for providing a fellowship under Grant Number PhD-MLA-4 (69)/2015-16 (Visvesvaraya PhD Scheme for Electronics and IT) to pursue Ph.D. work.

Author information

Corresponding author

Correspondence to Shashank Singh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: List of variables used in the article and their definitions

Sr. no. | Variable | Definition
1. | \( X = (X_{1}, X_{2}, X_{3}, \ldots, X_{n}) \) | Fixed-size input vector
2. | \( H = (H_{0}, H_{1}, H_{2}, \ldots, H_{m}) \) | Hidden layers
3. | \( Y = (Y_{0}, Y_{1}, Y_{2}, \ldots, Y_{k}) \) | Output symbols
4. | \( X_{t} \) | Input to the network at time t
5. | \( H_{t} \) | Hidden-layer state at time t
6. | \( Y_{t} \) | Output symbol at time t
7. | \( V_{t} \) | Current word symbol at time t
8. | \( W \) | Weight matrix of the input vector
9. | \( U \) | Recurrent weight matrix
10. | \( B \) | Bias of the network
11. | \( V \) | Weight of the hidden layer
12. | \( f_{t} \) | Forget-gate activation at time t
13. | \( W_{f} \) | Weight matrix of the forget gate
14. | \( B_{f} \) | Bias of the forget gate
15. | \( \sigma \) | Sigmoid function
16. | \( i_{t} \) | Input-gate activation at time t
17. | \( \hat{C}_{t} \) | Cell input activation vector
18. | \( W_{c} \) | Weight matrix of the candidate cell
19. | \( b_{c} \) | Bias of the candidate cell
20. | \( C_{t} \) | Cell state at time t
21. | \( O_{t} \) | Output at time t
22. | \( W_{o} \) | Weight matrix of the output gate
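
To show how the LSTM quantities listed in Appendix A fit together, here is a minimal pure-Python sketch of one step of a standard LSTM cell (Hochreiter and Schmidhuber style, with a forget gate). It is not the article's implementation: the equations in the full text are not visible here, so the textbook formulation is assumed. The names `W_i`, `b_i`, and `b_o` do not appear in Appendix A and are introduced as assumptions, as are the helper names `affine` and `lstm_step` and the choice to concatenate the previous hidden state with the current input.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, the sigma of Appendix A."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W.v + b, where W is a list of rows and v, b are lists."""
    return [sum(w * x for w, x in zip(row, v)) + b_j for row, b_j in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a standard LSTM cell (assumed textbook formulation)."""
    v = h_prev + x_t  # concatenate previous hidden state and current input
    f_t = [sigmoid(z) for z in affine(p["W_f"], v, p["b_f"])]      # forget gate
    i_t = [sigmoid(z) for z in affine(p["W_i"], v, p["b_i"])]      # input gate (W_i, b_i assumed)
    c_hat = [math.tanh(z) for z in affine(p["W_c"], v, p["b_c"])]  # candidate cell input
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(z) for z in affine(p["W_o"], v, p["b_o"])]      # output gate (b_o assumed)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]             # new hidden state
    return h_t, c_t


# Toy example: hidden size 1, input size 1, all weights and biases zero.
params = {name: [[0.0, 0.0]] for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: [0.0] for name in ("b_f", "b_i", "b_c", "b_o")})

h_t, c_t = lstm_step(x_t=[1.0], h_prev=[0.0], c_prev=[2.0], p=params)
print(h_t, c_t)  # with zero weights every gate is 0.5, so c_t is [1.0]
```

With all weights set to zero, every gate evaluates to 0.5 and the candidate to 0, so the new cell state is half of the previous one. This makes the role of the forget gate easy to check by hand. A bidirectional layer, as named in the abstract, would run such a cell over the sequence once left to right and once right to left.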

Appendix B: List of abbreviations

Sr. no. | Abbreviation | Full form
1. | RNN | Recurrent neural network
2. | BiRNN | Bidirectional recurrent neural network
3. | LSTM | Long short-term memory
4. | AI | Artificial intelligence
5. | DL | Deep learning
6. | NLP | Natural language processing
7. | SMS | Short message service
8. | POS | Part-of-speech
9. | HMM | Hidden Markov model
10. | REDM | Reverse edit distance model
11. | FSA | Finite state automata
12. | FSR | Finite state representation
13. | SCRNN | Semi-character recurrent neural network
14. | FFNN | Feed-forward neural network
15. | BPTT | Backpropagation through time
16. | En-De RNN | Encoder–decoder recurrent neural network
17. | PoO | Probability of occurrence
18. | FAQ | Frequently asked questions
19. | CBOW | Continuous bag of words

About this article

Cite this article

Singh, S., Singh, S. HINDIA: a deep-learning-based model for spell-checking of Hindi language. Neural Comput & Applic 33, 3825–3840 (2021). https://doi.org/10.1007/s00521-020-05207-9
