Abstract
A spelling error is a mistake made while typing a text document. Applications such as search engines, information retrieval systems and email depend on typed user input, so a good spell-checker is essential for correcting misspellings. Spell-checkers for Western languages such as English are powerful and handle a wide range of spelling errors, whereas those available for Indian languages such as Hindi, Urdu, Bengali, Kannada and Assamese are very basic. These spell-checkers are built with traditional statistical and rule-based methods. This article presents HINDIA, a novel model for handling spelling errors in Hindi, one of the most widely spoken languages in India. It uses a deep-learning method for spelling error detection and correction. The proposed spell-checking model works in two phases: in the first phase it identifies the erroneous words in the input sample, and in the second phase it replaces them with the most probable correct words. HINDIA is built on an attention-based encoder–decoder bidirectional recurrent neural network (BiRNN) with long short-term memory (LSTM) cells. Several modifications were made to the BiRNN, and the network was fine-tuned to process Hindi spelling errors. The publicly available ‘monolingual corpus’ developed by IIT Bombay is used for training and testing. The performance of the proposed model is evaluated in two scenarios. In the first scenario, where the test dataset is generated using a split function, HINDIA achieves precision 0.86, recall 0.72, F-measure 0.78 and accuracy 0.80. In the second scenario, where the test dataset is generated manually, it achieves precision 0.81, recall 0.72, F-measure 0.76 and accuracy 0.74. HINDIA outperforms the deep-learning-based Malayalam spell-checker and some other deep-learning-based correction models reported in the literature.
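The reported F-measure values can be checked against the reported precision and recall. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` is illustrative rather than taken from the article.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the article's "f-measure" is F1; the abstract does not say so.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the two test scenarios.
reported = {
    "split-generated test set": (0.86, 0.72),
    "manually generated test set": (0.81, 0.72),
}

for scenario, (p, r) in reported.items():
    print(f"{scenario}: F = {f_measure(p, r):.2f}")
```

Under this assumption the computed values round to 0.78 and 0.76, which agrees with the figures given in the abstract.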








Acknowledgement
The authors thank the reviewers for their insightful comments. The authors would also like to thank the Ministry of Electronics and IT, Government of India, for providing a fellowship under Grant Number: PhD-MLA-4 (69)/2015-16 (Visvesvaraya PhD Scheme for Electronics and IT) to pursue Ph.D. work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix A: List of variables used in the article and their definitions
Sr. no. | Variable | Definition |
---|---|---|
1. | \( X = (X_{1}, X_{2}, X_{3}, \ldots, X_{n}) \) | Fixed-size input vector |
2. | \( H = (H_{0}, H_{1}, H_{2}, \ldots, H_{m}) \) | Hidden layers |
3. | \( Y = (Y_{0}, Y_{1}, Y_{2}, \ldots, Y_{k}) \) | Output symbols |
4. | \( X_{t} \) | Input to the network at time t |
5. | \( H_{t} \) | Number of hidden layers at time t |
6. | \( Y_{t} \) | Output symbol at time t |
7. | \( V_{t} \) | Current word symbol at time t |
8. | \( W \) | Weight matrix of the input vector |
9. | \( U \) | Recurrent weight matrix |
10. | \( B \) | Bias of the network |
11. | \( V \) | Weight of the hidden layer |
12. | \( f_{t} \) | Activation of the forget gate at time t |
13. | \( W_{f} \) | Weight matrix of the forget gate |
14. | \( B_{f} \) | Bias of the forget gate |
15. | \( \sigma \) | Sigmoid function |
16. | \( i_{t} \) | Activation of the input gate at time t |
17. | \( \hat{C}_{t} \) | Cell input activation vector |
18. | \( W_{c} \) | Weight matrix of the candidate cell |
19. | \( b_{c} \) | Bias of the candidate cell |
20. | \( C_{t} \) | Cell state at time t |
21. | \( O_{t} \) | Output at time t |
22. | \( W_{o} \) | Weight matrix of the output gate |
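For orientation, the gate-level symbols in Appendix A correspond to the standard LSTM cell update. The equations below are the textbook formulation written with the appendix's symbols, not a transcription of the article's own equations. Three things are assumptions: the input-gate parameters \( W_{i} \), \( b_{i} \) and the output-gate bias \( b_{o} \) do not appear in the table; \( H_{t} \) is read here as the hidden state at time t; and \( [H_{t-1}, X_{t}] \) denotes concatenation of the previous hidden state with the current input.

```latex
\begin{align}
f_t       &= \sigma\left(W_f\,[H_{t-1}, X_t] + B_f\right) \\
i_t       &= \sigma\left(W_i\,[H_{t-1}, X_t] + b_i\right) \\
\hat{C}_t &= \tanh\left(W_c\,[H_{t-1}, X_t] + b_c\right) \\
C_t       &= f_t \odot C_{t-1} + i_t \odot \hat{C}_t \\
O_t       &= \sigma\left(W_o\,[H_{t-1}, X_t] + b_o\right) \\
H_t       &= O_t \odot \tanh\left(C_t\right)
\end{align}
```

Here \( \odot \) is element-wise multiplication. The forget gate \( f_t \) scales the previous cell state, the input gate \( i_t \) scales the candidate \( \hat{C}_t \), and the output gate \( O_t \) controls how much of the updated cell state is exposed as the hidden state.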
Appendix B: List of abbreviations
Sr. no. | Abbreviation | Full-form |
---|---|---|
1. | RNN | Recurrent neural network |
2. | BiRNN | Bidirectional recurrent neural network |
3. | LSTM | Long short-term memory |
4. | AI | Artificial intelligence |
5. | DL | Deep-learning |
6. | NLP | Natural language processing |
7. | SMS | Short message service |
8. | POS | Part-of-speech |
9. | HMM | Hidden Markov model |
10. | REDM | Reverse edit distance model |
11. | FSA | Finite state automata |
12. | FSR | Finite state representation |
13. | SCRNN | Semi-character recurrent neural network |
14. | FFNN | Feed-forward neural network |
15. | BPTT | Backpropagation through time |
16. | En-De RNN | Encoder–decoder recurrent neural network |
17. | PoO | Probability of occurrence |
18. | FAQ | Frequently asked question |
19. | CBOW | Continuous bag of words |
About this article
Cite this article
Singh, S., Singh, S. HINDIA: a deep-learning-based model for spell-checking of Hindi language. Neural Comput & Applic 33, 3825–3840 (2021). https://doi.org/10.1007/s00521-020-05207-9
DOI: https://doi.org/10.1007/s00521-020-05207-9