Natural language processing for Nepali text: a review

Artificial Intelligence Review

Abstract

With the proliferation of Nepali text documents online, researchers in Nepal and overseas have begun working on automated analysis of such text for quick inference, using machine learning (ML) algorithms that range from traditional ML-based methods to recent deep learning (DL)-based methods. However, researchers still lack a clear picture of recent trends and directions in natural language processing (NLP) research for the Nepali language. In this paper, we survey NLP research works and their associated resources for the Nepali language. Furthermore, we organize the NLP approaches, techniques, and application tasks used in Nepali language processing under a comprehensive taxonomy for each. Finally, we discuss and analyze this assembled information with a view to improving NLP research in the Nepali language. Our survey provides researchers with detailed background and motivation, which not only opens up new potential avenues but also supports further progress of NLP research in the Nepali language.

Notes

  1. http://www.mpp.org.np (accessed date: 02/07/2021).

  2. www.ltk.org.np (accessed date: 02/07/2021).

  3. http://www.elra.info/en/catalogues/free-resources/nepali-corpora/ (accessed date: 17/02/2021).

  4. https://data.ldcil.org/a-gold-standard-nepali-raw-text-corpus (accessed date: 17/02/2021).

  5. https://ieee-dataport.org/open-access/large-scale-nepali-text-corpus (accessed date: 16/02/2021).

  6. https://github.com/sndsabin/Nepali-News-Classifier (accessed date: 17/01/2021), Information and Language Processing Research Lab, Kathmandu University, Nepal.

  7. https://www.kaggle.com/ashokpant/nepali-news-dataset-large (accessed date: 16/02/2021).

  8. https://ieee-dataport.org/documents/nepaliliinguistic (accessed date: 16/02/2021).

  9. http://xixona.dlsi.ua.es/~fran/apertium2-documentation.pdf (accessed date: 13/02/2021).

  10. https://ekantipur.com/ (accessed date: 13/02/2021).

  11. https://nagariknews.nagariknetwork.com/ (accessed date: 13/02/2021).

  12. http://www.statmt.org/moses/.

  13. https://github.com/moses-smt/giza-pp.

  14. https://sourceforge.net/projects/irstlm/.

  15. https://www.cstr.ed.ac.uk/projects/festival/.

Author information

Corresponding author

Correspondence to Tej Bahadur Shahi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shahi, T.B., Sitaula, C. Natural language processing for Nepali text: a review. Artif Intell Rev 55, 3401–3429 (2022). https://doi.org/10.1007/s10462-021-10093-1

  • DOI: https://doi.org/10.1007/s10462-021-10093-1
