
In search of a suitable method for disambiguation of word senses in Bengali

International Journal of Speech Technology

Abstract

The paper presents a study of word sense disambiguation (WSD) in Bengali, one of the less-resourced Indian languages. The work is carried out in two sequential phases. In the first phase, four well-known approaches that are often applied to sense disambiguation are studied using traditional methods, and suitable modifications are designed and implemented to improve the results. In the second phase, a combined approach is proposed based on the results of these initial experiments. Within the supervised module, four commonly used classifiers, namely Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Naïve Bayes (NB), serve as baselines for sense classification. Tested on 13 frequently used ambiguous Bengali words retrieved from a Bengali text corpus, these baselines achieve 63.84%, 76.9%, 76.23%, and 80.23% accuracy, respectively. Two major modifications are then applied to these baselines to increase accuracy: (a) incorporating lemmatization into the system, which yields 68.30%, 79%, 78.23%, and 82.30% accuracy, respectively, and (b) applying bootstrapping to the systems (with lemmatization retained), which yields 70.92%, 79.15%, 79.53%, and 83% accuracy, respectively. In the knowledge-based module, the traditional Lesk algorithm serves as the baseline and achieves 31% accuracy; expanding the context of the sentences with the Bengali WordNet (Context Expansion, CE) raises this to 75%. Within the unsupervised module, the baseline strategy achieves 36.2% accuracy.
To enhance performance, two modifications are applied to this unsupervised baseline: (a) Principal Component Analysis (PCA) over the feature vector, which yields 51.2% accuracy, and (b) Context Expansion of the sentences using the Bengali WordNet combined with PCA, which yields 61% accuracy. Finally, a combined approach that draws on the effective aspects of the three methods (supervised, knowledge-based, and unsupervised) achieves the highest accuracy (92%) in sense disambiguation.
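The abstract does not specify the feature set used by the supervised classifiers. As a minimal sketch, assuming bag-of-words context features, a multinomial Naïve Bayes sense classifier with add-one (Laplace) smoothing picks the sense that maximizes the log prior plus the summed log likelihoods of the context words. The function names (`train_nb`, `classify_nb`), the toy English training contexts, and the sense labels are hypothetical illustrations, not the authors' implementation or corpus; in the paper's setting the context words would be Bengali and, under the lemmatization modification, would be lemmatized first.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count senses and context words from (tokens, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)  # sense -> word frequencies
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def classify_nb(model, tokens):
    """Return the sense with the highest add-one-smoothed log posterior."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior P(sense)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[sense][tok] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical training contexts for one ambiguous word ("bank").
TRAIN = [
    (["river", "water", "shore"], "bank/river"),
    (["fish", "river", "mud"], "bank/river"),
    (["money", "loan", "account"], "bank/finance"),
    (["deposit", "money", "interest"], "bank/finance"),
]

model = train_nb(TRAIN)
print(classify_nb(model, ["loan", "interest"]))  # bank/finance
print(classify_nb(model, ["water", "mud"]))      # bank/river
```

Working in log space avoids underflow when many context words are multiplied together, and the add-one term keeps a sense from being ruled out just because one context word never co-occurred with it in training.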

Notes

  1. Bengali WordNet has an online interface at http://www.cfilt.iitb.ac.in/indowordnet/.
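The knowledge-based module pairs a Lesk baseline with WordNet-based Context Expansion. A minimal sketch of the simplified Lesk idea, under stated assumptions: each candidate sense is represented by a bag of gloss words, the sense whose gloss shares the most words with the sentence wins, and no overlap means no decision. The abstract does not describe how Context Expansion draws on the Bengali WordNet; here it is assumed, purely for illustration, to append related words (for example, members of a synset) for each context word. The functions `simplified_lesk` and `expand_context`, the glosses, and the related-word table are hypothetical toy data, not Bengali WordNet entries or the authors' code.

```python
def simplified_lesk(context, glosses):
    """Return (sense, overlap) for the gloss sharing the most words with context.

    Returns (None, 0) when no gloss overlaps the context at all.
    """
    context_set = set(context)
    best_sense, best_overlap = None, 0
    for sense, gloss in glosses.items():
        overlap = len(context_set & set(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


def expand_context(context, related):
    """Append related words (assumed synset members) for each context token."""
    expanded = list(context)
    for tok in context:
        expanded.extend(related.get(tok, []))
    return expanded


# Hypothetical sense glosses and related-word table.
GLOSSES = {
    "bank/finance": ["institution", "keeping", "money", "loan"],
    "bank/river": ["sloping", "land", "beside", "water"],
}
RELATED = {"stream": ["water", "river", "brook"]}

sentence = ["he", "sat", "by", "the", "stream"]
print(simplified_lesk(sentence, GLOSSES))  # (None, 0)
print(simplified_lesk(expand_context(sentence, RELATED), GLOSSES))  # ('bank/river', 1)
```

Without expansion the toy sentence shares no word with either gloss, which illustrates in miniature why a plain overlap count can fail on short contexts and why enriching the context with related words can help.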

Author information

Correspondence to Alok Ranjan Pal.

Cite this article

Pal, A.R., Saha, D., Naskar, S.K. et al. In search of a suitable method for disambiguation of word senses in Bengali. Int J Speech Technol 24, 439–454 (2021). https://doi.org/10.1007/s10772-020-09787-8

  • DOI: https://doi.org/10.1007/s10772-020-09787-8
