Abstract
The paper presents a study on word sense disambiguation (WSD) in Bengali, one of the less-resourced Indian languages. The work is carried out in two sequential phases. In the first phase, four well-known approaches that are often applied to word sense disambiguation are studied using their traditional methods, and suitable modifications are designed and implemented to improve the results. In the second phase, a combined approach is proposed based on the results of these initial experiments. Within the ‘supervised module’, four commonly used classifiers, namely Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Naïve Bayes (NB), are used as baselines for sense classification. Tested on 13 frequently used ambiguous Bengali words retrieved from a Bengali text corpus, these baselines achieve accuracies of 63.84%, 76.9%, 76.23%, and 80.23%, respectively. Two modifications are then applied to the baselines to increase accuracy: (a) incorporating lemmatization into the system, which yields 68.30%, 79%, 78.23%, and 82.30%, respectively, and (b) applying bootstrapping to the lemmatized systems, which yields 70.92%, 79.15%, 79.53%, and 83%, respectively. In the knowledge-based method, the traditional Lesk algorithm is implemented as the baseline and achieves 31% accuracy; modifying it with Context Expansion (CE) of the sentences using the Bengali WordNet raises accuracy to 75%. Within the ‘unsupervised module’, the baseline strategy achieves 36.2% accuracy.
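As a rough illustration of two of the strategies named above, the following Python sketch implements a multinomial Naïve Bayes sense classifier with add-one (Laplace) smoothing and a simplified Lesk gloss-overlap scorer with optional context expansion. It is not the authors' implementation: the toy English contexts for an ambiguous word like "bank", the sense labels `GEO` and `FIN`, the glosses, and the `expand` dictionary (a hypothetical stand-in for Bengali WordNet lookups) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count senses and context words; examples is a list of (tokens, sense)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def classify_naive_bayes(model, tokens):
    """Return the sense with the highest log prior + smoothed log likelihood."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[sense].values()) + len(vocab)  # add-one smoothing
        for tok in tokens:
            if tok in vocab:  # unseen words carry no evidence, so skip them
                score += math.log((word_counts[sense][tok] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def simplified_lesk(context, glosses, expand=None):
    """Pick the sense whose gloss shares the most words with the context.

    expand maps a context word to related words (a hypothetical stand-in for
    WordNet-based context expansion). Ties go to the first sense listed.
    """
    ctx = set(context)
    if expand:
        for word in context:
            ctx |= expand.get(word, set())
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & gloss)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical toy data (English stand-ins, not the paper's Bengali corpus).
TRAIN = [
    (["river", "water", "shore"], "GEO"),
    (["water", "fish", "river"], "GEO"),
    (["money", "loan", "account"], "FIN"),
    (["deposit", "money", "interest"], "FIN"),
]
GLOSSES = {
    "GEO": {"sloping", "land", "beside", "river", "water"},
    "FIN": {"institution", "accepts", "money", "deposits"},
}

model = train_naive_bayes(TRAIN)
print(classify_naive_bayes(model, ["money", "deposit"]))  # FIN
print(simplified_lesk(["cash", "withdrawal"], GLOSSES))  # GEO (no overlap, tie)
print(simplified_lesk(["cash", "withdrawal"], GLOSSES, expand={"cash": {"money"}}))  # FIN
```

The two Lesk calls show why expansion can matter: the bare context shares no words with either gloss, so the choice falls back to a tie, while adding one related word produces a real overlap.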
To improve on the unsupervised baseline, two modifications are adopted: (a) Principal Component Analysis (PCA) over the feature vector, which yields 51.2% accuracy, and (b) Context Expansion of the sentences using the Bengali WordNet together with PCA, which yields 61% accuracy. Finally, a combined approach that draws on the effective aspects of the three methods achieves the highest accuracy (92%) in the sense disambiguation task.
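The PCA step mentioned above reduces the dimensionality of the context feature vectors before senses are induced. The abstract does not say how the unsupervised module groups contexts, so the sketch below assumes bag-of-words count vectors and k-means clustering; PCA is computed with power iteration and deflation so that the example needs only the standard library. The toy contexts and all function names are hypothetical.

```python
import math
import random


def bag_of_words(contexts):
    """Turn token lists into count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for ctx in contexts for w in ctx})
    return [[ctx.count(w) for w in vocab] for ctx in contexts]


def pca(rows, n_components, iters=200, seed=0):
    """Project mean-centred rows onto the top principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):  # power iteration for the dominant eigenvector
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w)) or 1.0
            v = [t / norm for t in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate so the next pass finds the next-largest component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in x]


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical contexts of one ambiguous word: three "river" uses, three "money" uses.
CONTEXTS = [
    ["river", "water", "shore"], ["river", "fish", "water"], ["water", "shore", "fish"],
    ["money", "loan", "account"], ["money", "deposit", "interest"], ["loan", "money", "deposit"],
]
labels = kmeans(pca(bag_of_words(CONTEXTS), n_components=1), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Cluster indices are arbitrary; measuring accuracy against WordNet senses would require a separate step that maps each cluster to a sense, which this sketch does not attempt.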
Notes
Bengali WordNet has an online interface at http://www.cfilt.iitb.ac.in/indowordnet/.
Cite this article
Pal, A.R., Saha, D., Naskar, S.K. et al. In search of a suitable method for disambiguation of word senses in Bengali. Int J Speech Technol 24, 439–454 (2021). https://doi.org/10.1007/s10772-020-09787-8