Abstract
Question answering systems have recently become a major research area in natural language processing. Bengali is one of the most widely spoken languages in India, yet it still poses difficulties for natural language processing. Among semantic-based systems, word mapping and keyword-based approaches have achieved the best results and attracted the most attention from users. Such systems have been implemented for various languages, but rarely for Indian languages such as Bengali. This work presents an efficient question answering system for retrieving Bengali text. The system uses word embedding clustering and deep-level feature representation to capture grammatical similarities when retrieving Bengali textual content relevant to user queries. The pre-trained word embedding module is built with a deep belief network, and a modified density peak algorithm performs the word embedding clustering. The system was tested on a dataset from the Bengali corpus developed by TDIL and on synthetic Bengali translations of the English-language SQuAD 2.0 dataset. It is implemented in Python with the NLTK toolkit and performed well in retrieving Bengali textual data.
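To make the clustering step more concrete, the following is a minimal sketch of the standard (unmodified) density peak clustering procedure applied to a handful of toy vectors. It is not the authors' modified algorithm, whose details the abstract does not give. The function name `density_peak_clusters`, the `cutoff` and `n_clusters` parameters, and the 2-D points standing in for word embeddings are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_peak_clusters(points, cutoff, n_clusters):
    """Standard density peak clustering (hypothetical helper, not the paper's variant).

    Returns one cluster label per point.
    """
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n    # distance to the nearest point that ranks denser
    parent = [-1] * n    # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its farthest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Cluster centres are the points with the largest rho * delta.
    # The densest point always ranks first here, so every parent chain ends at a centre.
    centers = sorted(range(n), key=lambda i: (-(rho[i] * delta[i]), i))[:n_clusters]

    labels = [-1] * n
    for label, i in enumerate(centers):
        labels[i] = label

    # Every other point inherits the label of its nearest denser neighbour,
    # which has already been labelled because we walk in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Toy 2-D vectors standing in for word embeddings (illustrative data only).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels = density_peak_clusters(points, cutoff=1.0, n_clusters=2)
print(labels)
```

In this toy setup the two well-separated groups of points end up in different clusters. Real word embeddings would have hundreds of dimensions, and the cutoff would need tuning for the data.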
Cite this article
Das, A., Saha, D. Deep learning based Bengali question answering system using semantic textual similarity. Multimed Tools Appl 81, 589–613 (2022). https://doi.org/10.1007/s11042-021-11228-w
DOI: https://doi.org/10.1007/s11042-021-11228-w