Deep learning based Bengali question answering system using semantic textual similarity

Multimedia Tools and Applications

Abstract

Question answering systems have recently become a major research area in language processing. Bengali is one of the most widely spoken languages in India, yet it still poses difficulties for natural language processing. Among semantic-based systems, word mapping and keyword-based approaches have achieved the best results and attracted the most attention from users. Such systems have been implemented for various languages, but rarely for Indian languages such as Bengali. This work presents an efficient question answering system for retrieving Bengali text. The system combines word embedding clustering with deep-level feature representation to capture grammatical similarities more accurately when retrieving Bengali textual content relevant to user queries. The pre-trained word embedding module is built with the help of a deep belief network, and a modified density peak algorithm performs the word embedding clustering. The system has been tested on a dataset from the Bengali corpus developed by TDIL and on a synthetic Bengali translation of SQuAD 2.0, a dataset originally available in English. The system is implemented in Python with the NLTK toolkit and performed well in retrieving Bengali textual data.
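To make the clustering step more concrete, the following is a minimal sketch of the standard density peak algorithm (Rodriguez and Laio, 2014) applied to toy vectors. It is not the modified variant used in this work, whose details the abstract does not give. The function name `density_peak_clusters`, the cutoff `d_c`, the cluster count `n_clusters`, and the 2-D "embeddings" are all assumptions chosen for illustration; real word embeddings would have far more dimensions.

```python
import math

def density_peak_clusters(points, d_c, n_clusters):
    """Standard density-peak clustering; an illustrative sketch only."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta: distance to the nearest point of higher density.
    # parent: that nearest denser point, used later to propagate labels.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; by convention
            # its delta is its largest distance to any other point.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Cluster centres combine high density with high separation.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))
    centers = centers[:n_clusters]

    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id
    # Each remaining point inherits the label of its nearest denser point,
    # which has already been labelled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Hypothetical 2-D stand-ins for word embeddings: two well-separated groups.
toy_embeddings = [(0, 0), (0, 1), (1, 0), (1, 1),
                  (10, 10), (10, 11), (11, 10)]
labels, centers = density_peak_clusters(toy_embeddings, d_c=2.0, n_clusters=2)
print(labels)   # [0, 0, 0, 0, 1, 1, 1]
print(centers)  # [0, 4]
```

In this sketch the number of clusters is passed in explicitly; the original algorithm instead picks centres by inspecting a plot of density against separation, and the modified version mentioned above may select them differently.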


Author information

Corresponding author

Correspondence to Arijit Das.


About this article

Cite this article

Das, A., Saha, D. Deep learning based Bengali question answering system using semantic textual similarity. Multimed Tools Appl 81, 589–613 (2022). https://doi.org/10.1007/s11042-021-11228-w


  • DOI: https://doi.org/10.1007/s11042-021-11228-w
