DOI: 10.1145/3626641.3626926
research-article

Mutual Information for Learning Context Representation on RNN-Attention Based Models in Open Domain Generative Chatbot

Published: 27 December 2023

Abstract

A chatbot is an application of Artificial Intelligence that can receive and answer questions automatically. Chatbots are widely used in fields such as health, customer service, entertainment, and education. There are two approaches to chatbot development: rule-based and generative. A rule-based chatbot is easy to develop and produces good answers, but it requires rules that are defined manually in advance. A generative chatbot can provide dynamic, natural answers and does not require predefined rules. However, generative chatbots suffer from a weak representation of sentence information and from an information bottleneck, which results in loss of information or context. The main objective of this research is to find the best model for an open domain generative chatbot in predefined scenarios, to improve the model's word information representation using SBERT pretrained word embeddings, and to reduce information loss in the encoder bottleneck and the output using Mutual Information. Based on the experimental results, LSTM with Bahdanau Attention achieved the best performance in all scenarios, with the highest BLEU and BERT F1-Score. In the 50 and 100 (long) sequence scenarios, adding Mutual Information and SBERT improved overall model performance for BLEU by 3.62% and 2.58%, respectively, and for BERT Score by 3.16% and 5.10%, respectively.
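For readers unfamiliar with the attention mechanism named in the abstract, the following is a minimal sketch of Bahdanau-style (additive) attention in plain Python. It is an illustration only, not the authors' implementation: the function names, the toy weight matrices `W_s`, `W_h`, the vector `v`, and the toy encoder and decoder states are all assumptions chosen for readability. Each encoder state is scored against the current decoder state as v · tanh(W_s s + W_h h_i), the scores are normalized with a softmax, and the context vector is the weighted sum of the encoder states.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Additive attention: score_i = v . tanh(W_s s + W_h h_i).

    Returns the attention weights and the context vector.
    All parameters here are hypothetical toy values, not trained weights.
    """
    proj_s = matvec(W_s, decoder_state)
    scores = []
    for h in encoder_states:
        proj_h = matvec(W_h, h)
        hidden = [math.tanh(a + b) for a, b in zip(proj_s, proj_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three encoder states of size 2 and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.5, -0.5]
W_s = [[1.0, 0.0], [0.0, 1.0]]  # identity projections, for illustration
W_h = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

weights, context = additive_attention(decoder_state, encoder_states, W_s, W_h, v)
print(weights, context)
```

Because the decoder attends over all encoder states at every step, the context vector is not limited to the single fixed-size encoder output, which is one common way to ease the bottleneck the abstract describes.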


      Published In

      SIET '23: Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology
      October 2023
      722 pages
      ISBN:9798400708503
      DOI:10.1145/3626641

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Attention Mechanism
      2. Chatbot
      3. Deep Neural Network
      4. Natural Language Processing
      5. Sequence-to-sequence

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      SIET 2023

      Acceptance Rates

      Overall Acceptance Rate 45 of 57 submissions, 79%
