Abstract
The development of intelligent chatbots has become a prominent topic in computer science over the last few years. However, a chatbot often fails to understand the user's intent, which can lead to inappropriate responses that cause dialogue breakdown and user dissatisfaction. Detecting dialogue breakdown is essential to improving chatbot performance and increasing user satisfaction. Recent work has modeled conversation breakdown using several methods, both supervised and unsupervised. Unsupervised approaches rely heavily on large datasets, which makes them challenging to apply to the breakdown detection task. Further challenges in predicting conversation breakdown are the bias of human annotation in the datasets and the process of handling a breakdown once it occurs. To tackle these challenges, we have developed an automated supervised ensemble approach that measures Chatbot Quality of Service (CQoS) based on dialogue breakdown. The proposed approach labels the datasets based on sentiment, taking the context of the conversation into account, in order to predict breakdown. In this paper, we aim to detect the effect of each speaker's sentiment change in a conversation. Furthermore, we use the supervised ensemble model to measure CQoS based on breakdown. We then handle a detected breakdown with a hand-over mechanism that transfers the user to a live agent. Based on this idea, we perform experiments across several datasets and state-of-the-art models, and we find that using sentiment as a trigger for breakdown outperforms human annotation. Overall, we infer that the knowledge acquired by the supervised ensemble model can indeed help to measure CQoS by detecting breakdown in conversation.
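As a rough illustration of the idea of using sentiment change as a breakdown trigger, the following minimal sketch flags a chatbot turn when the user's sentiment drops sharply after it, and then decides whether to hand over to a live agent. It is not the method of the paper: the `Turn` structure, the function names, the polarity range of [-1, 1], and the `drop_threshold` and `max_breakdowns` values are all assumptions made for illustration, and the sentiment scores are supplied by hand rather than produced by a real sentiment model or ensemble.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str      # "user" or "bot" (hypothetical labels)
    sentiment: float  # assumed polarity in [-1, 1] from any sentiment scorer


def label_breakdowns(turns: List[Turn], drop_threshold: float = 0.5) -> List[bool]:
    """Flag a bot turn when the user's sentiment falls sharply right after it.

    The threshold rule is an illustrative assumption, not the paper's labeling scheme.
    """
    labels = [False] * len(turns)
    last_user_sentiment: Optional[float] = None
    pending_bot_index: Optional[int] = None  # most recent bot turn since the last user turn

    for i, turn in enumerate(turns):
        if turn.speaker == "bot":
            pending_bot_index = i
        elif turn.speaker == "user":
            if (
                last_user_sentiment is not None
                and pending_bot_index is not None
                and last_user_sentiment - turn.sentiment >= drop_threshold
            ):
                labels[pending_bot_index] = True
            last_user_sentiment = turn.sentiment
            pending_bot_index = None
    return labels


def should_hand_over(labels: List[bool], max_breakdowns: int = 1) -> bool:
    """Transfer to a live agent once the number of flagged turns reaches the limit."""
    return sum(labels) >= max_breakdowns


# Hand-written example conversation; sentiment values are made up for the sketch.
conversation = [
    Turn("user", 0.6),
    Turn("bot", 0.0),
    Turn("user", -0.2),  # sharp drop after the bot's reply
    Turn("bot", 0.1),
    Turn("user", -0.1),
]
breakdown_labels = label_breakdowns(conversation)
hand_over = should_hand_over(breakdown_labels)
```

In this toy conversation only the first bot reply is flagged, because it is followed by a large fall in user sentiment. A real system would replace the hand-set scores with a trained classifier and would tune the thresholds on annotated data.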
Cite this article
Almansor, E.H., Hussain, F.K. & Hussain, O.K. Supervised ensemble sentiment-based framework to measure chatbot quality of services. Computing 103, 491–507 (2021). https://doi.org/10.1007/s00607-020-00863-0
DOI: https://doi.org/10.1007/s00607-020-00863-0
Keywords
- Chatbot quality of services
- Sentiment analysis
- Detect the dialogue breakdown
- Handling dialogue breakdown