Supervised ensemble sentiment-based framework to measure chatbot quality of services

  • Special Issue Article
  • Computing

Abstract

Over the last few years, the development of intelligent chatbots has become a trending topic in computer science. However, a chatbot often fails to understand the user’s intent, which can lead to inappropriate responses that cause dialogue breakdown and user dissatisfaction. Detecting dialogue breakdown is therefore essential to improving chatbot performance and increasing user satisfaction. Recent work has modeled conversation breakdown with both supervised and unsupervised approaches. Unsupervised approaches rely heavily on large datasets, which makes them challenging to apply to the breakdown task. Other challenges in predicting conversation breakdown are the bias of human annotation in the datasets and the handling of a breakdown once it occurs. To tackle these challenges, we have developed a supervised ensemble automated approach that measures Chatbot Quality of Service (CQoS) based on dialogue breakdown. The proposed approach labels the datasets based on sentiment, taking the context of the conversation into account, in order to predict breakdown. In this paper we aim to detect the effect of the sentiment change of each speaker in a conversation. Furthermore, we use the supervised ensemble model to measure CQoS based on breakdown. We then handle breakdown with a hand-over mechanism that transfers the user to a live agent. Based on this idea, we perform several experiments across several datasets and state-of-the-art models, and we find that using sentiment as a trigger for breakdown outperforms human annotation. Overall, we infer that knowledge acquired from the supervised ensemble model can indeed help to measure CQoS by detecting breakdown in conversation.
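The idea of using a speaker's sentiment change as a breakdown trigger, followed by hand-over to a live agent, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy word lexicon, the `drop_threshold` and `max_breakdowns` parameters, and all function names are assumptions made for illustration, and the supervised ensemble model itself is not reproduced here. The sketch only shows how a bot turn might be labeled as a breakdown when the user's sentiment falls sharply across it.

```python
from dataclasses import dataclass

# Toy lexicon standing in for a real sentiment model (an assumption,
# not the sentiment method used in the article).
POSITIVE = {"thanks", "great", "good", "perfect", "helpful"}
NEGATIVE = {"wrong", "useless", "bad", "annoying", "confused"}


@dataclass
class Turn:
    speaker: str  # "user" or "bot"
    text: str


def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1] from positive and negative word counts."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def label_breakdowns(turns, drop_threshold=0.5):
    """Map each bot turn index to True when the user's sentiment drops by
    at least drop_threshold between the user turns before and after it."""
    labels = {}
    prev_user_score = None
    pending_bot = []  # bot turns still waiting for a user reaction
    for i, turn in enumerate(turns):
        if turn.speaker == "bot":
            pending_bot.append(i)
            continue
        score = sentiment_score(turn.text)
        for b in pending_bot:
            labels[b] = (
                prev_user_score is not None
                and prev_user_score - score >= drop_threshold
            )
        pending_bot = []
        prev_user_score = score
    for b in pending_bot:
        labels[b] = False  # no user reaction observed after this bot turn
    return labels


def should_hand_over(labels, max_breakdowns=1):
    """Trigger transfer to a live agent once enough breakdowns are seen."""
    return sum(labels.values()) >= max_breakdowns


dialogue = [
    Turn("user", "Hi, I need help with my claim."),
    Turn("bot", "Sure, what is your policy number?"),
    Turn("user", "Great, thanks. It is 12345."),
    Turn("bot", "Our offices open at 9am."),
    Turn("user", "That is wrong and useless."),
]
labels = label_breakdowns(dialogue)
hand_over = should_hand_over(labels)
```

In this hypothetical dialogue, only the second bot turn is followed by a sharp drop in user sentiment, so it is the only turn labeled as a breakdown, and the hand-over check fires. In a fuller pipeline, labels of this kind would serve as training targets for supervised classifiers rather than being used directly.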

Author information

Corresponding author

Correspondence to Ebtesam Hussain Almansor.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Almansor, E.H., Hussain, F.K. & Hussain, O.K. Supervised ensemble sentiment-based framework to measure chatbot quality of services. Computing 103, 491–507 (2021). https://doi.org/10.1007/s00607-020-00863-0

  • DOI: https://doi.org/10.1007/s00607-020-00863-0
