ABSTRACT
Chatbots are widely employed in various scenarios. However, given the high cost of chatbot development and chatbots’ tremendous social influence, chatbot failures can lead to huge economic losses. Previous chatbot evaluation frameworks rely heavily on human evaluation, offering little support for automatic, early-stage examination of chatbots prior to deployment. To reduce the risk of potential loss, we propose a computational approach that extracts features and trains models to make a priori predictions about chatbots’ popularity, which indicates a chatbot’s general performance. The features we extract cover chatbot Intent, Conversation Flow, and Response Design. We studied 1050 customer service chatbots on one of the most popular chatbot service platforms. Our model achieves 77.36% prediction accuracy among very popular and very unpopular chatbots, taking the first step towards computational feedback before chatbot deployment. Our evaluation results also reveal the key design features associated with chatbot popularity and offer guidance on chatbot design.