Abstract
Evaluation is the process of determining the significance of a research output. It is usually done by designing a well-structured study of the output in which one or more evaluation measures are applied and the results are carefully inspected. This paper reviews evaluation techniques for Conversational Agents (CAs) and Natural Language Interfaces to Databases (NLIDBs). It then introduces a customized evaluation methodology developed for Conversation-Based Interfaces to Relational Databases (C-BIRDs). The methodology is divided into two groups of measures. The first group is quantitative and comprises two measures: task success and dialogue length. The second group is qualitative and comprises seven measures: prototype ease of use, naturalness of system responses, positive/negative emotion, appearance, text on screen, organization of information, and error message clarity. The methodology is then elaborated with a discussion of, and recommendations on, sample size, experimental setup, and scaling, in order to provide a comprehensive evaluation methodology for C-BIRDs. In conclusion, the proposed methodology is a better way of identifying the strengths and weaknesses of C-BIRDs than single-measure evaluations.
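As an illustration only, the two groups of measures named above could be aggregated per study roughly as follows. This is a minimal sketch, not the authors' implementation: the `Session` record layout, the `summarize` helper, the measure identifiers, and the 1 to 5 rating scale are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from statistics import mean

# The seven qualitative measures named in the abstract
# (the identifier spellings are hypothetical).
QUALITATIVE_MEASURES = (
    "ease_of_use",
    "naturalness_of_responses",
    "positive_negative_emotion",
    "appearance",
    "text_on_screen",
    "organization_of_information",
    "error_message_clarity",
)


@dataclass
class Session:
    """One participant's attempt at one task (hypothetical record layout)."""
    task_completed: bool   # quantitative measure: task success
    dialogue_turns: int    # quantitative measure: dialogue length
    # Qualitative ratings keyed by measure name; a 1..5 scale is assumed.
    ratings: dict = field(default_factory=dict)


def summarize(sessions):
    """Aggregate quantitative and qualitative measures over all sessions."""
    if not sessions:
        raise ValueError("at least one session is required")
    summary = {
        "task_success_rate": sum(s.task_completed for s in sessions) / len(sessions),
        "mean_dialogue_length": mean(s.dialogue_turns for s in sessions),
    }
    for measure in QUALITATIVE_MEASURES:
        scores = [s.ratings[measure] for s in sessions if measure in s.ratings]
        # None marks a measure that no participant rated.
        summary[measure] = mean(scores) if scores else None
    return summary
```

Reporting every measure side by side, rather than collapsing them into one score, mirrors the abstract's argument that a multi-measure evaluation exposes strengths and weaknesses that a single measure would hide.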
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Owda, M., Owda, A.Y., Gasir, F. (2021). A Comprehensive Methodology for Evaluating Conversation-Based Interfaces to Relational Databases (C-BIRDs). In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_17
DOI: https://doi.org/10.1007/978-3-030-55187-2_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-55186-5
Online ISBN: 978-3-030-55187-2
eBook Packages: Intelligent Technologies and Robotics (R0)