
A Comprehensive Methodology for Evaluating Conversation-Based Interfaces to Relational Databases (C-BIRDs)

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1251)

Abstract

Evaluation can be defined as the process of determining the significance of a research output. It is usually carried out by devising a well-structured study in which the output is carefully inspected using one or more evaluation measures. This paper reviews evaluation techniques for Conversational Agents (CAs) and Natural Language Interfaces to Databases (NLIDBs). It then introduces a customized evaluation methodology developed for Conversation-Based Interfaces to Relational Databases (C-BIRDs). The methodology is divided into two groups of measures. The first group is quantitative and comprises two measures: task success and dialogue length. The second group is qualitative and comprises prototype ease of use, naturalness of system responses, positive/negative emotion, appearance, text on screen, organization of information, and error message clarity. The methodology is then elaborated with a discussion of, and recommendations on, sample size, experimental setup, and scaling, in order to provide a comprehensive evaluation methodology for C-BIRDs. In conclusion, the resulting methodology is a better way of identifying the strengths and weaknesses of C-BIRDs than evaluation based on a single measure.
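
As a rough illustration of how the two groups of measures named in the abstract might be tabulated, the Python sketch below aggregates per-participant records. The abstract does not say how each measure is computed, so everything here is an assumption: the `Session` record and its field names are hypothetical, task success is taken as the proportion of completed tasks, dialogue length is counted in turns, and the qualitative measures are treated as numeric questionnaire ratings on an assumed 1 to 5 scale.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Session:
    """One participant's attempt at one task (hypothetical record layout)."""
    task_completed: bool     # assumed binary notion of task success
    turns: int               # assumed unit of dialogue length
    ratings: Dict[str, int]  # qualitative measure name -> rating (assumed 1-5)


def task_success_rate(sessions: List[Session]) -> float:
    """Proportion of sessions in which the task was completed."""
    return sum(s.task_completed for s in sessions) / len(sessions)


def mean_dialogue_length(sessions: List[Session]) -> float:
    """Average number of turns per session."""
    return mean(s.turns for s in sessions)


def mean_ratings(sessions: List[Session]) -> Dict[str, float]:
    """Average rating for each qualitative measure across all sessions."""
    collected: Dict[str, List[int]] = {}
    for s in sessions:
        for name, score in s.ratings.items():
            collected.setdefault(name, []).append(score)
    return {name: mean(scores) for name, scores in collected.items()}


# Illustrative data only; these are not results from the paper.
sessions = [
    Session(True, 6, {"ease_of_use": 4, "error_message_clarity": 3}),
    Session(False, 10, {"ease_of_use": 2, "error_message_clarity": 5}),
]
summary = {
    "task_success_rate": task_success_rate(sessions),
    "mean_dialogue_length": mean_dialogue_length(sessions),
    "mean_ratings": mean_ratings(sessions),
}
```

Reporting the quantitative and qualitative figures side by side, rather than collapsing them into one score, reflects the abstract's argument against single-measure evaluation.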

Author information

Correspondence to Majdi Owda.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Owda, M., Owda, A.Y., Gasir, F. (2021). A Comprehensive Methodology for Evaluating Conversation-Based Interfaces to Relational Databases (C-BIRDs). In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1251. Springer, Cham. https://doi.org/10.1007/978-3-030-55187-2_17
