How to Explain It to System Testers?

A Qualitative User Study About Understandability, Validatability, Predictability, and Trustworthiness

  • Conference paper
  • In: Artificial Intelligence in HCI (HCII 2024)

Abstract

In the realm of explainable AI (XAI), limited research exists on user role-specific explanations. This study aims to determine the explanation needs for the user role “system tester of AI-based systems.” It investigates whether established explanation types adequately address the explainability requirements of ML-based application testers. Through a qualitative study (n = 12), we identified the explanation needs for three user tasks: test strategy determination, test case determination, and test result determination. The research yields five findings: F1) proposing a new explanation domain type, “system domain,” F2) proposing a new explanation structure, “hierarchical,” F3) identifying overlapping explanation content between two user groups, F4) considering identified inputs of a user task as explanation content candidates, and F5) highlighting the risk of combining the evaluation of assumed mental model representations with identifying explanation content in one study.

Notes

  1. Microsoft and Microsoft Teams are trademarks of the Microsoft group of companies.


Acknowledgment

The authors thank Parinitha Nagaraja for evaluating the initial mental model of explanations, and all study participants for their time and shared insights.

Author information

Correspondence to Helmut Degen.

8 Appendix

8.1 User task specific explanation needs

Inputs, outputs, and needed explanations for user task UT 1 (test strategy determination):

  • Inputs

    • Application domain

    • Description of the system under test and system requirements

    • Architecture of ML models, data structure, and repository with metrics (e.g., quantity and types of available data)

    • Test constraints (e.g., cost, resources, quality, available hardware, available software, etc.)

    • Identified risks and their mitigations (if available)

  • Outputs

    • Suggested test strategies

    • For each test strategy: Prioritized list of test methods, test architecture (including design of the test environment), test plan, confidence level

  • Needed explanations

    • Why was the test strategy predicted? Identify the main drivers, such as the application domain, description of the system under test, system requirements, architecture of ML models, data structure, and repository.

    • Why was the prioritization of test methods predicted? Identify the main drivers, such as risks, driving requirements (non-functional requirements), ML model architecture, data structure, and repository.

    • Why was the test architecture predicted? Identify the main drivers, such as available and unavailable hardware, available and unavailable software, ML model architecture, data structure, and repository.

    • Why was the test plan predicted? Identify the main drivers, such as compliance with test constraints.

Inputs, outputs, and needed explanations for user task UT 2 (test case determination):

  • Inputs

    • Description of system under test, system requirements, system architecture, and design

    • ML-model architecture, data structure, and repository with metrics

    • Selected test strategy, including methods, architecture, design of test environment, and test plan

  • Outputs

    • Generated test cases

    • For each test case: selected input data, test metrics with actual measurements, confidence level

    • Implemented test environment

  • Needed explanations

    • Why were the test cases generated as predicted? Identify main drivers: system architecture, design, ML-model architecture, data structure, and repository.

    • Why are the generated test cases appropriate? Identify test coverage based on system requirements and architecture. Show their value (e.g., contribution to error detection) and ensure that the test cases do not overlap.

    • How long does it take to configure and execute each test case? Identify the time for configuration and execution, considering the system and test architecture.

    • What is the cost/benefit ratio per test case? Identify value vs. testing time, including configuration and execution.

    • Why was the test environment implemented this way? Identify connection between test architecture and implemented environment, and mapping of design patterns.

Inputs, outputs, and needed explanations for user task UT 3 (test result determination):

  • Inputs

    • Description of system under test

    • Generated test cases

    • For each test case: selected input data, test metrics with actual measurements

    • Implemented test environment

  • Outputs

    • Configured test environment

    • Determined test results, trace(s) to system requirement(s), confidence level

    • Bug report (for detected bugs)

    • Test report with test KPIs

    • Responsive actions for failed test results

  • Needed explanations

    • Why was the test environment configured as it was? Identify the connection between configuration parameters and test environment design. Show how parameters were derived from generated test cases.

    • How were the test results predicted? Demonstrate the link between test inputs and results. Identify the correlation between test results and model monitoring outcomes.

    • How were the test KPIs forecasted? Identify the relationship between individual KPIs and test results.
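
The three task descriptions above share a common structure: inputs, outputs, and needed explanations with their main drivers. As a purely illustrative sketch that is not part of the study, the following Python snippet shows one way such task-specific explanation needs could be captured as structured data (for example, to build an explanation content catalog). The class names, field names, and the transcribed UT 1 entries are assumptions for illustration only.

```python
# Hypothetical sketch (not from the paper): encoding user-task-specific
# explanation needs as structured data. All type and field names are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExplanationNeed:
    question: str            # e.g. "Why was the test strategy predicted?"
    main_drivers: List[str]  # explanation content candidates named in the study


@dataclass
class UserTask:
    name: str
    inputs: List[str]
    outputs: List[str]
    needed_explanations: List[ExplanationNeed] = field(default_factory=list)


# UT 1 (test strategy determination), transcribed from the list above
ut1 = UserTask(
    name="UT 1: Test strategy determination",
    inputs=[
        "Application domain",
        "Description of the system under test and system requirements",
        "Architecture of ML models, data structure, and repository with metrics",
        "Test constraints (cost, resources, quality, available hardware/software)",
        "Identified risks and their mitigations (if available)",
    ],
    outputs=[
        "Suggested test strategies",
        "Per strategy: prioritized test methods, test architecture, test plan, confidence level",
    ],
    needed_explanations=[
        ExplanationNeed(
            question="Why was the test strategy predicted?",
            main_drivers=["application domain", "system under test", "system requirements",
                          "ML model architecture", "data structure", "repository"],
        ),
        ExplanationNeed(
            question="Why was the prioritization of test methods predicted?",
            main_drivers=["risks", "driving (non-functional) requirements",
                          "ML model architecture", "data structure", "repository"],
        ),
    ],
)
```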

8.2 Interview protocol

  • Step 1: Research introduction

  • Step 2: Job experience

  • Step 3: Introduction to the hypothetical application “Intelligent system testing application for ML-based applications,” which is capable of planning and executing an ML-based system test, including its core inputs and outputs (see Fig. 1)

  • Evaluating the assumed mental model representation for each component separately:

    • Question 1: Which explanation content is missing and why? (addressing research question 1, RQ 1)

    • Question 2: Which proposed explanation content is not necessary and why? (RQ 1)

    • Question 3: How should the explanation content be rearranged and why? (RQ 1)

    • Question 4: Please rate the following statement: “The explanation content helps the system tester to understand why the test strategy / test cases / test results was determined.” (5-point Likert scale) (RQ 1)

    • Question 5: Why did you select the rating? (RQ 1)

    • Question 6: Which explanation content helps the system tester to understand the determined test strategy / test cases / test report? (RQ 2)

    • Question 7: Which explanation content helps the system tester to understand why the test strategy / test cases / test results was determined? (RQ 3)

    • Question 8: Which explanation content helps the system tester to evaluate the validity of the determined test strategy / test cases / test results? (RQ 4)

    • Question 9: Which explanation content helps the system tester to predict how effective the selected test strategy / test cases / responsive actions will be? (RQ 5)

    • Question 10: Which explanation content helps the system tester to trust the determined test strategy / test cases / test results? (RQ 6)

  • Question 11: To understand the preferred intent of an explanation, rank the following statements about the intent of explanations (#1 means most important and #5 means least important). (RQ 7)

    • S1: The explanation content should help the system tester to understand the determined outcome.

    • S2: The explanation content should help the system tester to understand why the outcome was determined.

    • S3: The explanation content should help the system tester to evaluate the validity of the determined outcome.

    • S4: The explanation content should help the system tester to predict the effectiveness of the determined outcome.

    • S5: The explanation content should help the system tester to trust the determined outcome.

  • Question 12: Why did you select the top-ranked statement? (RQ 7)

To answer Questions 6 through 10, participants were instructed that they could select none, one, or multiple of the explanation content candidates, including one or more groups of explanation content candidates.
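
As a minimal sketch of how the rankings from Question 11 could be tabulated across participants, the snippet below computes an average rank per statement. The participant identifiers, the example rankings, and the averaging approach are assumptions for illustration and do not reflect the paper's qualitative analysis method.

```python
# Hypothetical sketch (not the authors' analysis): tabulating Question 11
# rankings of statements S1-S5. Rank 1 = most important, 5 = least important.
# The participant data below is made up for illustration only.
from statistics import mean

STATEMENTS = ["S1", "S2", "S3", "S4", "S5"]

# rankings[participant] = statement IDs ordered from rank 1 to rank 5
rankings = {
    "P01": ["S3", "S2", "S1", "S5", "S4"],
    "P02": ["S2", "S3", "S5", "S1", "S4"],
    "P03": ["S3", "S1", "S2", "S4", "S5"],
}

# average rank per statement (lower = considered more important)
avg_rank = {s: mean(r.index(s) + 1 for r in rankings.values()) for s in STATEMENTS}

for statement, rank in sorted(avg_rank.items(), key=lambda kv: kv[1]):
    print(f"{statement}: average rank {rank:.2f}")
```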

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Degen, H., Budnik, C. (2024). How to Explain It to System Testers? In: Degen, H., Ntoa, S. (eds.) Artificial Intelligence in HCI. HCII 2024. Lecture Notes in Computer Science, vol. 14734. Springer, Cham. https://doi.org/10.1007/978-3-031-60606-9_10

  • DOI: https://doi.org/10.1007/978-3-031-60606-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-60605-2

  • Online ISBN: 978-3-031-60606-9
