Abstract
In the realm of explainable AI (XAI), limited research exists on user role-specific explanations. This study aims to determine the explanation needs for the user role “system tester of AI-based systems.” It investigates whether established explanation types adequately address the explainability requirements of ML-based application testers. Through a qualitative study (n = 12), we identified the explanation needs for three user tasks: test strategy determination, test case determination, and test result determination. The research yields five findings: F1) proposing a new explanation domain type, “system domain,” F2) proposing a new explanation structure, “hierarchical,” F3) identifying overlapping explanation content between two user groups, F4) considering identified inputs of a user task as explanation content candidates, and F5) highlighting the risk of combining the evaluation of assumed mental model representations with identifying explanation content in one study.
Acknowledgment
The authors thank Parinitha Nagaraja for evaluating the initial mental model of explanations, and all study participants for their time and shared insights.
8 Appendix
8.1 User task specific explanation needs
Inputs, outputs, and needed explanations for user task UT 1 (a data-structure sketch follows this list):

- Inputs
  - Application domain
  - Description of the system under test and system requirements
  - Architecture of ML models, data structure, and repository with metrics (e.g., quantity and types of available data)
  - Test constraints (e.g., cost, resources, quality, available hardware, available software)
  - Identified risks and their mitigations (if available)
- Outputs
  - Suggested test strategies
  - For each test strategy: prioritized list of test methods, test architecture (including design of the test environment), test plan, confidence level
- Needed explanations
  - Why was the test strategy predicted? Identify the main drivers, such as the application domain, description of the system under test, system requirements, architecture of ML models, data structure, and repository.
  - Why was the prioritization of test methods predicted? Identify the main drivers, such as risks, driving requirements (non-functional requirements), ML model architecture, data structure, and repository.
  - Why was the test architecture predicted? Identify the main drivers, such as available and unavailable hardware, available and unavailable software, ML model architecture, data structure, and repository.
  - Why was the test plan predicted? Identify the main drivers, such as compliance with test constraints.
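To make the UT 1 structure concrete, the following is a minimal Python sketch of how the inputs, outputs, and attached explanations could be represented. All class and field names are illustrative assumptions, not artifacts of the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative sketch only: names are our shorthand for the UT 1 items above.

@dataclass
class UT1Inputs:
    application_domain: str
    system_description: str                  # system under test and its requirements
    ml_model_architecture: str               # ML model architecture, data structure, repository metrics
    test_constraints: Dict[str, str]         # e.g., {"cost": "...", "resources": "..."}
    risks_and_mitigations: List[str] = field(default_factory=list)  # if available

@dataclass
class ExplanationItem:
    question: str                            # e.g., "Why was the test strategy predicted?"
    main_drivers: List[str]                  # the inputs identified as main drivers

@dataclass
class SuggestedTestStrategy:
    test_methods: List[str]                  # prioritized list of test methods
    test_architecture: str                   # including the design of the test environment
    test_plan: str
    confidence: float                        # confidence level reported with the strategy
    explanations: List[ExplanationItem] = field(default_factory=list)
```

A "Why was the test strategy predicted?" explanation would then be an ExplanationItem whose main_drivers list points back to concrete UT1Inputs fields.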
Inputs, outputs, and needed explanations for user task UT 2 (a cost/benefit sketch follows this list):

- Inputs
  - Description of system under test, system requirements, system architecture, and design
  - ML-model architecture, data structure, and repository with metrics
  - Selected test strategy, including methods, architecture, design of the test environment, and test plan
- Outputs
  - Generated test cases
  - For each test case: selected input data, test metrics with actual measurements, confidence level
  - Implemented test environment
- Needed explanations
  - Why were the test cases generated as predicted? Identify the main drivers: system architecture, design, ML-model architecture, data structure, and repository.
  - Why are the generated test cases appropriate? Identify the test coverage based on system requirements and architecture. Show the value of each test case (e.g., its contribution to error detection) and ensure the test cases do not overlap.
  - How long does it take to configure and execute each test case? Identify the time for configuration and execution, considering the system and test architecture.
  - What is the cost/benefit ratio per test case? Identify the value versus the testing time, including configuration and execution.
  - Why was the test environment implemented this way? Identify the connection between the test architecture and the implemented test environment, and the mapping of design patterns.
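The per-test-case explanation needs above (value, configuration/execution time, cost/benefit ratio) can be read as a small record per generated test case. The sketch below is one plausible Python interpretation; the value metric, field names, and ratio formula are assumptions, since the paper does not prescribe a formula.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TestCaseExplanation:
    test_case_id: str
    value: float                 # assumed metric, e.g., expected error-detection contribution
    config_time_s: float         # time to configure the test case
    exec_time_s: float           # time to execute the test case
    main_drivers: List[str]      # architecture/data elements that drove test case generation

    def cost_benefit_ratio(self) -> float:
        """Value per unit of total testing time (configuration + execution)."""
        total_time = self.config_time_s + self.exec_time_s
        return self.value / total_time if total_time > 0 else float("inf")

# Example: rank generated test cases by their explained cost/benefit ratio.
cases = [
    TestCaseExplanation("tc-001", 0.8, 30.0, 120.0, ["system architecture"]),
    TestCaseExplanation("tc-002", 0.5, 10.0, 20.0, ["ML-model architecture"]),
]
ranked = sorted(cases, key=lambda c: c.cost_benefit_ratio(), reverse=True)
```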
Inputs, outputs, and needed explanations for user task UT 3 (a test report sketch follows this list):

- Inputs
  - Description of system under test
  - Generated test cases
  - For each test case: selected input data, test metrics with actual measurements
  - Implemented test environment
- Outputs
  - Configured test environment
  - Determined test results, trace(s) to system requirement(s), confidence level
  - Bug report (for detected bugs)
  - Test report with test KPIs
  - Responsive actions for failed test results
- Needed explanations
  - Why was the test environment configured as it was? Identify the connection between the configuration parameters and the test environment design. Show how the parameters were derived from the generated test cases.
  - How were the test results predicted? Demonstrate the link between the test inputs and the test results. Identify the correlation between test results and model monitoring outcomes.
  - How were the test KPIs forecasted? Identify the relationship between individual KPIs and test results.
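For UT 3, the outputs pair each test result with requirement traces, a confidence level, and (for failures) a responsive action, rolled up into a test report with KPIs. The Python sketch below assumes pass rate as an example KPI; the paper only speaks of "test KPIs" in general.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TestResult:
    test_case_id: str
    passed: bool
    requirement_ids: List[str]       # trace(s) to system requirement(s)
    confidence: float                # confidence level of the determined result
    responsive_action: str = ""      # suggested action if the test failed

@dataclass
class TestReport:
    results: List[TestResult] = field(default_factory=list)

    def pass_rate(self) -> float:
        """Illustrative KPI: share of passed test cases."""
        return sum(r.passed for r in self.results) / len(self.results) if self.results else 0.0

    def failed_with_actions(self) -> List[Tuple[str, str]]:
        """Failed results paired with responsive actions (input for bug reports)."""
        return [(r.test_case_id, r.responsive_action) for r in self.results if not r.passed]
```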
8.2 Interview protocol
- Step 1: Research introduction
- Step 2: Job experience
- Step 3: Introduction to the hypothetical application "Intelligent system testing application for ML-based applications," which is capable of planning and executing an ML-based system test, including its core inputs and outputs (see Fig. 1)
- Evaluating the assumed mental model representation for each component separately:
  - Question 1: Which explanation content is missing and why? (addressing research question 1, RQ 1)
  - Question 2: Which proposed explanation content is not necessary and why? (RQ 1)
  - Question 3: How should the explanation content be rearranged and why? (RQ 1)
  - Question 4: Please rate the following statement: "The explanation content helps the system tester to understand why the test strategy / test cases / test results were determined." (5-point Likert scale) (RQ 1)
  - Question 5: Why did you select the rating? (RQ 1)
  - Question 6: Which explanation content helps the system tester to understand the determined test strategy / test cases / test report? (RQ 2)
  - Question 7: Which explanation content helps the system tester to understand why the test strategy / test cases / test results were determined? (RQ 3)
  - Question 8: Which explanation content helps the system tester to evaluate the validity of the determined test strategy / test cases / test results? (RQ 4)
  - Question 9: Which explanation content helps the system tester to predict how effective the selected test strategy / test cases / responsive actions will be? (RQ 5)
  - Question 10: Which explanation content helps the system tester to trust the determined test strategy / test cases / test results? (RQ 6)
- Question 11: To understand the preferred intent of an explanation, rank the following statements about the intent of explanations (#1 means most important, #5 means least important). (RQ 7) A ranking tally sketch follows the protocol.
  - S1: The explanation content should help the system tester to understand the determined outcome.
  - S2: The explanation content should help the system tester to understand why the outcome was determined.
  - S3: The explanation content should help the system tester to evaluate the validity of the determined outcome.
  - S4: The explanation content should help the system tester to predict the effectiveness of the determined outcome.
  - S5: The explanation content should help the system tester to trust the determined outcome.
- Question 12: Why did you select the top-ranked statement? (RQ 7)

To answer Questions 6 through 10, participants were instructed that they could select none, one, or multiple of the explanation content candidates, including one or multiple groups of explanation content candidates.
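As a purely hypothetical illustration of how the Question 11 rankings could be tallied across participants (the paper does not specify its analysis procedure), one could compute the mean rank per statement, where a lower mean rank indicates a more important explanation intent:

```python
from statistics import mean

# Hypothetical rankings: one dict per participant, mapping statement -> rank (1 = most important).
rankings = [
    {"S1": 2, "S2": 1, "S3": 3, "S4": 5, "S5": 4},
    {"S1": 1, "S2": 2, "S3": 4, "S4": 5, "S5": 3},
]

mean_rank = {s: mean(r[s] for r in rankings) for s in ["S1", "S2", "S3", "S4", "S5"]}
preferred = min(mean_rank, key=mean_rank.get)  # lower mean rank = preferred intent
```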
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Degen, H., Budnik, C. (2024). How to Explain It to System Testers?. In: Degen, H., Ntoa, S. (eds.) Artificial Intelligence in HCI. HCII 2024. Lecture Notes in Computer Science, vol. 14734. Springer, Cham. https://doi.org/10.1007/978-3-031-60606-9_10
Print ISBN: 978-3-031-60605-2
Online ISBN: 978-3-031-60606-9