Abstract
In the realm of explainable AI (XAI), limited research exists on user role-specific explanations. This study aims to determine the explanation needs for the user role “system tester of AI-based systems.” It investigates whether established explanation types adequately address the explainability requirements of ML-based application testers. Through a qualitative study (n = 12), we identified the explanation needs for three user tasks: test strategy determination, test case determination, and test result determination. The research yields five findings: F1) proposing a new explanation domain type, “system domain,” F2) proposing a new explanation structure, “hierarchical,” F3) identifying overlapping explanation content between two user groups, F4) considering identified inputs of a user task as explanation content candidates, and F5) highlighting the risk of combining the evaluation of assumed mental model representations with identifying explanation content in one study.
Acknowledgment
The authors thank Parinitha Nagaraja for evaluating the initial mental model of explanations, and all study participants for their time and shared insights.
8 Appendix
8.1 User task specific explanation needs
Inputs, outputs, and needed explanations for user task UT 1 (a data-structure sketch follows this list):

- Inputs
  - Application domain
  - Description of the system under test and system requirements
  - Architecture of ML models, data structure, and repository with metrics (e.g., quantity and types of available data)
  - Test constraints (e.g., cost, resources, quality, available hardware, available software)
  - Identified risks and their mitigations (if available)
- Outputs
  - Suggested test strategies
  - For each test strategy: prioritized list of test methods, test architecture (including design of the test environment), test plan, confidence level
- Needed explanations
  - Why was the test strategy predicted? Identify the main drivers, such as the application domain, description of the system under test, system requirements, architecture of ML models, data structure, and repository.
  - Why was the prioritization of test methods predicted? Identify the main drivers, such as risks, driving requirements (non-functional requirements), ML model architecture, data structure, and repository.
  - Why was the test architecture predicted? Identify the main drivers, such as available and unavailable hardware, available and unavailable software, ML model architecture, data structure, and repository.
  - Why was the test plan predicted? Identify the main drivers, such as compliance with test constraints.
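To make the UT 1 structure concrete, the following is a minimal Python sketch of how the inputs, outputs, and attached explanations could be represented. All class and field names are illustrative assumptions, not artifacts of the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative sketch only: names are our shorthand for the UT 1 items above.

@dataclass
class UT1Inputs:
    application_domain: str
    system_description: str                  # system under test and its requirements
    ml_model_architecture: str               # ML model architecture, data structure, repository metrics
    test_constraints: Dict[str, str]         # e.g., {"cost": "...", "resources": "..."}
    risks_and_mitigations: List[str] = field(default_factory=list)  # if available

@dataclass
class ExplanationItem:
    question: str                            # e.g., "Why was the test strategy predicted?"
    main_drivers: List[str]                  # the inputs identified as main drivers

@dataclass
class SuggestedTestStrategy:
    test_methods: List[str]                  # prioritized list of test methods
    test_architecture: str                   # including the design of the test environment
    test_plan: str
    confidence: float                        # confidence level reported with the strategy
    explanations: List[ExplanationItem] = field(default_factory=list)
```

A "Why was the test strategy predicted?" explanation would then be an ExplanationItem whose main_drivers list points back to concrete UT1Inputs fields.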
Inputs, outputs, and needed explanations for user task UT 2 (a cost/benefit sketch follows this list):

- Inputs
  - Description of system under test, system requirements, system architecture, and design
  - ML-model architecture, data structure, and repository with metrics
  - Selected test strategy, including methods, architecture, design of the test environment, and test plan
- Outputs
  - Generated test cases
  - For each test case: selected input data, test metrics with actual measurements, confidence level
  - Implemented test environment
- Needed explanations
  - Why were the test cases generated as predicted? Identify the main drivers: system architecture, design, ML-model architecture, data structure, and repository.
  - Why are the generated test cases appropriate? Identify the test coverage based on system requirements and architecture. Show the value of each test case (e.g., its contribution to error detection) and ensure the test cases do not overlap.
  - How long does it take to configure and execute each test case? Identify the time for configuration and execution, considering the system and test architecture.
  - What is the cost/benefit ratio per test case? Identify the value versus the testing time, including configuration and execution.
  - Why was the test environment implemented this way? Identify the connection between the test architecture and the implemented test environment, and the mapping of design patterns.
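The per-test-case explanation needs above (value, configuration/execution time, cost/benefit ratio) can be read as a small record per generated test case. The sketch below is one plausible Python interpretation; the value metric, field names, and ratio formula are assumptions, since the paper does not prescribe a formula.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TestCaseExplanation:
    test_case_id: str
    value: float                 # assumed metric, e.g., expected error-detection contribution
    config_time_s: float         # time to configure the test case
    exec_time_s: float           # time to execute the test case
    main_drivers: List[str]      # architecture/data elements that drove test case generation

    def cost_benefit_ratio(self) -> float:
        """Value per unit of total testing time (configuration + execution)."""
        total_time = self.config_time_s + self.exec_time_s
        return self.value / total_time if total_time > 0 else float("inf")

# Example: rank generated test cases by their explained cost/benefit ratio.
cases = [
    TestCaseExplanation("tc-001", 0.8, 30.0, 120.0, ["system architecture"]),
    TestCaseExplanation("tc-002", 0.5, 10.0, 20.0, ["ML-model architecture"]),
]
ranked = sorted(cases, key=lambda c: c.cost_benefit_ratio(), reverse=True)
```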
Inputs, outputs, and needed explanations for user task UT 3 (a test report sketch follows this list):

- Inputs
  - Description of system under test
  - Generated test cases
  - For each test case: selected input data, test metrics with actual measurements
  - Implemented test environment
- Outputs
  - Configured test environment
  - Determined test results, trace(s) to system requirement(s), confidence level
  - Bug report (for detected bugs)
  - Test report with test KPIs
  - Responsive actions for failed test results
- Needed explanations
  - Why was the test environment configured as it was? Identify the connection between the configuration parameters and the test environment design. Show how the parameters were derived from the generated test cases.
  - How were the test results predicted? Demonstrate the link between the test inputs and the test results. Identify the correlation between test results and model monitoring outcomes.
  - How were the test KPIs forecasted? Identify the relationship between individual KPIs and test results.
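For UT 3, the outputs pair each test result with requirement traces, a confidence level, and (for failures) a responsive action, rolled up into a test report with KPIs. The Python sketch below assumes pass rate as an example KPI; the paper only speaks of "test KPIs" in general.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TestResult:
    test_case_id: str
    passed: bool
    requirement_ids: List[str]       # trace(s) to system requirement(s)
    confidence: float                # confidence level of the determined result
    responsive_action: str = ""      # suggested action if the test failed

@dataclass
class TestReport:
    results: List[TestResult] = field(default_factory=list)

    def pass_rate(self) -> float:
        """Illustrative KPI: share of passed test cases."""
        return sum(r.passed for r in self.results) / len(self.results) if self.results else 0.0

    def failed_with_actions(self) -> List[Tuple[str, str]]:
        """Failed results paired with responsive actions (input for bug reports)."""
        return [(r.test_case_id, r.responsive_action) for r in self.results if not r.passed]
```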
8.2 Interview protocol
- Step 1: Research introduction
- Step 2: Job experience
- Step 3: Introduction to the hypothetical application "Intelligent system testing application for ML-based applications," which is capable of planning and executing an ML-based system test, including its core inputs and outputs (see Fig. 1)
- Evaluating the assumed mental model representation for each component separately:
  - Question 1: Which explanation content is missing and why? (addressing research question 1, RQ 1)
  - Question 2: Which proposed explanation content is not necessary and why? (RQ 1)
  - Question 3: How should the explanation content be rearranged and why? (RQ 1)
  - Question 4: Please rate the following statement: "The explanation content helps the system tester to understand why the test strategy / test cases / test results were determined." (5-point Likert scale) (RQ 1)
  - Question 5: Why did you select the rating? (RQ 1)
  - Question 6: Which explanation content helps the system tester to understand the determined test strategy / test cases / test report? (RQ 2)
  - Question 7: Which explanation content helps the system tester to understand why the test strategy / test cases / test results were determined? (RQ 3)
  - Question 8: Which explanation content helps the system tester to evaluate the validity of the determined test strategy / test cases / test results? (RQ 4)
  - Question 9: Which explanation content helps the system tester to predict how effective the selected test strategy / test cases / responsive actions will be? (RQ 5)
  - Question 10: Which explanation content helps the system tester to trust the determined test strategy / test cases / test results? (RQ 6)
- Question 11: To understand the preferred intent of an explanation, rank the following statements about the intent of explanations (#1 means most important, #5 means least important). (RQ 7) A ranking tally sketch follows the protocol.
  - S1: The explanation content should help the system tester to understand the determined outcome.
  - S2: The explanation content should help the system tester to understand why the outcome was determined.
  - S3: The explanation content should help the system tester to evaluate the validity of the determined outcome.
  - S4: The explanation content should help the system tester to predict the effectiveness of the determined outcome.
  - S5: The explanation content should help the system tester to trust the determined outcome.
- Question 12: Why did you select the top-ranked statement? (RQ 7)

To answer Questions 6 through 10, participants were instructed that they could select none, one, or multiple of the explanation content candidates, including one or multiple groups of explanation content candidates.
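As a purely hypothetical illustration of how the Question 11 rankings could be tallied across participants (the paper does not specify its analysis procedure), one could compute the mean rank per statement, where a lower mean rank indicates a more important explanation intent:

```python
from statistics import mean

# Hypothetical rankings: one dict per participant, mapping statement -> rank (1 = most important).
rankings = [
    {"S1": 2, "S2": 1, "S3": 3, "S4": 5, "S5": 4},
    {"S1": 1, "S2": 2, "S3": 4, "S4": 5, "S5": 3},
]

mean_rank = {s: mean(r[s] for r in rankings) for s in ["S1", "S2", "S3", "S4", "S5"]}
preferred = min(mean_rank, key=mean_rank.get)  # lower mean rank = preferred intent
```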
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Degen, H., Budnik, C. (2024). How to Explain It to System Testers?. In: Degen, H., Ntoa, S. (eds.) Artificial Intelligence in HCI. HCII 2024. Lecture Notes in Computer Science, vol. 14734. Springer, Cham. https://doi.org/10.1007/978-3-031-60606-9_10
Print ISBN: 978-3-031-60605-2
Online ISBN: 978-3-031-60606-9