
How to Explain It to a Model Manager?

A Qualitative User Study About Understandability, Trustworthiness, Actionability, and Action Efficacy

  • Conference paper
Artificial Intelligence in HCI (HCII 2023)

Abstract

In the context of explainable AI (XAI), little research has been done to show what user-role-specific explanations look like. This research aims to identify the explanation needs of a user role called “model manager”, a user who monitors multiple AI-based systems for quality assurance in manufacturing. The question this research attempts to answer is: What are the explainability needs of the model manager? Using a design analysis technique (task questions), a concept (UI mockup) was created in a controlled way. Additionally, a causal chain model was created and used as an assumed representation of the mental model for explanations. Furthermore, several options for presenting confidence levels were explored. In a qualitative user study (cognitive walkthrough) with ten participants, it was investigated which explanations are needed to support understandability, trustworthiness, and actionability. The research concludes with four findings: F1) A mental model for explanations is an effective way to identify uncertainty-addressing explanation content that meets the needs of a specific target user group. F2) “AI domain” and “application domain” explanations are identified as new explanation categories. F3) “Show your work” and “singular” explanations are identified as new explanation categories. F4) “Actionability” is identified as a new explanation quality.



Acknowledgment

The authors want to thank Andrea D’zousa for her support in designing parts of the concept. The authors also thank Mike Little for fruitful discussions and term recommendations. We thank Tian Eu Lau for contributions to defining the Model Manager role. We thank Michael Lebacher and Stefan-Hagen Weber for inspiring discussions throughout the project. Finally, we thank all participants for their interview time and shared insights.

Author information

Correspondence to Helmut Degen.

A Appendix

1.1 A.1 Screener Questions

In your current job:

  • Do you work with AI solutions? (general fit)

  • Do you aim to optimize the uptime of an AI solution? (reflection of user goal UG 1)

  • Do you aim to ensure a defined quality goal of an AI solution? (reflection of user goal UG 2)

  • Do you identify quality deviations of an AI solution? (reflection of user task UT 1)

  • Do you troubleshoot quality deviations of an AI solution? (reflection of user task UT 2)

  • Do you check that the troubleshooting of an AI solution was successful? (reflection of user task UT 3)

  • Do you respond to customer requests? (reflection of user task UT 4)

1.2 A.2 Concept

View 1 (Fig. 2 in the Appendix) shows a list of automatically reported incidents. Each incident is described with several attributes. The first attribute is an incident number that identifies the incident. The second attribute is the customer name; we assume that a model manager is responsible for several customers and that each incident is reported by one of them. The third attribute is the host, i.e., the hardware device on which the incident occurred. Each incident has a category; in our examples, we distinguish between model, hardware, and software incidents. An incident also has a date and time of occurrence, a criticality level that indicates its severity, a status, and an owner.
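To make the attribute list concrete, the following sketch models one incident record as a plain data structure. It is only an illustrative reading of view 1; all field names and example value sets are assumptions, not part of the paper’s implementation.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Incident:
    """One row of the incident list in view 1 (field names are illustrative)."""
    incident_number: str   # e.g., "IN-0001"
    customer: str          # customer that reported the incident
    host: str              # hardware device on which the incident occurred
    category: str          # "model", "hardware", or "software"
    occurred_at: datetime  # date and time of occurrence
    criticality: str       # severity, e.g., "high", "medium", "low" (assumed scale)
    status: str            # e.g., "new", "in progress", "resolved" (assumed states)
    owner: str             # person currently responsible for the incident
```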

Fig. 2. View 1: Overview of reported incidents

The following views show details of the incident with the incident number IN-0001 (the first one in the incident list). On the left-hand side, key incident attributes are shown, as they have been used in view 1. The incident category has a more refined description; in our case, “hidden feature drift” was added as a detail. The incident has four detailed views, represented by views 2 through 5. View 2 shows the symptom (see also “symptom” in Fig. 1) and view 3 shows the suggested causes (see also “root cause 1” and “root cause 2” in Fig. 1). View 4 shows the suggested actions (see also “suggested action 1” and “suggested action 2” in Fig. 1) and view 5 shows the efficacy of the selected and initiated action.

We now look at each view in more detail. View 2 (see Fig. 3 in the Appendix) shows the symptom of the incident. The symptom is from the AI domain. In our example, the f1-score was selected as the metric for the symptom; other metrics can be used as well. The detailed view shows that the f1-score dropped from 61% to 37% at a certain date and time.
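As a rough illustration of how such a symptom could be surfaced, the sketch below scans a time series of f1-scores for a sudden drop. The paper does not specify how the symptom is detected; the drop threshold, data layout, and function name are assumptions.

```python
def detect_f1_drop(f1_series, threshold=0.10):
    """Return (timestamp, previous, current) for the first drop in the f1-score
    that exceeds `threshold` between consecutive measurements, else None."""
    for (_, f1_prev), (t_curr, f1_curr) in zip(f1_series, f1_series[1:]):
        if f1_prev - f1_curr > threshold:
            return t_curr, f1_prev, f1_curr
    return None

# Example matching the mockup: the f1-score drops from 61% to 37%.
series = [("08:00", 0.61), ("09:00", 0.61), ("10:00", 0.37)]
print(detect_f1_drop(series))  # ('10:00', 0.61, 0.37)
```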

Fig. 3. View 2: Symptom

View 3 (see Fig. 4 in the Appendix) shows the suggested causes. On the left-hand side, one or several suggested causes are displayed, sorted by an automatically determined confidence level; the suggested cause with the highest confidence level is listed first. The detailed view for a suggested cause shows root cause 1 (application domain) and root cause 2 (AI domain). For root cause 1, a time series chart visualizes when the event occurred; in our case, the event was the change of the soldering paste. The root cause 1 time series chart (change of soldering paste) is explicitly mapped to the symptom time series chart (drop of the f1-score) via dotted vertical lines (marked as 3.4). The suggested cause (root cause 1, application domain) and root cause 2 (AI domain) are both described.
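The ordering rule for view 3 (highest confidence first) can be stated in a few lines. The sketch below sorts suggested causes by their confidence value; the second entry and all field names are hypothetical and only illustrate the ranking.

```python
# Hypothetical suggested causes; only the first one mirrors the mockup's example.
suggested_causes = [
    {"root_cause_1": "change of soldering paste (application domain)",
     "root_cause_2": "hidden feature drift (AI domain)", "confidence": 0.78},
    {"root_cause_1": "new camera firmware (application domain)",
     "root_cause_2": "image preprocessing mismatch (AI domain)", "confidence": 0.41},
]

# The suggested cause with the highest confidence level is listed first.
for cause in sorted(suggested_causes, key=lambda c: c["confidence"], reverse=True):
    print(f'{cause["confidence"]:.0%}: {cause["root_cause_1"]} -> {cause["root_cause_2"]}')
```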

View 4 (see Fig. 5 in the Appendix) shows the suggested actions. The suggested actions refer to one of the suggested causes. One element in view 4 shows how the criticality of the incident is determined (see element 4.3). The area with suggested actions (element 4.4) is separated into two parts. The first part shows an immediate action to limit further damage from the symptom and the suggested causes. The second part lists suggested actions intended to resolve the selected suggested cause. Several suggested actions can be listed, ranked by their confidence level. For each suggested action, the action itself, the estimated time, and the return on investment are shown, including the people who need to be involved when a suggested action is selected and initiated.
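The attributes listed for a suggested action (the action itself, estimated time, return on investment, and the people to involve) can likewise be captured in a small record. The sketch below is an assumed structure for illustration only, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class SuggestedAction:
    """One entry in the ranked action list of view 4 (illustrative fields only)."""
    description: str                  # what should be done
    confidence: float                 # used to rank the actions, highest first
    estimated_time_hours: float       # estimated effort to carry out the action
    return_on_investment: str         # shown to the user, e.g., avoided scrap cost
    people_to_involve: list[str] = field(default_factory=list)
    immediate: bool = False           # True for the damage-limiting immediate action
```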

Fig. 4. View 3: Symptom and suggested causes

Fig. 5. View 4: Suggested actions

Fig. 6. View 5: Action efficacy

After an initiated action has been fully executed, view 5 (see Fig. 6 in the Appendix) shows whether the action was effective. Effective means that it resolved the root cause and that the symptom has disappeared. View 5 is separated into two parts. In the top part (elements 5.3 and 5.4), the efficacy of the initiated action is summarized. The event overview (element 5.5) shows all events along several time series charts with synchronization lines across the charts (element 5.6). The top bar of the efficacy chart summarizes the sequence of events and provides a binary visual cue whether the action was successful (green check mark, as shown here) or not (red cross, not shown here). It also shows the key start event (root cause 1) and finish event (last action), the change of the metric values (here, the f1-score), and a graphical overview of the f1-score across the timeline. The event overview shows all events of the application domain and the AI domain across the timelines, with synchronization lines for application events (change of soldering paste) and AI solution events (f1-score dropped, “old” model retired, “new” model launched in shadowing mode).
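The binary cue in the efficacy summary could, for instance, be derived from a simple recovery check on the symptom metric. The sketch below is one possible reading under stated assumptions (the tolerance value and function name are ours); the paper does not define the underlying decision rule.

```python
def action_was_effective(f1_before_incident, f1_after_action, tolerance=0.02):
    """Green check mark if the f1-score has recovered to (roughly) its
    pre-incident level after the initiated action; red cross otherwise."""
    return f1_after_action >= f1_before_incident - tolerance

# Example: the f1-score was 61% before the incident and is back at 60% after the
# "new" model was launched -> treated as effective (green check mark).
print(action_was_effective(0.61, 0.60))  # True
```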

Fig. 7. View 6: Confidence level, options 1, 2, 3a, 3b, 3c

Fig. 8. View 6: Confidence level, options 4a, 4b, 4c

1.3 A.3 Confidence Level Options

Option 1 does not show a confidence level to the user at all; the system automatically selects the item with the highest confidence level, without further explanation. Option 2 shows several suggested items but no confidence level; the item with the highest confidence level is selected automatically, and no additional explanation is given for the items. Option 3 uses a qualitative confidence level, with three variations. Option 3a shows a qualitative confidence level (e.g., high, medium, low) together with a short explanation from the application domain. Option 3b shows a qualitative confidence level with a short explanation from the AI domain. Option 3c shows a qualitative confidence level with short explanations from both the application domain and the AI domain (see Fig. 7 in the Appendix).

The last set of options uses a quantitative confidence level, again with three variations. Option 4a shows a quantitative confidence level as a percentage value together with a checklist from the application domain; the relative number of met checkpoints determines the percentage value. Option 4b shows a quantitative confidence level with a checklist from the AI domain. Option 4c shows a quantitative confidence level with checklists from both the application domain and the AI domain (see Fig. 8 in the Appendix).
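Option 4a states that the relative number of met checkpoints determines the percentage value. A minimal worked example of that rule is shown below; the checklist items themselves are hypothetical, only the computation follows the text.

```python
# Hypothetical application-domain checklist for the soldering-paste example.
checklist = {
    "soldering paste change logged in the same time window": True,
    "affected boards come from the same production line": True,
    "no other process change recorded for the host": False,
    "symptom limited to products using the new paste": True,
}

# Option 4a: confidence percentage = met checkpoints / all checkpoints.
confidence = sum(checklist.values()) / len(checklist)
print(f"confidence level: {confidence:.0%}")  # 3 of 4 checkpoints met -> 75%
```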

1.4 A.4 Questions

Topic 1: Understandability

  • Question 1.1: Which element(s) of any views help you best to understand what happened and why? (Answer options: no element, one or multiple elements)

  • Question 1.2: Which view helps you best to understand what happened and why? (Answer option: one selected view (out of five presented views))

  • Question 1.3: Why have you selected this view?

Topic 2: Trustworthiness

  • Question 2.1: Which element(s) of any views help you best to trust the outcome of the application? (Answer options: no element, one or multiple elements)

  • Question 2.2: Which view helps you best to trust the outcome of the application? (Answer option: one selected view (out of five presented views))

  • Question 2.3: Why have you selected this view?

Topic 3: Initiate Action

  • Question 3.1: Which element(s) of any view help you best to initiate a responsive action? (Answer options: no element, one or multiple elements)

  • Question 3.2: Which view helps you best to initiate a responsive action? (Answer option: one selected view (out of five presented views))

  • Question 3.3: Why have you selected this view?

Topic 4: Check Efficacy of Initiated Action

  • Question 4.1: Which element(s) of any view help you best to check the effectiveness of the initiated action? (Answer options: no element, one or multiple elements)

  • Question 4.2: Which view helps you best to check the effectiveness of the initiated action? (Answer option: one selected view (out of five presented views))

  • Question 4.3: Why have you selected this view?

Topic 5: Mapping of Application Domain Explanations to AI Domain Explanations

  • Question 5.1: Rate the following statement: The mapping of the AI solution information to the application domain information is important for me to understand what happened and why. (Answer options: 5-point Likert scale)

  • Question 5.2: Rate the following statement: The mapping of the AI solution information to the application domain information is important for me to understand whether the initiated action was effective or not. (Answer options: 5-point Likert scale)

Topic 6: Validate the Causal Chain Model as the Assumed Mental Model

  • Question 6.1: Rate the following statement: The causal chain model reflects the key elements of an incident very well. (Answer options: 5-point Likert scale)

  • Question 6.2: In addition to the responsive action: Which other content do you need to see to understand what happened and why per incident? (Answer options: root cause 1, causal chain, root cause 2, responsive action, none, something else; multiple selections are possible)

  • Question 6.3: In addition to the responsive action: Which other content do you need to see to trust the outcome of the application? (Answer options: root cause 1, causal chain, root cause 2, responsive action, none, something else; multiple selections are possible)

Topic 7: Confidence Level

  • Question 7.1: Which option for presenting one or more system-selected options do you prefer? (Answer option: one confidence level option)

  • Question 7.2: Why have you selected this option?

  • Question 7.3: Rate the statement: The selected option for showing and explaining the confidence level is important for me to understand what happened and why. (Answer options: 5-point Likert scale)

  • Question 7.4: Rate the statement: The selected option for showing and explaining the confidence level is important for me to trust the outcome of the application. (Answer options: 5-point Likert scale)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Degen, H., Budnik, C., Gross, R., Rothering, M. (2023). How to Explain It to a Model Manager? In: Degen, H., Ntoa, S. (eds.) Artificial Intelligence in HCI. HCII 2023. Lecture Notes in Computer Science, vol 14050. Springer, Cham. https://doi.org/10.1007/978-3-031-35891-3_14


  • DOI: https://doi.org/10.1007/978-3-031-35891-3_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35890-6

  • Online ISBN: 978-3-031-35891-3

