Abstract
Exemplar-based explainable artificial intelligence (XAI) aims to create human understanding of the behaviour of an AI system, usually a machine learning model, through examples. The advantage of this approach is that humans construct their own explanation in their own internal language. However, which examples should be chosen? Existing frameworks fall short of capturing all the elements that contribute to this process. In this paper, we propose a comprehensive XAI framework based on machine teaching. The traditional trade-off between the fidelity and the complexity of the explanation is transformed here into a trade-off between the complexity of the examples and the fidelity the human achieves about the behaviour of the ML system to be explained. We analyse a concept class of Boolean functions learned by a convolutional neural network classifier over a dataset of images of possibly rotated and resized letters. We assume the human learner has a strong prior (Karnaugh maps over Boolean functions). Our explanation procedure then behaves like a machine teaching session that optimises the trade-off between examples and fidelity. We include an experimental evaluation and several human studies in which we analyse how well humans can be taught these Boolean functions by means of the explanatory examples generated by our framework. We explore the effect of telling the human the essential features and the priors, and find that identification is more successful than with randomly sampled examples.
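The machine-teaching view sketched above can be illustrated with a minimal, self-contained example. The sketch below is purely illustrative and not the paper's actual procedure: the target concept, the three letter-presence features, and the greedy selection criterion (shrinking the set of consistent Boolean hypotheses) are all hypothetical stand-ins for the framework's trade-off between example complexity and achieved fidelity.

```python
from itertools import product

# Illustrative setup: an example is an assignment over three Boolean features,
# e.g. whether the letters A, B, C are present in an image.
FEATURES = ("A", "B", "C")
ALL_INPUTS = list(product((False, True), repeat=3))

def target(a, b, c):
    # Hypothetical target concept to be taught: "A and not C".
    return a and not c

def consistent_hypotheses(labelled):
    """Count Boolean hypotheses (enumerated as truth tables) that agree
    with every labelled example shown so far."""
    count = 0
    for table in product((False, True), repeat=len(ALL_INPUTS)):
        hyp = dict(zip(ALL_INPUTS, table))
        if all(hyp[x] == y for x, y in labelled):
            count += 1
    return count

def greedy_teaching_set(budget):
    """Greedily pick labelled examples that shrink the set of consistent
    hypotheses fastest -- a crude stand-in for a teacher optimising the
    learner's fidelity per example shown."""
    shown = []
    for _ in range(budget):
        remaining = [x for x in ALL_INPUTS if x not in [e for e, _ in shown]]
        best = min(
            remaining,
            key=lambda x: consistent_hypotheses(shown + [(x, target(*x))]),
        )
        shown.append((best, target(*best)))
    return shown
```

With no examples shown, all 256 truth tables over three variables remain possible; once all eight inputs are labelled, only the target survives. A real human learner would of course not enumerate truth tables, which is exactly why the paper models the human prior (e.g. minimal Karnaugh-map expressions) explicitly.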
A preliminary version of this work was presented as a poster at AAIP@IJCLR2022. Supported by the Norwegian Research Council, project Machine Teaching for XAI.
Notes
- 1.
- 2. We conducted t-tests for all pairs of groups to test whether the means differ statistically significantly, obtaining p-values 0.0297, 0.0013 and 0.0747 for the pairs (I, II), (I, III) and (II, III), respectively.
- 3. We use \(\delta ^* = \textit{Number of present letters}\).
- 4. We say that the system is aligned when the prior of \(L_M\) is similar to the prior of \(L_H\).
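The example-complexity measure \(\delta^*\) from note 3 can be sketched directly, under the assumption (hypothetical, for illustration) that a teaching example is represented by the collection of letters present in it:

```python
def delta_star(example_letters):
    """delta* = number of distinct letters present in the example."""
    return len(set(example_letters))

def rank_by_complexity(candidates):
    """Order candidate teaching examples from simplest to most complex,
    i.e. fewest present letters first."""
    return sorted(candidates, key=delta_star)
```

Under this sketch, a teacher trading off complexity against fidelity would prefer examples early in this ranking whenever they are equally informative.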
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Håvardstun, B.A.T., Ferri, C., Hernández-Orallo, J., Parviainen, P., Telle, J.A. (2023). XAI with Machine Teaching When Humans Are (Not) Informed About the Irrelevant Features. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14171. Springer, Cham. https://doi.org/10.1007/978-3-031-43418-1_23
DOI: https://doi.org/10.1007/978-3-031-43418-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43417-4
Online ISBN: 978-3-031-43418-1