Abstract
The increasing application of Explainable AI (XAI) methods to enhance the transparency and trustworthiness of AI systems underscores the need to quantitatively assess and analyze the theoretical and behavioral characteristics of the explanations these methods generate. A fair number of metrics and properties exist; however, they tend to be method-specific, complex, and at times hard to interpret. This work focuses on (i) identifying the metrics and properties applicable to selected post-hoc counterfactual explanation methods (a mechanism for generating explanations), (ii) assessing whether the identified metrics and properties can be used to compare counterfactual examples across explanation methods, and (iii) analyzing the properties of those counterfactual explanation methods. A pipeline is designed to implement a proof-of-concept tool, comprising the following steps: selecting a data set, training a suitable classifier, deploying counterfactual generation method(s), and implementing the defined XAI metrics to infer which properties the explanation methods satisfy. The experimental results show that the desirable properties of counterfactual explanations are satisfied to varying degrees, as measured by the different metrics. Certain inconsistencies were identified in the counterfactual explanation methods; for example, some resulting counterfactual instances failed to be pushed to the desired class, defeating one of the main purposes of obtaining counterfactual explanations. In addition, several other properties are discussed for analyzing counterfactual explanation methods.
Supported by Ericsson Research.
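To make the evaluation pipeline concrete, below is a minimal sketch of such a loop, assuming a scikit-learn classifier on a public data set; the generate_counterfactuals helper is a hypothetical stand-in for an actual counterfactual explanation method, and the validity, proximity, and sparsity formulations shown are illustrative, not necessarily the exact metric definitions used in the paper.

```python
# Minimal sketch of the pipeline: data set -> classifier ->
# counterfactual candidates -> metrics. generate_counterfactuals() is a
# hypothetical stand-in for a real counterfactual method, and the metric
# formulations below are illustrative only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Step 1: select a data set and train a suitable classifier.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Step 2: deploy a counterfactual generation method (hypothetical here:
# random perturbations scaled by the training-set standard deviation).
def generate_counterfactuals(x, n_candidates=50, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_candidates, x.shape[0]))
    return x + step * noise * X_train.std(axis=0)

x = X_test[0]
desired_class = 1 - clf.predict(x.reshape(1, -1))[0]
cfs = generate_counterfactuals(x)

# Step 3: implement XAI metrics to check which properties are satisfied.
preds = clf.predict(cfs)
valid_mask = preds == desired_class

# Validity: fraction of candidates actually pushed to the desired class
# (the property found to be violated by some methods).
validity = valid_mask.mean()
# Proximity: mean L1 distance from the original instance to the valid CFs.
proximity = (np.abs(cfs[valid_mask] - x).sum(axis=1).mean()
             if valid_mask.any() else float("nan"))
# Sparsity: average number of features changed beyond a small tolerance.
sparsity = (np.abs(cfs - x) > 1e-6).sum(axis=1).mean()

print(f"validity={validity:.2f} proximity={proximity:.2f} sparsity={sparsity:.1f}")
```

In practice, the hypothetical generator would be replaced by an actual counterfactual explanation method, and the same metric functions could then be applied uniformly to compare the counterfactual examples produced by different methods.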