Abstract
The increasing application of Explainable AI (XAI) methods to enhance the transparency and trustworthiness of AI systems underscores the need to quantitatively assess and analyze the theoretical and behavioral characteristics of the explanations these methods generate. A fair number of metrics and properties exist; however, they tend to be method-specific, complex, and at times hard to interpret. This work focuses on (i) identifying the metrics and properties applicable to selected post-hoc counterfactual explanation methods (a mechanism for generating explanations), (ii) assessing whether the identified metrics and properties can be used to compare counterfactual examples across explanation methods, and (iii) analyzing the properties of those counterfactual explanation methods. A pipeline is designed to implement a proof-of-concept tool, comprising the following steps: selecting a data set, training a suitable classifier, deploying counterfactual generation method(s), and implementing the defined XAI metrics to infer which properties the explanation methods satisfy. The experimental results show that the desirable properties of counterfactual explanations are satisfied to varying degrees, as measured by the different metrics. Certain inconsistencies were identified in the counterfactual explanation methods; for example, some resulting counterfactual instances failed to be pushed to the desired class, defeating one of the main purposes of obtaining counterfactual explanations. In addition, several other properties are discussed for analyzing counterfactual explanation methods.
Supported by Ericsson Research.
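To make the evaluation pipeline concrete, below is a minimal sketch of such a loop, assuming a scikit-learn classifier on a public data set; the generate_counterfactuals helper is a hypothetical stand-in for an actual counterfactual explanation method, and the validity, proximity, and sparsity formulations shown are illustrative, not necessarily the exact metric definitions used in the paper.

```python
# Minimal sketch of the pipeline: data set -> classifier ->
# counterfactual candidates -> metrics. generate_counterfactuals() is a
# hypothetical stand-in for a real counterfactual method, and the metric
# formulations below are illustrative only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Step 1: select a data set and train a suitable classifier.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Step 2: deploy a counterfactual generation method (hypothetical here:
# random perturbations scaled by the training-set standard deviation).
def generate_counterfactuals(x, n_candidates=50, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_candidates, x.shape[0]))
    return x + step * noise * X_train.std(axis=0)

x = X_test[0]
desired_class = 1 - clf.predict(x.reshape(1, -1))[0]
cfs = generate_counterfactuals(x)

# Step 3: implement XAI metrics to check which properties are satisfied.
preds = clf.predict(cfs)
valid_mask = preds == desired_class

# Validity: fraction of candidates actually pushed to the desired class
# (the property found to be violated by some methods).
validity = valid_mask.mean()
# Proximity: mean L1 distance from the original instance to the valid CFs.
proximity = (np.abs(cfs[valid_mask] - x).sum(axis=1).mean()
             if valid_mask.any() else float("nan"))
# Sparsity: average number of features changed beyond a small tolerance.
sparsity = (np.abs(cfs - x) > 1e-6).sum(axis=1).mean()

print(f"validity={validity:.2f} proximity={proximity:.2f} sparsity={sparsity:.1f}")
```

In practice, the hypothetical generator would be replaced by an actual counterfactual explanation method, and the same metric functions could then be applied uniformly to compare the counterfactual examples produced by different methods.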