Abstract
This study investigates the evolving dynamics of commonly used feature attribution (FA) values during the training of neural networks. As models transition from a state of high uncertainty to low uncertainty, we show that the significance of the features also changes, which is in line with the general learning theory of deep neural networks. During model training, we compute FA scores through Layer-wise Relevance Propagation (LRP) and Gradient-weighted Class Activation Mapping (Grad-CAM), which are selected for their computational efficiency and speed. We summarize the attribution scores in terms of the sum of the absolute FA values and their entropy. We further analyze these summary scores in relation to the models' generalization capabilities. The analysis identifies trends where FA values increase in magnitude while entropy decreases during training, regardless of model generalization, suggesting independence from overfitting. This research offers a unique view on the application of FA methods in explainable artificial intelligence (XAI) and raises intriguing questions about their behavior across varying model architectures and datasets, which may have implications for future work combining XAI and uncertainty estimation in machine learning.
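To illustrate the summary scores described above, the following is a minimal sketch (not the authors' code) of how an attribution map, e.g., produced by LRP or Grad-CAM, could be reduced to the two quantities tracked during training: the sum of the absolute FA values and their entropy. The exact normalization used for the entropy in the paper is an assumption here; the sketch treats the normalized absolute attributions as a probability distribution over input features.

```python
# Sketch only: summarizing one feature attribution map by
# (i) the sum of absolute attribution values and
# (ii) the Shannon entropy of the normalized absolute attributions.
import numpy as np


def summarize_attributions(attributions: np.ndarray, eps: float = 1e-12):
    """Return (total absolute attribution, entropy) for one attribution map."""
    abs_attr = np.abs(attributions).ravel()

    # Summary score 1: overall magnitude of the attributions.
    magnitude = abs_attr.sum()

    # Summary score 2: entropy of the attribution mass, treating the
    # normalized absolute values as a distribution over input features.
    # (Normalization choice is an assumption, not taken from the paper.)
    p = abs_attr / (magnitude + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return magnitude, entropy


if __name__ == "__main__":
    # Example with a random 28x28 "attribution map" as a stand-in for an
    # LRP or Grad-CAM output on a Fashion-MNIST-sized input.
    rng = np.random.default_rng(0)
    fake_map = rng.normal(size=(28, 28))
    mag, ent = summarize_attributions(fake_map)
    print(f"sum |FA| = {mag:.3f}, entropy = {ent:.3f}")
```

Tracking these two numbers over epochs would reproduce the kind of per-epoch trend analysis the abstract describes: increasing magnitude and decreasing entropy as training progresses.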
We gratefully acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): TRR 318/1 2021 - 438445824.
E. Terzieva and M. Muschalik—Equal contribution.
Notes
1. The source code of the descriptive analysis conducted in Sect. 3 and Sect. 4 is publicly available at https://github.com/EliTerzieva1995/Identifying-Trends-in-Feature-Attributions-during-Training-of-Neural-Networks. This repository also contains the appendix and further supplementary material of this work.
2. For a detailed description of the models used and the datasets, we refer to the supplementary material (Section A.1 and Section A.2).
3. For details on the negative FA values, we refer to the appendix.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Terzieva, E., Muschalik, M., Hofman, P., Hüllermeier, E. (2025). Identifying Trends in Feature Attributions During Training of Neural Networks. In: Meo, R., Silvestri, F. (eds.) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol. 2134. Springer, Cham. https://doi.org/10.1007/978-3-031-74627-7_29
DOI: https://doi.org/10.1007/978-3-031-74627-7_29
Print ISBN: 978-3-031-74626-0
Online ISBN: 978-3-031-74627-7