Abstract
This study investigates the evolving dynamics of commonly used feature attribution (FA) values during the training of neural networks. As models transition from a state of high uncertainty to low uncertainty, we show that the significance of the features also changes, which is in line with the general learning theory of deep neural networks. During model training, we compute FA scores through Layer-wise Relevance Propagation (LRP) and Gradient-weighted Class Activation Mapping (Grad-CAM), which are selected for their computational efficiency and speed. We summarize the attribution scores in terms of the sum of the absolute FA values and their entropy. We further analyze these summary scores in relation to the models' generalization capabilities. The analysis identifies trends where FA values increase in magnitude while entropy decreases during training, regardless of model generalization, suggesting independence from overfitting. This research offers a unique view on the application of FA methods in explainable artificial intelligence (XAI) and raises intriguing questions about their behavior across varying model architectures and datasets, which may have implications for future work combining XAI and uncertainty estimation in machine learning.
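To illustrate the summary scores described above, the following is a minimal sketch (not the authors' code) of how an attribution map, e.g., produced by LRP or Grad-CAM, could be reduced to the two quantities tracked during training: the sum of the absolute FA values and their entropy. The exact normalization used for the entropy in the paper is an assumption here; the sketch treats the normalized absolute attributions as a probability distribution over input features.

```python
# Sketch only: summarizing one feature attribution map by
# (i) the sum of absolute attribution values and
# (ii) the Shannon entropy of the normalized absolute attributions.
import numpy as np


def summarize_attributions(attributions: np.ndarray, eps: float = 1e-12):
    """Return (total absolute attribution, entropy) for one attribution map."""
    abs_attr = np.abs(attributions).ravel()

    # Summary score 1: overall magnitude of the attributions.
    magnitude = abs_attr.sum()

    # Summary score 2: entropy of the attribution mass, treating the
    # normalized absolute values as a distribution over input features.
    # (Normalization choice is an assumption, not taken from the paper.)
    p = abs_attr / (magnitude + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return magnitude, entropy


if __name__ == "__main__":
    # Example with a random 28x28 "attribution map" as a stand-in for an
    # LRP or Grad-CAM output on a Fashion-MNIST-sized input.
    rng = np.random.default_rng(0)
    fake_map = rng.normal(size=(28, 28))
    mag, ent = summarize_attributions(fake_map)
    print(f"sum |FA| = {mag:.3f}, entropy = {ent:.3f}")
```

Tracking these two numbers over epochs would reproduce the kind of per-epoch trend analysis the abstract describes: increasing magnitude and decreasing entropy as training progresses.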
We gratefully acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): TRR 318/1 2021 - 438445824.
E. Terzieva and M. Muschalik—Equal contribution.
Notes
1. The source code of the descriptive analysis conducted in Sect. 3 and Sect. 4 is publicly available at https://github.com/EliTerzieva1995/Identifying-Trends-in-Feature-Attributions-during-Training-of-Neural-Networks. This repository also contains the appendix and further supplementary material of this work.
2. For a detailed description of the models used and the datasets, we refer to the supplementary material (Section A.1 and Section A.2).
3. For details on the negative FA values, we refer to the appendix.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Terzieva, E., Muschalik, M., Hofman, P., Hüllermeier, E. (2025). Identifying Trends in Feature Attributions During Training of Neural Networks. In: Meo, R., Silvestri, F. (eds.) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol. 2134. Springer, Cham. https://doi.org/10.1007/978-3-031-74627-7_29
DOI: https://doi.org/10.1007/978-3-031-74627-7_29
Print ISBN: 978-3-031-74626-0
Online ISBN: 978-3-031-74627-7