Open access
Author
Date
2020
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
The last decade has witnessed increasing adoption of black-box machine learning models across a variety of fields, and data-driven models are becoming ever more important in sensitive domains such as healthcare, hiring, criminal risk assessment, and autonomous driving.
Although the recent successes of deep neural networks have attracted the interest of both academia and industry, the predictions of these models remain opaque to humans, with important implications for security, ethics, robustness, and scientific understanding.
As deep neural networks are increasingly deployed for automated decision making that affects people's lives, it is paramount to understand how these models work, how to avoid unintended biases, and how to protect their predictions from deliberate manipulation.
In this thesis, we investigate attribution methods as a tool to assess neural network predictions.
First, we compare and discuss existing attribution methods, prove conditions of equivalence and approximation between them, and propose a unified framework. Besides providing a better theoretical understanding of previously proposed heuristics, our analysis also suggests a simpler implementation for some attribution methods.
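To make the flavor of these methods concrete, below is a minimal sketch of Gradient * Input, one of the simplest gradient-based attribution methods analyzed in this line of work. The two-layer ReLU network and its weights are made-up placeholders for illustration, not a model from the thesis.

```python
import numpy as np

# Hypothetical two-layer ReLU network: f(x) = w2 . relu(W1 x).
# All weights are random placeholders, chosen only for illustration.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # hidden-layer weights
w2 = rng.normal(size=(8,))     # output-layer weights (scalar output)

def gradient_times_input(x):
    """Gradient * Input attribution for a scalar-output network."""
    z = W1 @ x                              # hidden pre-activations
    # Backprop by hand: df/dh = w2, gated by the ReLU mask,
    # then pulled back through W1 to obtain df/dx.
    grad_x = W1.T @ (w2 * (z > 0.0))
    return grad_x * x                       # one score per input feature

x = rng.normal(size=4)                      # an arbitrary input point
print(gradient_times_input(x))
```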
Second, endorsing an axiomatic approach, we identify several desirable properties that attribution methods should arguably satisfy. We show that Shapley values, a classic solution concept from cooperative game theory, are an ideal candidate. As computing exact Shapley values is prohibitively expensive, we then propose an approximation algorithm designed specifically for deep neural networks.
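For reference, the Shapley value of a player i in a game with player set N and value function v is given by the standard game-theoretic definition (textbook material, not a contribution of the thesis):

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

A generic way to approximate this sum is Monte Carlo sampling over random permutations. The sketch below shows that baseline; it is not the DNN-specific algorithm proposed in the thesis, and `value_fn` is a hypothetical callable standing in for a model evaluated on a feature subset.

```python
import numpy as np

def shapley_mc(value_fn, n_features, n_samples=1000, seed=0):
    """Monte Carlo Shapley estimate via random permutations.
    value_fn(mask) returns the payoff for a boolean feature mask."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_features)
    for _ in range(n_samples):
        perm = rng.permutation(n_features)
        mask = np.zeros(n_features, dtype=bool)
        prev = value_fn(mask)
        for i in perm:
            mask[i] = True
            cur = value_fn(mask)
            phi[i] += cur - prev   # marginal contribution of feature i
            prev = cur
    return phi / n_samples

# Sanity check on a linear game v(S) = sum_{i in S} w_i * x_i,
# whose exact Shapley values are w_i * x_i.
w, x = np.array([1.0, -2.0, 0.5]), np.array([3.0, 1.0, -4.0])
est = shapley_mc(lambda m: float(w[m] @ x[m]), 3, n_samples=2000)
print(est, w * x)  # estimates should match [3.0, -2.0, -2.0]
```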
Finally, we show that attribution methods can be particularly valuable beyond the scope of interpretability. In particular, we show how methods developed to explain neural network predictions can also be exploited to prune the least important hidden units of a network, reducing its computational cost with only a small degradation in performance.
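The general idea can be sketched as follows: rank hidden units by an attribution-style importance score and keep only the strongest. This is a hypothetical illustration with a toy importance proxy, not the exact procedure from the thesis; how the scores are computed is precisely where attribution methods come in.

```python
import numpy as np

def prune_units(W_in, w_out, importance, keep_ratio=0.5):
    """Keep only the most important hidden units of one layer.
    importance holds one attribution-style score per hidden unit;
    any attribution method could supply these scores."""
    n_keep = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[-n_keep:]   # indices of top units
    return W_in[keep], w_out[keep]            # smaller layer

rng = np.random.default_rng(0)
W_in, w_out = rng.normal(size=(8, 4)), rng.normal(size=(8,))
scores = np.abs(w_out)                        # toy importance proxy
W_in_p, w_out_p = prune_units(W_in, w_out, scores, keep_ratio=0.25)
print(W_in_p.shape, w_out_p.shape)            # (2, 4) (2,)
```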
We conclude with a broader discussion of the advantages and drawbacks of attribution methods in comparison with other transparency techniques, as well as emerging applications and possible future work. As we believe that transparency in machine learning will become crucial over the next decade, we hope that our work provides an important step in this direction.
Permanent link
https://doi.org/10.3929/ethz-b-000446911
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Artificial Intelligence; Machine Learning; explainable artificial intelligence; Neural networks; Game theory
Organisational unit
03420 - Gross, Markus / Gross, Markus