Open access
Author
Date
2020
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
The last decade has witnessed increasing adoption of black-box machine learning models across a variety of fields, and data-driven models are becoming ever more important in sensitive domains such as healthcare, hiring, criminal risk assessment, and autonomous driving.
Although the recent successes of deep neural networks have attracted the interest of both academia and industry, the predictions of these models remain opaque to humans, with important implications for security, ethics, robustness, and scientific understanding.
As deep neural networks are increasingly deployed for automated decision making that affects people's lives, it is paramount to understand how these models work, how to avoid unintended biases, and how to protect their predictions from deliberate manipulation.
In this thesis, we investigate attribution methods as a tool to assess neural network predictions.
First, we compare and discuss existing attribution methods, prove conditions of equivalence and approximation between them, and propose a unified framework. Besides providing a better theoretical understanding of previously proposed heuristics, our analysis also suggests a simpler implementation for some attribution methods.
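To make the flavor of these methods concrete, below is a minimal sketch of Gradient * Input, one of the simplest gradient-based attribution methods analyzed in this line of work. The two-layer ReLU network and its weights are made-up placeholders for illustration, not a model from the thesis.

```python
import numpy as np

# Hypothetical two-layer ReLU network: f(x) = w2 . relu(W1 x).
# All weights are random placeholders, chosen only for illustration.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # hidden-layer weights
w2 = rng.normal(size=(8,))     # output-layer weights (scalar output)

def gradient_times_input(x):
    """Gradient * Input attribution for a scalar-output network."""
    z = W1 @ x                              # hidden pre-activations
    # Backprop by hand: df/dh = w2, gated by the ReLU mask,
    # then pulled back through W1 to obtain df/dx.
    grad_x = W1.T @ (w2 * (z > 0.0))
    return grad_x * x                       # one score per input feature

x = rng.normal(size=4)                      # an arbitrary input point
print(gradient_times_input(x))
```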
Second, endorsing an axiomatic approach, we identify several desirable properties that attribution methods should arguably satisfy. We show that Shapley values, a classic solution concept from cooperative game theory, are an ideal candidate. As computing exact Shapley values is prohibitively expensive, we then propose an approximation algorithm designed specifically for deep neural networks.
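For reference, the Shapley value of a player i in a game with player set N and value function v is given by the standard game-theoretic definition (textbook material, not a contribution of the thesis):

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)$$

A generic way to approximate this sum is Monte Carlo sampling over random permutations. The sketch below shows that baseline; it is not the DNN-specific algorithm proposed in the thesis, and `value_fn` is a hypothetical callable standing in for a model evaluated on a feature subset.

```python
import numpy as np

def shapley_mc(value_fn, n_features, n_samples=1000, seed=0):
    """Monte Carlo Shapley estimate via random permutations.
    value_fn(mask) returns the payoff for a boolean feature mask."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_features)
    for _ in range(n_samples):
        perm = rng.permutation(n_features)
        mask = np.zeros(n_features, dtype=bool)
        prev = value_fn(mask)
        for i in perm:
            mask[i] = True
            cur = value_fn(mask)
            phi[i] += cur - prev   # marginal contribution of feature i
            prev = cur
    return phi / n_samples

# Sanity check on a linear game v(S) = sum_{i in S} w_i * x_i,
# whose exact Shapley values are w_i * x_i.
w, x = np.array([1.0, -2.0, 0.5]), np.array([3.0, 1.0, -4.0])
est = shapley_mc(lambda m: float(w[m] @ x[m]), 3, n_samples=2000)
print(est, w * x)  # estimates should match [3.0, -2.0, -2.0]
```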
Finally, we show that attribution methods can be particularly valuable beyond the scope of interpretability. In particular, we show how methods developed to explain neural network predictions can also be exploited to prune the least important hidden units of a network, reducing its computational cost with only a small degradation in performance.
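The general idea can be sketched as follows: rank hidden units by an attribution-style importance score and keep only the strongest. This is a hypothetical illustration with a toy importance proxy, not the exact procedure from the thesis; how the scores are computed is precisely where attribution methods come in.

```python
import numpy as np

def prune_units(W_in, w_out, importance, keep_ratio=0.5):
    """Keep only the most important hidden units of one layer.
    importance holds one attribution-style score per hidden unit;
    any attribution method could supply these scores."""
    n_keep = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[-n_keep:]   # indices of top units
    return W_in[keep], w_out[keep]            # smaller layer

rng = np.random.default_rng(0)
W_in, w_out = rng.normal(size=(8, 4)), rng.normal(size=(8,))
scores = np.abs(w_out)                        # toy importance proxy
W_in_p, w_out_p = prune_units(W_in, w_out, scores, keep_ratio=0.25)
print(W_in_p.shape, w_out_p.shape)            # (2, 4) (2,)
```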
We conclude with a broader discussion of the advantages and drawbacks of attribution methods in comparison with other transparency techniques, as well as emerging applications and possible future work. As we believe that transparency in machine learning will become crucial over the next decade, we hope that our work provides an important step in this direction.
Permanent link
https://doi.org/10.3929/ethz-b-000446911
Publication status
published
External links
Search print copy at ETH Library
Publisher
ETH Zurich
Subject
Artificial Intelligence; Machine Learning; explainable artificial intelligence; Neural networks; Game theory
Organisational unit
03420 - Gross, Markus / Gross, Markus