DOI: 10.1145/3447548.3467310

Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution

Published: 14 August 2021

Abstract

Back-propagation-based visualizations have been proposed to interpret deep neural networks (DNNs), and some of them produce interpretations with good visual quality. However, there are doubts about whether these intuitive visualizations actually reflect the network's decisions. Recent studies have confirmed this suspicion by showing that almost all of these modified back-propagation visualizations are unfaithful to the model's decision-making process. Moreover, these visualizations yield vague "relative importance scores", in which low values are not guaranteed to be independent of the final prediction. It is therefore highly desirable to develop a back-propagation method that guarantees theoretical faithfulness and produces quantitative attribution scores with a clear meaning. To achieve this goal, we resort to mutual information theory and study how much information about the output is encoded in each input neuron. The basic idea is to learn a source signal by back-propagation such that the mutual information between the input and the output is preserved as much as possible in the mutual information between the input and the source signal. To this end, we propose a Mutual Information Preserving Inverse Network, termed MIP-IN, in which the parameters of each layer are recursively trained to learn how to invert. During the inversion, the forward ReLU operation is adopted to adapt the general interpretations to the specific input. We empirically demonstrate that the inverted source signal satisfies the completeness and minimality properties, which are crucial for a faithful interpretation. Furthermore, the empirical study validates the effectiveness of the interpretations generated by MIP-IN.
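
The abstract describes an invert-and-gate scheme rather than code; the PyTorch sketch below is one way to read it for a plain feed-forward Linear+ReLU classifier: each layer gets a learned linear inverse trained to reconstruct the layer's input from its output, and the forward ReLU pattern of the specific input gates the signal during inversion. All names here (InverseLayer, fit_inverses, invert_to_input) are illustrative assumptions, not the authors' MIP-IN implementation, and the simple reconstruction loss stands in for the paper's mutual-information-preserving objective (informally, learn a source signal s such that I(x; s) stays close to I(x; y)).

```python
# Illustrative sketch only: a learned layer-wise inverse for a frozen
# Linear+ReLU MLP, gated by the forward ReLU pattern of the explained input.
import torch
import torch.nn as nn

class InverseLayer(nn.Module):
    """Learned linear inverse of one Linear+ReLU layer (assumed form)."""
    def __init__(self, out_dim: int, in_dim: int):
        super().__init__()
        self.inv = nn.Linear(out_dim, in_dim)

    def forward(self, signal: torch.Tensor, relu_mask: torch.Tensor) -> torch.Tensor:
        # Gate the back-propagated signal with the forward ReLU pattern so the
        # inversion adapts to the specific input being explained.
        return self.inv(signal * relu_mask)

def forward_with_masks(layers, x):
    """Frozen forward pass recording every activation and ReLU pattern."""
    acts, masks = [x], []
    for layer in layers:
        pre = layer(acts[-1])
        masks.append((pre > 0).float())
        acts.append(torch.relu(pre))
    return acts, masks

def fit_inverses(layers, loader, epochs=3, lr=1e-3):
    """Train one InverseLayer per forward layer with a reconstruction loss."""
    invs = nn.ModuleList([InverseLayer(l.out_features, l.in_features) for l in layers])
    opt = torch.optim.Adam(invs.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in loader:
            with torch.no_grad():  # the classifier itself stays frozen
                acts, masks = forward_with_masks(layers, x)
            # Each inverse reconstructs its layer's input from the layer's output.
            loss = sum(((inv(acts[i + 1], masks[i]) - acts[i]) ** 2).mean()
                       for i, inv in enumerate(invs))
            opt.zero_grad(); loss.backward(); opt.step()
    return invs

def invert_to_input(invs, masks, output_signal):
    """Propagate a source signal from the output back to the input space."""
    s = output_signal
    for i in range(len(invs) - 1, -1, -1):
        s = invs[i](s, masks[i])
    return s  # attribution-like signal with the shape of the input
```

In this reading, applying invert_to_input to a class-specific output signal yields an input-space map whose entries can be inspected as attribution scores; the actual objective, architecture, and training procedure of MIP-IN should be taken from the paper itself.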

Supplementary Material

MP4 File (KDD_video_finalversion.mp4)
Presentation video for "Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution"


Cited By

  • (2024) Unifying Fourteen Post-Hoc Attribution Methods With Taylor Interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(7), 4625-4640. DOI: 10.1109/TPAMI.2024.3358410
  • (2024) An attribution graph-based interpretable method for CNNs. Neural Networks 179(C). DOI: 10.1016/j.neunet.2024.106597
  • (2023) HarsanyiNet. In Proceedings of the 40th International Conference on Machine Learning, 4804-4825. DOI: 10.5555/3618408.3618597
  • (2023) A Factor Marginal Effect Analysis Approach and Its Application in E-Commerce Search System. International Journal of Intelligent Systems 2023, 1-15. DOI: 10.1155/2023/6968854


    Published In

    KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
    August 2021
    4259 pages
    ISBN: 9781450383325
    DOI: 10.1145/3447548
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 14 August 2021


    Author Tags

    1. back-propagation techniques
    2. faithfulness
    3. model interpretation
    4. mutual information preserving

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China

    Conference

    KDD '21

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%



    Article Metrics

    • Downloads (last 12 months): 28
    • Downloads (last 6 weeks): 3

    Reflects downloads up to 20 Feb 2025

