DOI: 10.1145/3485447.3512254

Generating Perturbation-based Explanations with Robustness to Out-of-Distribution Data

Published: 25 April 2022


Perturbation-based techniques are promising for explaining black-box machine learning models because they are effective and easy to implement. However, prior work suffers from the Out-of-Distribution (OoD) problem: randomly perturbed data can become inconsistent with the original dataset, which degrades the reliability of the generated explanations. To the best of our knowledge, this problem remains under-explored. This work addresses the OoD issue by designing a simple yet effective module that quantifies the affinity between the perturbed data and the original dataset distribution. Specifically, we penalize the influence of unreliable OoD perturbed samples by integrating their inlier scores with the prediction results of the target model, thereby making the final explanations more robust. Our solution is shown to be compatible with the most popular perturbation-based XAI algorithms: RISE, OCCLUSION, and LIME. Extensive experiments confirm that our methods exhibit superior performance in most cases on both computational and cognitive metrics. In particular, we point out the degradation problem of the RISE algorithm for the first time. With our design, the performance of RISE can be boosted significantly. In addition, our solution resolves a fundamental problem with the faithfulness indicator, a commonly used evaluation metric for XAI algorithms that appears sensitive to the OoD issue.
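The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea under stated assumptions: a RISE-style saliency loop in which each perturbed sample's model score is multiplied by an inlier score before it contributes to the map. The multiplicative weighting, the per-pixel normalization, the block-shaped (non-smoothed) masks, and the names `rise_saliency`, `model_fn`, and `inlier_fn` are all hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def rise_saliency(model_fn, inlier_fn, image, n_masks=200, grid=4, p_keep=0.5, seed=0):
    """RISE-style saliency with inlier-weighted scores (illustrative sketch).

    model_fn(img)  -> float: target-class score from the black-box model.
    inlier_fn(img) -> float in [0, 1]: hypothetical in-distribution score;
                      values near 0 mean the perturbed sample looks OoD.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # Ceil division so the upsampled coarse mask covers the whole image.
    cell_h, cell_w = -(-h // grid), -(-w // grid)
    saliency = np.zeros((h, w))
    mask_sum = np.zeros((h, w))

    for _ in range(n_masks):
        # Random binary grid, upsampled to image size as constant blocks.
        # (The original RISE uses smoothed, randomly shifted masks instead.)
        coarse = (rng.random((grid, grid)) < p_keep).astype(float)
        mask = np.kron(coarse, np.ones((cell_h, cell_w)))[:h, :w]
        perturbed = image * mask[..., None] if image.ndim == 3 else image * mask

        # Assumption: down-weight the prediction by the inlier score, so
        # unreliable OoD samples contribute less to the explanation.
        weight = model_fn(perturbed) * inlier_fn(perturbed)
        saliency += weight * mask
        mask_sum += mask

    # Normalize by how often each pixel was kept (avoids division by zero).
    return saliency / np.maximum(mask_sum, 1e-8)


# Toy usage: the "model" only looks at the top-left quadrant of an 8x8 image,
# and the inlier scorer treats every sample as fully in-distribution.
toy_image = np.ones((8, 8))
toy_model = lambda img: float(img[:4, :4].mean())
always_inlier = lambda img: 1.0
sal = rise_saliency(toy_model, always_inlier, toy_image, grid=2)
```

In this toy setup the top-left quadrant receives a higher score than the other quadrants, and an inlier scorer that returns 0 for every sample suppresses the map entirely. A real inlier scorer would have to be fitted on the original dataset.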









        Published In

        WWW '22: Proceedings of the ACM Web Conference 2022
        April 2022
        3764 pages



        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. XAI
        2. faithfulness metric
        3. out-of-distribution (OoD)
        4. perturbation-based methods


        • Research-article
        • Research
        • Refereed limited

        Funding Sources

        • the Hong Kong RIF Project
        • Theme-based project
        • Hong Kong ITC ITF grants
        • Guangdong Basic and Applied Basic Research Foundation
        • Didi-HKUST joint research lab
        • National Key Research and Development Program of China Grant
        • the Hong Kong RGC GRF Project
        • the Hong Kong AOE Project
        • the Hong Kong CRF Project
        • Microsoft Research Asia Collaborative Research Grant
        • HKUST-Webank joint research lab grants
        • China NSFC


        WWW '22
        WWW '22: The ACM Web Conference 2022
        April 25 - 29, 2022
        Virtual Event, Lyon, France

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%



