ABSTRACT
Model interpretation is increasingly important for successful model development and deployment. In recent years, many explanation methods have been introduced to help humans understand how a machine learning model arrives at a decision for a specific instance. Recent studies show that contextualizing an individual model decision within a set of relevant examples can improve model understanding. However, there has been no systematic study of what factors are considered when generating and using context examples to explain model predictions, or of how context examples help with model understanding and debugging in practice. In this work, we first derive a taxonomy of context generation and summarization through a literature review. We then present Context Sight, a visual analytics system that integrates customized context generation with multi-level context summarization to support context exploration and interpretation. We evaluate the usefulness of the system through a detailed use case. This work is an initial step in a line of systematic research on how contextualization can help data scientists and practitioners understand and diagnose model behavior, from which we will gain a better understanding of how context is used.
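The abstract's notion of context generation, retrieving relevant examples to situate an individual prediction, can be sketched with a simple nearest-neighbor retrieval. This is a hypothetical illustration of the general idea, not Context Sight's actual generation method; the function names and toy data are assumptions for the sketch.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def context_examples(instance, dataset, k=3):
    """Return the k dataset instances closest to `instance`,
    i.e. a minimal 'context' for interpreting its prediction."""
    ranked = sorted(dataset, key=lambda row: euclidean(instance, row))
    return ranked[:k]

# Toy training data: 2-D feature vectors.
train = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2), (4.0, 4.5)]

# The two nearest neighbors of (0.1, 0.1) serve as its context set.
print(context_examples((0.1, 0.1), train, k=2))
# → [(0.0, 0.0), (0.5, 0.2)]
```

In practice, the context set could then be summarized (e.g., by comparing the neighbors' labels and feature distributions to the queried instance) rather than shown raw, which is the kind of multi-level summarization the abstract refers to.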
Context Sight: Model Understanding and Debugging via Interpretable Context