ABSTRACT
The goal of interpretable machine learning (ML) is to design tools and visualizations that help users scrutinize a system's predictions. Prior studies have mostly employed quantitative methods to investigate the effects of specific tools and visualizations on outcomes related to objective performance (a human's ability to correctly agree or disagree with the system) and subjective perceptions of the system. Few studies have employed qualitative methods to investigate how and why specific tools and visualizations influence performance, perceptions, and behaviors. We report on a lab study (N = 30) that investigated the influence of two interpretability features: confidence values and sentence highlighting. Participants judged whether medical articles belonged to a predicted medical topic and were exposed to two interface conditions, one with and one without the interpretability features. We investigate the effects of the interpretability features on participants' performance and perceptions. Additionally, we report on a qualitative analysis of participants' responses during an exit interview. Specifically, we report on how the interpretability features impacted the different cognitive activities that participants engaged in during the task: reading, learning, and decision making. We also describe ways in which the interpretability features introduced challenges and sometimes led participants to make mistakes. Insights gained from our results point to future directions for interpretable ML research.
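To make the two interpretability features concrete, the sketch below shows one common way such signals can be derived from a text classifier. This is a minimal illustration, not the system evaluated in the study: it assumes a generic TF-IDF plus logistic-regression model, the confidence value is taken as the classifier's probability for the predicted topic, and sentence highlights are obtained by scoring each sentence against that topic in isolation. All data, labels, and function names below are illustrative placeholders.

```python
# Minimal sketch (not the authors' system): deriving a confidence value and
# per-sentence highlight scores from a generic linear text classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training corpus: articles labeled as belonging to a medical topic (1) or not (0).
train_texts = [
    "Randomized trial of statin therapy for cardiovascular risk reduction.",
    "Myocardial infarction outcomes after early revascularization.",
    "A survey of convolutional architectures for image segmentation.",
    "User interface design patterns for mobile note taking.",
]
train_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
clf = LogisticRegression()
clf.fit(vectorizer.fit_transform(train_texts), train_labels)

def explain(article_sentences):
    """Return (predicted topic, confidence value, per-sentence highlight scores)."""
    doc = " ".join(article_sentences)
    probs = clf.predict_proba(vectorizer.transform([doc]))[0]
    top = int(np.argmax(probs))
    label = clf.classes_[top]
    confidence = float(probs[top])  # displayed to users as the "confidence value"
    # Sentence highlighting: score each sentence by the probability the
    # classifier assigns to the predicted topic when that sentence is scored alone.
    scores = [
        float(clf.predict_proba(vectorizer.transform([s]))[0][top])
        for s in article_sentences
    ]
    return label, confidence, scores

sentences = [
    "Patients received statin therapy for twelve months.",
    "The interface was evaluated in a usability lab.",
]
label, confidence, scores = explain(sentences)
print(label, round(confidence, 2), [round(s, 2) for s in scores])
```

In an interface like the one studied, the confidence value would be shown alongside the predicted topic, and the per-sentence scores would drive the intensity of sentence highlighting in the article text.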