ABSTRACT
The goal of interpretable machine learning (ML) is to design tools and visualizations that help users scrutinize a system's predictions. Prior studies have mostly employed quantitative methods to investigate the effects of specific tools and visualizations on outcomes related to objective performance (a human's ability to correctly agree or disagree with the system) and subjective perceptions of the system. Few studies have employed qualitative methods to investigate how and why specific tools and visualizations influence performance, perceptions, and behaviors. We report on a lab study (N = 30) that investigated the influence of two interpretability features: confidence values and sentence highlighting. Participants judged whether medical articles belonged to a predicted medical topic and were exposed to two interface conditions, one with and one without the interpretability features. We investigate the effects of the interpretability features on participants' performance and perceptions. Additionally, we report on a qualitative analysis of participants' responses during an exit interview. Specifically, we report on how the interpretability features impacted the different cognitive activities that participants engaged in during the task: reading, learning, and decision making. We also describe ways in which the interpretability features introduced challenges and sometimes led participants to make mistakes. Insights gained from our results point to future directions for interpretable ML research.
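To make the two interpretability features concrete, the sketch below shows one common way such signals can be derived from a text classifier. This is a minimal illustration, not the system evaluated in the study: it assumes a generic TF-IDF plus logistic-regression model, the confidence value is taken as the classifier's probability for the predicted topic, and sentence highlights are obtained by scoring each sentence against that topic in isolation. All data, labels, and function names below are illustrative placeholders.

```python
# Minimal sketch (not the authors' system): deriving a confidence value and
# per-sentence highlight scores from a generic linear text classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training corpus: articles labeled as belonging to a medical topic (1) or not (0).
train_texts = [
    "Randomized trial of statin therapy for cardiovascular risk reduction.",
    "Myocardial infarction outcomes after early revascularization.",
    "A survey of convolutional architectures for image segmentation.",
    "User interface design patterns for mobile note taking.",
]
train_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
clf = LogisticRegression()
clf.fit(vectorizer.fit_transform(train_texts), train_labels)

def explain(article_sentences):
    """Return (predicted topic, confidence value, per-sentence highlight scores)."""
    doc = " ".join(article_sentences)
    probs = clf.predict_proba(vectorizer.transform([doc]))[0]
    top = int(np.argmax(probs))
    label = clf.classes_[top]
    confidence = float(probs[top])  # displayed to users as the "confidence value"
    # Sentence highlighting: score each sentence by the probability the
    # classifier assigns to the predicted topic when that sentence is scored alone.
    scores = [
        float(clf.predict_proba(vectorizer.transform([s]))[0][top])
        for s in article_sentences
    ]
    return label, confidence, scores

sentences = [
    "Patients received statin therapy for twelve months.",
    "The interface was evaluated in a usability lab.",
]
label, confidence, scores = explain(sentences)
print(label, round(confidence, 2), [round(s, 2) for s in scores])
```

In an interface like the one studied, the confidence value would be shown alongside the predicted topic, and the per-sentence scores would drive the intensity of sentence highlighting in the article text.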