ABSTRACT
This paper addresses an under-explored problem in AI-assisted decision-making: when objective performance information about the machine learning model underlying a decision aid is absent or scarce, how do people decide how much to rely on the model? Through three randomized experiments, we explore the heuristics people may use to adjust their reliance on machine learning models when performance feedback is limited. We find that when people receive no information about a model's performance, the level of agreement between their own judgments and the model's predictions on decision-making tasks where they have high confidence significantly affects their reliance on the model; this effect changes once aggregate-level model performance information becomes available. Furthermore, the influence of high-confidence human-model agreement on people's reliance is moderated by their confidence in the cases where they disagree with the model. We discuss the potential risks of these heuristics and provide design implications for promoting appropriate reliance on AI.
Index Terms
- Human Reliance on Machine Learning Models When Performance Feedback is Limited: Heuristics and Risks