DOI: 10.1145/3411764.3445562
CHI Conference Proceedings · Research Article · Open Access

Human Reliance on Machine Learning Models When Performance Feedback is Limited: Heuristics and Risks

Published: 07 May 2021

ABSTRACT

This paper addresses an under-explored problem in AI-assisted decision-making: when objective performance information about the machine learning model underlying a decision aid is absent or scarce, how do people decide how much to rely on the model? Through three randomized experiments, we explore the heuristics people may use to adjust their reliance on machine learning models when performance feedback is limited. We find that when people receive no information about a model's performance, the level of agreement between people and the model on decision-making tasks that people complete with high confidence significantly affects their reliance on the model; this effect changes once aggregate-level model performance information becomes available. Furthermore, the influence of high-confidence human-model agreement on people's reliance on a model is moderated by people's confidence in the cases where they disagree with the model. We discuss the potential risks of these heuristics and provide design implications for promoting appropriate reliance on AI.



Published in

CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
May 2021, 10,862 pages
ISBN: 978-1-4503-8096-6
DOI: 10.1145/3411764

Copyright © 2021 ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed limited

Acceptance Rates

Overall acceptance rate: 6,199 of 26,314 submissions (24%)
