DOI: 10.1145/3593013.3594087

Research Article
Open Access

Towards a Science of Human-AI Decision Making: An Overview of Design Space in Empirical Human-Subject Studies

Published: 12 June 2023

ABSTRACT

AI systems are being adopted in numerous domains due to their increasingly strong predictive performance. However, in high-stakes domains such as criminal justice and healthcare, full automation is often not desirable due to safety, ethical, and legal concerns, yet fully manual approaches can be inaccurate and time-consuming. As a result, there is growing interest in the research community in augmenting human decision making with AI assistance. Besides developing AI technologies for this purpose, the emerging field of human-AI decision making must embrace empirical approaches to form a foundational understanding of how humans interact and work with AI to make decisions. To invite and help structure research efforts towards a science of understanding and improving human-AI decision making, we survey recent empirical human-subject studies on this topic. We summarize the study design choices made in over 100 papers along three important aspects: (1) decision tasks, (2) AI assistance elements, and (3) evaluation metrics. For each aspect, we summarize current trends, discuss gaps in the field's current practices, and offer a list of recommendations for future research. Our work highlights the need to develop common frameworks that account for the design and research spaces of human-AI decision making, so that researchers can make rigorous choices in study design and the research community can build on each other's work to produce generalizable scientific knowledge. We also hope this work will serve as a bridge for the HCI and AI communities to work together to mutually shape the empirical science and computational technologies for human-AI decision making.
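The third aspect above, evaluation metrics, covers the measures these studies report, such as decision accuracy and reliance on the AI. Below is a minimal illustrative sketch (not code from the paper or the surveyed studies; the Trial structure and toy data are hypothetical) of how a few metrics common in this literature, including overreliance on wrong AI recommendations, can be computed from per-trial study data:

```python
# Illustrative sketch only: hypothetical per-trial records from an
# AI-assisted decision-making study, not an implementation from the paper.
from dataclasses import dataclass

@dataclass
class Trial:
    ground_truth: int    # correct answer for the task instance
    ai_prediction: int   # the AI assistant's recommendation
    human_decision: int  # the participant's final decision

def evaluate(trials: list[Trial]) -> dict[str, float]:
    n = len(trials)
    # Decision accuracy: how often the human's final decision is correct.
    accuracy = sum(t.human_decision == t.ground_truth for t in trials) / n
    # Agreement: how often the human's decision matches the AI's recommendation.
    agreement = sum(t.human_decision == t.ai_prediction for t in trials) / n
    # Overreliance: among trials where the AI is wrong, how often the human
    # nevertheless follows its recommendation.
    ai_wrong = [t for t in trials if t.ai_prediction != t.ground_truth]
    overreliance = (
        sum(t.human_decision == t.ai_prediction for t in ai_wrong) / len(ai_wrong)
        if ai_wrong
        else 0.0
    )
    return {"accuracy": accuracy, "agreement": agreement, "overreliance": overreliance}

# Toy usage on a binary decision task:
trials = [Trial(1, 1, 1), Trial(0, 1, 1), Trial(1, 1, 0), Trial(0, 0, 0)]
print(evaluate(trials))  # {'accuracy': 0.5, 'agreement': 0.75, 'overreliance': 1.0}
```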


Published in

FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency
June 2023, 1929 pages
ISBN: 9798400701924
DOI: 10.1145/3593013

Copyright © 2023 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States
