The Explanation That Hits Home: The Characteristics of Verbal Explanations That Affect Human Perception in Subjective Decision-Making

Published: 08 November 2024

Abstract

Human-AI collaborative decision-making can achieve better outcomes than either party acting alone. The success of this collaboration can depend on whether the human decision-maker perceives the AI's contribution as beneficial to the decision-making process. Beneficial AI explanations are often described as relevant, convincing, and trustworthy, yet we know little about the characteristics of explanations that produce these perceptions. Focusing on collaborative subjective decision-making in the context of subtle sexism, where explanations can surface new interpretations, we conducted a user study (N=20) to explore the structural and content characteristics that affect perceptions of human- and AI-generated verbal (text and audio) explanations. We find four groups of characteristics (Tone, Grammatical Elements, Argumentative Sophistication, and Relation to User) and show that the effect of these characteristics on the perception of explanations for subtle sexism depends on the perceived author. Accordingly, we also identify which explanation characteristics participants use to infer the author of an explanation. Demonstrating the relationship between these characteristics and explanation perceptions, we present a categorized set of characteristics that system builders can leverage to produce the appropriate perception of an explanation in various sensitive contexts. We also highlight human perception biases and the issues that arise from these perceptions.

      Published In

      Proceedings of the ACM on Human-Computer Interaction, Volume 8, Issue CSCW2 (CSCW)
      November 2024, 5177 pages
      EISSN: 2573-0142
      DOI: 10.1145/3703902

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 08 November 2024
      Published in PACMHCI Volume 8, Issue CSCW2


      Author Tags

      1. collaborative decision-making
      2. explainable ai
      3. explanation characteristics
      4. perceptions
      5. subjectivity
      6. verbal explanation

      Qualifiers

      • Research-article

      Funding Sources

      • Naver Corp
