DOI: 10.1145/3544548.3580805
Research Article

Is this AI trained on Credible Data? The Effects of Labeling Quality and Performance Bias on User Trust

Published: 19 April 2023

Abstract

To promote data transparency, frameworks such as CrowdWorkSheets encourage documentation of annotation practices on the interfaces of AI systems, but we do not know how they affect user experience. Will the quality of labeling affect the perceived credibility of training data? Does the source of annotation matter? Will a credible dataset persuade users to trust a system even if it shows racial biases in its predictions? To find out, we conducted a user study (N = 430) with a prototype of a classification system, using a 2 (labeling quality: high vs. low) × 4 (source: others-as-source vs. self-as-source cue vs. self-as-source voluntary action vs. self-as-source forced action) × 3 (AI performance: none vs. biased vs. unbiased) experiment. We found that high-quality labeling leads to higher perceived training data credibility, which in turn enhances users’ trust in AI, but not when the system shows bias. Practical implications for explainable and ethical AI interfaces are discussed.
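To make the factorial design concrete, here is a minimal sketch that enumerates the 24 experimental cells implied by the 2 × 4 × 3 design above. It is illustrative only: the factor names and level labels come from the abstract, while the variable names, cell numbering, and assignment helper are assumptions, not the authors' actual study materials.

```python
from itertools import product
import random

# Factor levels as described in the abstract; everything else here
# (names, structure, output format) is an illustrative assumption.
labeling_quality = ["high", "low"]
labeling_source = [
    "others-as-source",
    "self-as-source cue",
    "self-as-source voluntary action",
    "self-as-source forced action",
]
ai_performance = ["none", "biased", "unbiased"]

# The full crossing yields the 2 x 4 x 3 = 24 experimental cells.
conditions = list(product(labeling_quality, labeling_source, ai_performance))
assert len(conditions) == 24

for i, (quality, source, performance) in enumerate(conditions, start=1):
    print(f"Cell {i:2d}: quality={quality:4s} | source={source:31s} | performance={performance}")

# Example: randomly assign a participant to one cell.
participant_condition = random.choice(conditions)
print("Assigned:", participant_condition)
```

With the 430 participants randomly assigned across these 24 cells, each condition would receive roughly 18 participants on average.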

Supplementary Material

MP4 File (3544548.3580805-talk-video.mp4)
Pre-recorded Video Presentation




    Published In

    CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
    April 2023
    14911 pages
    ISBN:9781450394215
    DOI:10.1145/3544548
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 April 2023

    Author Tags

    1. algorithmic bias
    2. data labeling quality
    3. labeling source
    4. training data credibility
    5. trust in AI

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    CHI '23

    Acceptance Rates

    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
