DOI: 10.1145/3544549.3585808
Work in Progress

Trust Issues with Trust Scales: Examining the Psychometric Quality of Trust Measures in the Context of AI

Published: 19 April 2023

ABSTRACT

Trust is crucial for human interaction with artificial intelligence (AI) and is frequently measured through questionnaires or rating scales. One commonly used questionnaire in AI research is the Trust between People and Automation scale (TPA). However, its psychometric quality has yet to be examined in the context of AI. More recently, a Trust Scale for Explainable AI (TXAI) was recommended but not empirically evaluated. In this study, we assessed the psychometric qualities of both scales, using confirmatory and exploratory factor analyses to test the scales’ validity and coefficients α and ω for reliability estimation. Our results suggested good psychometric quality for the TXAI after removing one item. Concerning the TPA, acceptable quality was only achieved when using a two-factor model (trust and distrust) and after removing two items. We provide recommendations for using the two scales and evidence to distinguish trust and distrust as separate psychological constructs.
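The two reliability coefficients named in the abstract can be illustrated with a short sketch. The Python below is not the authors' analysis code. It assumes complete, numerically scored responses (respondents as rows, items as columns). It computes Cronbach's α as k/(k - 1) * (1 - sum of item variances / variance of the total score), and McDonald's ω (total) from standardized one-factor loadings λ and residual variances θ as (Σλ)² / ((Σλ)² + Σθ). It also includes an alpha-if-item-deleted helper, which is one common diagnostic when deciding whether to drop an item; the abstract does not say which criteria the authors used for item removal. All function names and example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of numeric scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))  # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def alpha_if_item_deleted(scores: list[list[float]]) -> list[float]:
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    k = len(scores[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != drop] for row in scores])
        for drop in range(k)
    ]


def mcdonald_omega(loadings: list[float], residual_vars: list[float]) -> float:
    """Omega total for one factor: (sum of loadings)^2 / ((sum)^2 + sum of residuals)."""
    common = sum(loadings) ** 2
    return common / (common + sum(residual_vars))


if __name__ == "__main__":
    # Hypothetical 5-point ratings: 5 respondents x 3 items.
    responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1], [3, 3, 3]]
    print(round(cronbach_alpha(responses), 2))  # 0.98
    print([round(a, 3) for a in alpha_if_item_deleted(responses)])

    # Hypothetical standardized loadings; residual variance = 1 - loading^2.
    lam = [0.8, 0.7, 0.6]
    theta = [1 - x**2 for x in lam]
    print(round(mcdonald_omega(lam, theta), 3))  # 0.745
```

In practice the loadings and residual variances would come from a fitted CFA model (for example, with lavaan in R or semopy in Python) rather than being typed in, and for a two-factor trust/distrust model ω would be computed separately for each factor's items.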

Footnotes

  1. Low χ² value and p > .05 for the χ² test, RMSEA < .06, SRMR ≤ .08, and .95 ≤ CFI ≤ 1
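The cutoffs in footnote 1 can be applied mechanically once a confirmatory factor analysis has been fitted. The sketch below uses hypothetical helpers that are not part of any SEM package: it compares reported fit indices with those cutoffs and shows how the RMSEA point estimate follows from χ², its degrees of freedom, and sample size N, using the common N - 1 denominator (some software uses N instead). The qualitative "low χ² value" criterion has no numeric threshold, so only the χ² test's p-value is checked. All numbers in the usage lines are made up.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("need df > 0 and n > 1")
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def check_fit(chi2_p: float, rmsea_value: float, srmr: float, cfi: float) -> dict[str, bool]:
    """Return which of the footnote's numeric cutoffs a model meets."""
    return {
        "chi2 p > .05": chi2_p > 0.05,
        "RMSEA < .06": rmsea_value < 0.06,
        "SRMR <= .08": srmr <= 0.08,
        ".95 <= CFI <= 1": 0.95 <= cfi <= 1.0,
    }


if __name__ == "__main__":
    # Hypothetical model: chi2 = 60 on 40 df, N = 201 respondents.
    r = rmsea(60.0, 40, 201)
    print(round(r, 3))  # 0.05
    result = check_fit(chi2_p=0.02, rmsea_value=r, srmr=0.05, cfi=0.97)
    print(result)
    print("all criteria met:", all(result.values()))  # False, because p <= .05
```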


Supplemental Material

3544549.3585808-talk-video.mp4 (MP4, 56.7 MB)


Published in: CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, April 2023, 3914 pages
ISBN: 9781450394222
DOI: 10.1145/3544549

          Copyright © 2023 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher: Association for Computing Machinery, New York, NY, United States


          Qualifiers

          • Work in Progress
          • Research
          • Refereed limited

          Acceptance Rates

Overall acceptance rate: 6,164 of 23,696 submissions, 26%
