ABSTRACT
Trust is crucial for human interaction with artificial intelligence (AI) and is frequently measured through questionnaires or rating scales. One questionnaire commonly used in AI research is the Trust between People and Automation scale (TPA). However, its psychometric quality has yet to be examined in the context of AI. More recently, a Trust Scale for Explainable AI (TXAI) has been recommended but has not been empirically evaluated. In this study, we assessed the psychometric quality of both scales, using confirmatory and exploratory factor analyses to test their validity and coefficients α and ω to estimate their reliability. Our results suggested good psychometric quality for the TXAI after removing one item. For the TPA, acceptable quality was achieved only with a two-factor model (trust and distrust) and after removing two items. We provide recommendations for using the two scales and present evidence for distinguishing trust and distrust as separate psychological constructs.
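The two reliability estimates named in the abstract, coefficient α and coefficient ω, can be illustrated with a minimal Python sketch that uses only the standard library. The data layout (one row per respondent, one column per item), the function names, and the toy numbers are illustrative assumptions, not the authors' analysis code. For ω we assume a unidimensional model with standardized loadings and uncorrelated errors, so each item's unique variance is 1 - λ²; for a multi-factor solution such as the two-factor TPA, each subscale would typically receive its own estimate.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Coefficient alpha for a respondents-by-items matrix (one row per respondent)."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def mcdonald_omega(loadings: list[float]) -> float:
    """Omega total from standardized loadings of a one-factor model.

    Assumes uncorrelated errors, so each unique variance is 1 - loading**2.
    """
    common = sum(loadings) ** 2
    unique = sum(1 - lam ** 2 for lam in loadings)
    return common / (common + unique)


# Hypothetical toy data: 4 respondents answering 3 items
toy = [[1, 2, 3], [2, 3, 3], [3, 4, 5], [4, 4, 5]]
print(round(cronbach_alpha(toy), 3))  # prints 0.962
print(round(mcdonald_omega([0.7, 0.7, 0.7]), 3))  # prints 0.742
```

Under this one-factor model, α equals ω when all loadings are equal (tau-equivalence) and is otherwise a lower bound on reliability, which is one reason to report both.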
Footnotes
1 A low χ² value with p > .05 for the χ² test, RMSEA < .06, SRMR ≤ .08, and .95 ≤ CFI ≤ 1.
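The cutoffs in footnote 1 can be applied mechanically once a confirmatory factor analysis has produced its fit indices. The sketch below is a hypothetical Python helper (the `FitIndices` container, the `check_fit` name, and the example values are our own assumptions, not taken from the paper). It reports each criterion separately, and because the footnote's "low χ² value" has no numeric threshold, only the p > .05 part of the χ² criterion is checked.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    chi2_p: float  # p-value of the chi-square test of exact fit
    rmsea: float   # root mean square error of approximation
    srmr: float    # standardized root mean square residual
    cfi: float     # comparative fit index


def check_fit(fit: FitIndices) -> dict[str, bool]:
    """Evaluate each cutoff from footnote 1 separately.

    The "low chi-square value" criterion has no numeric threshold,
    so only the p > .05 condition of the chi-square test is checked.
    """
    return {
        "chi2 p > .05": fit.chi2_p > 0.05,
        "RMSEA < .06": fit.rmsea < 0.06,
        "SRMR <= .08": fit.srmr <= 0.08,
        ".95 <= CFI <= 1": 0.95 <= fit.cfi <= 1.0,
    }


# Hypothetical fit indices, not results from the paper
result = check_fit(FitIndices(chi2_p=0.12, rmsea=0.045, srmr=0.05, cfi=0.97))
print(all(result.values()))  # prints True
```

Reporting the criteria individually rather than as a single pass/fail flag makes it easier to see which index drives a misfit, for example when comparing one-factor and two-factor models of the same scale.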
Title: Trust Issues with Trust Scales: Examining the Psychometric Quality of Trust Measures in the Context of AI