Work in Progress

Trust Issues with Trust Scales: Examining the Psychometric Quality of Trust Measures in the Context of AI

Authors:
Sebastian A. C. Perrig, Center for General Psychology and Methodology, University of Basel, Switzerland (ORCID: 0000-0002-4301-8206)
Nicolas Scharowski, Center for General Psychology and Methodology, University of Basel, Switzerland (ORCID: 0000-0001-5983-346X)
Florian Brühlmann, Center for General Psychology and Methodology, University of Basel, Switzerland (ORCID: 0000-0001-8945-3273)

CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, April 2023, Article No.: 297, Pages 1–7, https://doi.org/10.1145/3544549.3585808

Published: 19 April 2023


ABSTRACT

Trust is crucial for human interaction with artificial intelligence (AI) and is frequently measured through questionnaires or rating scales. One commonly used questionnaire in AI research is the Trust between People and Automation scale (TPA). However, its psychometric quality has yet to be examined in the context of AI. More recently, a Trust Scale for Explainable AI (TXAI) has been recommended but has not yet been empirically evaluated. In this study, we assessed the psychometric quality of both scales, using confirmatory and exploratory factor analyses to test their validity and coefficients α and ω to estimate their reliability. Our results suggested good psychometric quality for the TXAI after removing one item. For the TPA, acceptable quality was achieved only with a two-factor model (trust and distrust) and after removing two items. We provide recommendations for using both scales and present evidence for distinguishing trust and distrust as separate psychological constructs.
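The abstract names coefficients α and ω as the reliability estimates. As a rough illustration of what these coefficients compute, here is a minimal Python sketch, not the authors' analysis code. It assumes complete data with one list of scores per item, any negatively worded items already reverse-scored, and, for ω, loadings from a single-factor model (for example, a CFA). The function names `cronbach_alpha` and `mcdonald_omega` and the example ratings are hypothetical.

```python
from statistics import variance
from typing import Optional, Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; `items` holds one sequence of respondent scores per item."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    # Sum score of each respondent across all k items
    totals = [sum(item[r] for item in items) for r in range(n)]
    # Sample variances (n - 1 denominator) for each item and for the sum score
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def mcdonald_omega(loadings: Sequence[float],
                   error_variances: Optional[Sequence[float]] = None) -> float:
    """Omega for a single-factor (congeneric) model.

    Without explicit error variances, the loadings are assumed to be
    standardized, so each item's residual variance is 1 - loading**2.
    """
    if error_variances is None:
        error_variances = [1 - lam ** 2 for lam in loadings]
    common = sum(loadings) ** 2
    return common / (common + sum(error_variances))


if __name__ == "__main__":
    # Hypothetical 7-point ratings: three items, five respondents each
    ratings = [
        [5, 6, 4, 7, 3],
        [5, 7, 4, 6, 2],
        [4, 6, 5, 7, 3],
    ]
    print(f"alpha = {cronbach_alpha(ratings):.3f}")
    # Hypothetical standardized loadings from a one-factor model
    print(f"omega = {mcdonald_omega([0.8, 0.75, 0.7]):.3f}")
```

Unlike α, ω does not assume equal loadings across items (tau-equivalence). This is one common reason to report both coefficients.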

Footnotes

¹ Model fit criteria: a low χ² value with p > .05 for the χ² test, RMSEA < .06, SRMR ≤ .08, and .95 ≤ CFI ≤ 1.
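The fit criteria in footnote 1 can be applied mechanically once an SEM package has reported the indices. Here is a minimal Python sketch, assuming the index values are already available. The `FitIndices` record and the `meets_cutoffs` helper are hypothetical names, and "low χ²" has no fixed threshold, so the sketch checks only the χ² test's p value.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FitIndices:
    """Fit statistics for one CFA model, as reported by an SEM package."""
    chi2: float     # chi-square statistic; "low" has no fixed cutoff, so it is not checked
    p_value: float  # p value of the chi-square test
    rmsea: float
    srmr: float
    cfi: float


def meets_cutoffs(fit: FitIndices) -> Dict[str, bool]:
    """Check each index against the footnoted criteria; True means the criterion is met."""
    return {
        "chi2 p > .05": fit.p_value > 0.05,
        "RMSEA < .06": fit.rmsea < 0.06,
        "SRMR <= .08": fit.srmr <= 0.08,
        ".95 <= CFI <= 1": 0.95 <= fit.cfi <= 1.0,
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the paper
    example = FitIndices(chi2=41.2, p_value=0.12, rmsea=0.045, srmr=0.038, cfi=0.972)
    for criterion, met in meets_cutoffs(example).items():
        print(f"{criterion}: {'met' if met else 'not met'}")
```

The sketch returns each criterion separately instead of a single pass/fail flag. This makes it visible which index drives a poor fit, for example when comparing a one-factor and a two-factor model.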

Supplemental Material

3544549.3585808-talk-video.mp4 (MP4, 56.7 MB)

References

J. Jackson Barnette. 2000. Effects of Stem and Likert Response Option Reversals on Survey Internal Consistency: If You Feel the Need, There is a Better Alternative to Using those Negatively Worded Stems. Educational and Psychological Measurement 60, 3 (2000), 361–370. https://doi.org/10.1177/00131640021970592
Florian Brühlmann, Serge Petralito, Lena Aeschbach, and Klaus Opwis. 2020. The Quality of Data Collected Online: An Investigation of Careless Responding in a Crowdsourced Sample. Methods in Psychology 2 (April 2020), 100022. https://doi.org/10.1016/j.metip.2020.100022
Zana Buçinca, Phoebe Lin, Krzysztof Z. Gajos, and Elena L. Glassman. 2020. Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems. In Proceedings of the 25th International Conference on Intelligent User Interfaces (Cagliari, Italy) (IUI ’20). ACM, New York, NY, USA, 454–464. https://doi.org/10.1145/3377325.3377498
John T. Cacioppo and Gary G. Berntson. 1994. Relationship Between Attitudes and Evaluative Space: A Critical Review, with Emphasis on the Separability of Positive and Negative Substrates. Psychological Bulletin 115, 3 (1994), 401–423. https://doi.org/10.1037/0033-2909.115.3.401
Lee J. Cronbach. 1951. Coefficient Alpha and the Internal Structure of Tests. Psychometrika 16, 3 (1951), 297–334. https://doi.org/10.1007/BF02310555
Robert F. DeVellis. 2017. Scale Development: Theory and Applications (4th ed.). SAGE Publications, Inc., Thousand Oaks, CA, USA.
Alice H. Eagly and Shelly Chaiken. 1993. The Psychology of Attitudes. Harcourt Brace Jovanovich College Publishers.
Mike Furr. 2011. Scale Construction and Psychometrics for Social and Personality Psychology. SAGE Publications, Ltd., London, UK.
Robert S. Gutzwiller, Erin K. Chiou, Scotty D. Craig, Christina M. Lewis, Glenn J. Lematta, and Chi-Ping Hsiung. 2019. Positive Bias in the ‘Trust in Automated Systems Survey’? An Examination of the Jian et al. (2000) Scale. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 63, 1 (2019), 217–221. https://doi.org/10.1177/1071181319631201
Joseph F. Hair, William C. Black, Barry J. Babin, and Rolph E. Anderson. 2010. Multivariate Data Analysis (7th ed.). Prentice Hall, Hoboken, NJ, USA. 785 pages.
N. Henze and B. Zirkler. 1990. A Class of Invariant Consistent Tests for Multivariate Normality. Communications in Statistics - Theory and Methods 19, 10 (1990), 3595–3617.
Kevin A. Hoff and Masooda Bashir. 2015. Trust in Automation: Integrating Empirical Evidence on Factors That Influence Trust. Human Factors 57, 3 (2015), 407–434. https://doi.org/10.1177/0018720814547570
Robert R. Hoffman, Shane T. Mueller, Gary Klein, and Jordan Litman. 2023. Measures for explainable AI: Explanation goodness, user satisfaction, mental models, curiosity, trust, and human-AI performance. Frontiers in Computer Science 5 (2023), 15 pages. https://doi.org/10.3389/fcomp.2023.1096257
Kenneth D. Hopkins. 1998. Educational and Psychological Measurement and Evaluation. Pearson, London, UK.
Matt C. Howard. 2016. A Review of Exploratory Factor Analysis Decisions and Overview of Current Practices: What We Are Doing and How Can We Improve? International Journal of Human–Computer Interaction 32, 1 (2016), 51–62. https://doi.org/10.1080/10447318.2015.1087664
Li-tze Hu and Peter M. Bentler. 1999. Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives. Structural Equation Modeling: A Multidisciplinary Journal 6, 1 (1999), 1–55. https://doi.org/10.1080/10705519909540118
Jiun-Yin Jian, Ann M. Bisantz, and Colin G. Drury. 2000. Foundations for an Empirically Determined Scale of Trust in Automated Systems. International Journal of Cognitive Ergonomics 4, 1 (2000), 53–71. https://doi.org/10.1207/S15327566IJCE0401_04
Elizabeth F. Juniper. 2009. Validated Questionnaires Should Not Be Modified. European Respiratory Journal 34, 5 (2009), 1015–1017. https://doi.org/10.1183/09031936.00110209
Markus Langer, Tim Hunsicker, Tina Feldkamp, Cornelius J. König, and Nina Grgić-Hlača. 2022. “Look! It’s a Computer Program! It’s an Algorithm! It’s AI!”: Does Terminology Affect Human Perceptions and Evaluations of Algorithmic Decision-Making Systems?. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 581, 28 pages. https://doi.org/10.1145/3491102.3517527
John D. Lee and Katrina A. See. 2004. Trust in Automation: Designing for Appropriate Reliance. Human Factors 46, 1 (2004), 50–80. https://doi.org/10.1518/hfes.46.1.50_30392
Roy J. Lewicki, Daniel J. McAllister, and Robert J. Bies. 1998. Trust and Distrust: New Relationships and Realities. Academy of Management Review 23, 3 (1998), 438–458.
Kantilal Vardichand Mardia. 1970. Measures of Multivariate Skewness and Kurtosis with Applications. Biometrika 57, 3 (1970), 519–530. https://doi.org/10.1093/biomet/57.3.519
Roger C. Mayer, James H. Davis, and F. David Schoorman. 1995. An Integrative Model of Organizational Trust. Academy of Management Review 20, 3 (1995), 709–734. https://doi.org/10.2307/258792
Roderick P. McDonald. 1999. Test Theory: A Unified Treatment. Lawrence Erlbaum Associates, Inc., Mahwah, NJ, USA.
Stephanie M. Merritt. 2011. Affective Processes in Human–Automation Interactions. Human Factors 53, 4 (2011), 356–370. https://doi.org/10.1177/0018720811411912
William J. Pilotte and Robert K. Gable. 1990. The Impact of Positive and Negative Item Stems on the Validity of a Computer Anxiety Scale. Educational and Psychological Measurement 50, 3 (1990), 603–610. https://doi.org/10.1177/0013164490503016
Forough Poursabzi-Sangdeh, Daniel G. Goldstein, Jake M. Hofman, Jennifer Wortman Vaughan, and Hanna Wallach. 2021. Manipulating and Measuring Model Interpretability. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). ACM, New York, NY, USA, Article 237, 52 pages. https://doi.org/10.1145/3411764.3445315
Joseph R. Priester and Richard E. Petty. 1996. The Gradual Threshold Model of Ambivalence: Relating the Positive and Negative Bases of Attitudes to Subjective Ambivalence. Journal of Personality and Social Psychology 71, 3 (1996), 431.
Jeff Sauro and James R. Lewis. 2011. When Designing Usability Questionnaires, Does It Hurt to Be Positive?. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Vancouver, BC, Canada) (CHI ’11). Association for Computing Machinery, New York, NY, USA, 2215–2224. https://doi.org/10.1145/1978942.1979266
Nicolas Scharowski, Sebastian A. C. Perrig, Nick von Felten, and Florian Brühlmann. 2022. Trust and Reliance in XAI – Distinguishing Between Attitudinal and Behavioral Measures. CHI TRAIT Workshop (2022), 6 pages. https://doi.org/10.48550/arXiv.2203.12318
Beau G. Schelble, Christopher Flathmann, Matthew Scalia, Shiwen Zhou, Christopher Myers, Nathan J. McNeese, Jamie Gorman, and Guo Freeman. 2022. Addressing the Spread of Trust and Distrust in Distributed Human-AI Teaming Constellations. CHI TRAIT Workshop (2022), 11 pages.
Neal Schmitt and Daniel M. Stults. 1985. Factors Defined by Negatively Keyed Items: The Result of Careless Respondents? Applied Psychological Measurement 9, 4 (1985), 367–373. https://doi.org/10.1177/014662168500900405
Chester A. Schriesheim and Kenneth D. Hill. 1981. Controlling Acquiescence Response Bias by Item Reversals: The Effect on Questionnaire Validity. Educational and Psychological Measurement 41, 4 (1981), 1101–1114. https://doi.org/10.1177/001316448104100420
Randall E. Schumacker and Richard G. Lomax. 2010. A Beginner’s Guide to Structural Equation Modeling (3rd ed.). Routledge, New York, NY, USA. https://doi.org/10.4324/9780203851319
Randall D. Spain, Ernesto A. Bustamante, and James P. Bliss. 2008. Towards an Empirically Developed Scale for System Trust: Take Two. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 52, 19 (2008), 1335–1339. https://doi.org/10.1177/154193120805201907
Thomas J. Stewart and Ann W. Frye. 2004. Investigating the Use of Negatively Phrased Survey Items in Medical Education Settings: Common Wisdom or Common Mistake? Academic Medicine 79, 10 (2004), 18–20.
Takane Ueno, Yuto Sawa, Yeongdae Kim, Jacqueline Urakami, Hiroki Oura, and Katie Seaborn. 2022. Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI EA ’22). Association for Computing Machinery, New York, NY, USA, Article 254, 7 pages. https://doi.org/10.1145/3491101.3519772
Nancy Wong, Aric Rindfleisch, and James E. Burroughs. 2003. Do Reverse-Worded Items Confound Measures in Cross-Cultural Consumer Research? The Case of the Material Values Scale. Journal of Consumer Research 30, 1 (2003), 72–91. https://doi.org/10.1086/374697
Ming Yin, Jennifer Wortman Vaughan, and Hanna Wallach. 2019. Understanding The Effect of Accuracy on Trust in Machine Learning Models. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland, UK) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300509
Kun Yu, Shlomo Berkovsky, Ronnie Taib, Dan Conway, Jianlong Zhou, and Fang Chen. 2017. User Trust Dynamics: An Investigation Driven by Differences in System Performance. In Proceedings of the 22nd International Conference on Intelligent User Interfaces (Limassol, Cyprus) (IUI ’17). Association for Computing Machinery, New York, NY, USA, 307–317. https://doi.org/10.1145/3025171.3025219

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)


Published in
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems
April 2023
3914 pages
ISBN: 9781450394222
DOI: 10.1145/3544549
Editors:
Albrecht Schmidt, LMU Munich, Germany
Kaisa Väänänen, Tampere University, Finland
Tesh Goyal, Google Research, USA
Per Ola Kristensson, University of Cambridge, UK
Anicia Peters, University of Namibia, Namibia
Copyright © 2023 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
AI
Measurement
Psychometrics
Questionnaires
Trust
Validation
XAI
Qualifiers
- Work in Progress
- Research
- Refereed limited

Acceptance Rates
Overall acceptance rate: 6,164 of 23,696 submissions, 26%

Article Metrics
- Total citations: 2
- Total downloads: 703
- Downloads (last 12 months): 660
- Downloads (last 6 weeks): 133
