An Investigation of the Test-Retest Reliability of the miniPXI

Published: 15 October 2024


Repeated measurements of player experience (PX) are crucial in games user research for assessing how different designs evolve over time. However, this requires lightweight measurement instruments that are fit for purpose. In this study, we examine the test-retest reliability of the miniPXI, a short variant of the Player Experience Inventory (PXI), an established instrument for measuring player experience. We analyzed its test-retest reliability across four games with 100 participants and compared it with that of four established multi-item measures and of single-item indicators such as the Net Promoter Score (NPS) and overall enjoyment. The findings are mixed: the miniPXI demonstrated varying levels of test-retest reliability. Some constructs showed moderate to good reliability, while others were less consistent. The multi-item measures, in contrast, exhibited moderate to good test-retest reliability, demonstrating their effectiveness in measuring player experience over time. Additionally, the single-item indicators (NPS and overall enjoyment) demonstrated good reliability. Our results highlight the complexity of evaluating player experience over time with measures that use single or multiple items per construct. We conclude that single-item measures may not be appropriate for long-term investigations of more complex PX dimensions, and we provide practical considerations for the applicability of such measures in repeated measurements.
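As an illustration of how test-retest reliability of a single construct score might be quantified, the sketch below computes a two-way, absolute-agreement, single-measurement intraclass correlation coefficient (ICC). The abstract does not state which coefficient or software the study used, so the choice of ICC form, the function names `icc_a1` and `interpret`, and the cut-offs of 0.5, 0.75, and 0.9 (a commonly cited guideline for labelling reliability as poor, moderate, good, or excellent) are all assumptions made for this example.

```python
def icc_a1(scores):
    """Two-way ICC, absolute agreement, single measurement.

    scores: list of rows, one per participant; each row holds that
    participant's score per session, e.g. [test, retest].
    Assumption: this ICC form is chosen for illustration only.
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of sessions
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-session mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def interpret(icc):
    """Map an ICC to a verbal label (assumed guideline cut-offs)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


# Hypothetical data: three participants, each scored at test and retest.
identical = [[1, 1], [2, 2], [3, 3]]   # same score in both sessions
shifted = [[1, 2], [2, 3], [3, 4]]     # every retest score one point higher
print(icc_a1(identical), interpret(icc_a1(identical)))
print(icc_a1(shifted), interpret(icc_a1(shifted)))
```

Because this form measures absolute agreement, a uniform shift between sessions lowers the coefficient even though the ranking of participants is unchanged. That property matters when a single item is expected to return the same value, not just the same ordering, at retest.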

Supplemental Material

ZIP File
Additional computed variables for the constructs of all included measures



    Published In

    Proceedings of the ACM on Human-Computer Interaction  Volume 8, Issue CHI PLAY
    October 2024
    1726 pages


    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. games user research
    2. miniPXI
    3. player experience
    4. questionnaire
    5. single-item measure
    6. test-retest reliability

