Computers in Human Behavior

Volume 39, October 2014, Pages 322-330

Maximum Similarity Index (MSI): A metric to differentiate the performance of novices vs. multiple-experts in serious games

https://doi.org/10.1016/j.chb.2014.07.022

Highlights

  • Serious games need appropriate metrics to measure performance of players.

  • Comparing novice performance against multiple expert-solutions is difficult.

  • We created Maximum Similarity Index to compare novices against multiple experts.

  • Findings show Maximum Similarity Index to be more robust than nine game metrics.

Abstract

In learning environments, appropriate objectives are needed to create the conditions for learning and, consequently, for performance to occur. It follows that appropriate metrics are also necessary to properly measure what actually constitutes performance in situ (within that environment) and to determine whether learning has indeed occurred. Serious game environments can be problematic for performance measurement because publishers often posit that the games, by their design, automatically facilitate learning. Stakeholders, on the other hand, require empirical proof to quantify performance improvement and calculate Return on Investment.

A serious game environment (an open-ended scenario) with more than one correct solution can be difficult to analyze. In a previous study, we demonstrated the use of the String Similarity Index to differentiate novices from experts based on how (dis)similar their performances are within a ‘single-solution’ serious game environment. This study extends the previous one by differentiating a group of novices from the experts based on how (dis)similar their performances are within a ‘multiple-solution’ serious game environment. To facilitate the calculation of performance, we create a new metric for this purpose, the Maximum Similarity Index, which takes into consideration the existence of multiple expert solutions. Our findings indicate that the Maximum Similarity Index can be a useful metric for serious games analytics when such scenarios present themselves, both for differentiating novices from experts and for ranking the player cohort. In a secondary analysis, we compared the Maximum Similarity Index to other commonly available game metrics (such as time of completion) and found it more appropriate than those metrics for the measurement of performance in serious games.

Introduction

A learning activity without assessment is informal at best and comparable to the endeavor of hobbyists. As serious games are “designed to support knowledge acquisition and/or skill development,” not only is player-performance assessment an important, if not the most important, aspect of serious game evaluation (Bellotti et al., 2013, Loh et al., 2007, Shute et al., 2010), it is also a necessary component for these games to set themselves apart from other entertainment games (Michael & Chen, 2006). Since the advances in technology have made it increasingly easy to trace players’ in-game actions and behaviors for performance assessment (Loh, 2012b, Thawonmas and Iizuka, 2008, Wallner, 2013), stakeholders in the training and learning industries have begun asking for “measurable evidence of training or learning” to justify their investment and to ensure a good rate of return (Loh, 2011).

However, based on findings from a recent survey (Alvarez, Djaouti, Rampnoux, & Alvarez, 2011), about 90% of serious games created thus far serve the purpose of propaganda, advertisement, or message broadcasting. This suggests that designers (and educators) believe in a learning model in which a well-designed computer-based instruction or learning environment would facilitate instruction or learning automatically. Moreover, since the purpose of these ‘educative message broadcasting’ games is to disseminate an idea (educative or propaganda), performance assessment is not essential and would, in fact, inflate the cost of production if added. Although the survey revealed that only 10% of serious games actually aim at training and learning (which may require some sort of performance assessment component), this trend is changing.

As more and more serious games come to be seen as the next natural progression of electronic training, the only factor that separates serious games from entertainment games is assessment (Michael & Chen, 2005). It is clear that serious games will require appropriate performance assessment to progress further (Ifenthaler et al., 2012, Kirkley et al., 2007, Van Eck, 2006). How, then, do we assess performance in a virtual ‘serious-game’ environment when we can neither directly (a) measure learning that occurs in the mind, nor (b) observe actions that occur in a non-physical environment? The only solution is to find a way to measure players’ performance according to the actions they perform within the training environment in situ, as evidence of their learning (Loh, 2007, Loh, 2012b, Schmidt and Lee, 2011).

When designing for learning with virtual environments (as in serious games), it is generally accepted that appropriate objectives are needed to create the conditions for learning, which should, in turn, produce the desired outcome (or performance). This is the position taken by many serious-game designers and publishers, who posit that games should be well designed enough to facilitate learning automatically. However, such claims do not always satisfy stakeholders, who need empirical evidence to quantify performance improvement and calculate Return on Investment (ROI) (Kozlov and Reinhold, 2007, Loh, 2012a). It follows that appropriate metrics are necessary to properly measure what actually constitutes performance in situ (within that environment). Such performance metrics should ideally yield an empirical, quantitative value that can then be incorporated into the calculation of ROI to meet stakeholders’ needs.

In a previous study (Loh & Sheng, 2013), we introduced the String Similarity Index (SSI) as a possible way to differentiate the performance of novices from that of experts based on how (dis)similar their actions are within a ‘single-solution’ serious game environment. The purpose of the SSI is to convert the performance of novices and experts into a single score (see Sauro & Kindlund, 2005) for comparison and ranking. This study extends the previous one by differentiating a group of novices from the experts based on how (dis)similar their performances are within a ‘multiple-solution’ serious game environment. Since the SSI is not suitable for such multiple comparisons, we have created a new metric, the Maximum Similarity Index (MSI), which takes multiple expert solutions into consideration.
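To make the idea concrete, the sketch below (in Python) illustrates the general mechanism rather than the exact coefficient reported in the paper: each action-sequence is treated as a string, a player is scored against one expert sequence using difflib's SequenceMatcher ratio as a stand-in similarity measure, and the MSI is taken as the maximum of those scores over all expert solutions. The sequences, player IDs, and one-token-per-action encoding are hypothetical.

```python
# Illustrative sketch only: SequenceMatcher's ratio stands in for whichever
# string-similarity coefficient an implementer adopts; it is not necessarily
# the SSI used in Loh & Sheng (2013).
from difflib import SequenceMatcher

def similarity(player_seq, expert_seq):
    """Similarity in [0, 1] between two action-sequences
    (hypothetical encoding: one character per in-game action)."""
    return SequenceMatcher(None, player_seq, expert_seq).ratio()

def maximum_similarity_index(player_seq, expert_seqs):
    """MSI: score the player against every expert solution and keep the
    highest similarity, crediting the expert path most closely followed."""
    return max(similarity(player_seq, e) for e in expert_seqs)

# Hypothetical data: four distinct expert solutions and two players.
expert_solutions = ["ABCDEFG", "ABCXYZG", "ABQRSFG", "AMNOEFG"]
players = {"P10": "ABCDEEG", "P11": "AZZZZZQ"}

# Rank the cohort by MSI, highest first.
ranking = sorted(
    ((pid, maximum_similarity_index(seq, expert_solutions))
     for pid, seq in players.items()),
    key=lambda item: item[1],
    reverse=True,
)
for pid, msi in ranking:
    print(f"{pid}: MSI = {msi:.2f}")
```

Taking the maximum ensures that a player is not penalized for choosing a legitimate expert solution that merely differs from the other correct solutions.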

The aim of this paper is not to showcase the serious game we have created. The game was created as a procedure-based (or sequential-learning) serious game with multiple possible (expert) solutions in one training environment, in order to highlight the need for an appropriate metric for performance measurement. When used appropriately, the MSI should quantify the differences between novice and expert players’ performances and reduce them to a single value. Examples of such procedure-based serious training games include: flight simulators (to measure whether novices have learned the correct flight path given certain weather conditions), ground-battle simulations (to measure whether appropriate strategies are taken given certain intelligence), chemistry/biology laboratory training (to measure whether students have followed laboratory procedures correctly), nursing simulation and medical education (to measure whether nursing/medical students have correctly executed hospital/diagnostic procedures), and route training for fire-fighters and police officers (to measure whether trainees have memorized the city map).

The interest in researching the performance differences between experts (or skilled performers) and novices is long-standing and can be traced back as early as the 1940s (de Groot, 1978). An understanding of how and why experts perform differently from novices has great implications not only for skill acquisition and training in the workplace (e.g., flying a plane in aviation and performing surgery in medicine) but also for performance improvement in general. The work by Ericsson and colleagues (e.g., Ericsson and Charness, 1994, Ericsson et al., 2006, Ericsson et al., 2007) revealed that expert performance is not an innate ability but the result of extended, deliberate practice, which enables performers to overcome the limitations of working memory and sequential processing. This begs the question, “Can expertise be trained?”

In the case of perceptual expertise, research led by Gauthier and others (Gauthier et al., 1998, Krigolson et al., 2009, Tanaka et al., 2005) has shown that it is indeed possible for novices to be trained in expert face recognition in a laboratory environment. If ‘expertise can be trained,’ it may be possible to shorten the extent of deliberate practice required and make expertise training accessible to more people. In industries that require highly complex skills (e.g., aviation and surgery), the ability to produce more experts in a shorter timeframe can be cost saving for organizations and trainees alike. Given the recent academic interest in using non-entertainment digital games for interactive training, it is no wonder that many researchers have turned towards (what are more commonly known as) ‘serious games’ for expertise research (Rosenberg et al., 2005, Sabri et al., 2010), as well as for gameplay data collection (Loh, 2012a, Loh, 2012b, Moura et al., 2011).

To ascertain that players have learned, we must find a way to measure players’ performance according to the actions they perform within the training environment in situ, as evidence of their learning. As players interact with the game environment, data are constantly being generated by the game engine and are traceable digitally as quantitative variables; hence, they are known as gameplay data. There was a time when only major game development companies could afford to collect players’ gameplay data for analysis, because the process is both costly to implement (inviting players to a usability lab and paying them for the tasks) and difficult to analyze (requiring strong analytical skills and experience) (Wallner & Kriglstein, 2012).

But as digital and online technology becomes cheaper and faster, “recording player gameplay data has become a prevalent feature in many game and platform systems” (Medler, 2011). It is also possible to capture gameplay data during an ongoing game with an in situ data collection method (Loh et al., 2007, Loh, 2012b) and to perform data visualization to reveal patterns of behavior (Moura et al., 2011) and actionable analytics/insights (Rao, 2003). Once obtained, the (serious) games analytics can be used to identify opportunities for monetization (Canossa, Seif El-Nasr, & Drachen, 2013), to improve the design of a training environment as in a usability study (Kjeldskov & Skov, 2007), and to review players’ decision-making processes for performance improvement (Loh, 2012a).
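As a rough illustration of in situ gameplay-data capture, the sketch below records each traced player action as a timestamped row and writes it out for later analysis. The class, event names, and CSV layout are hypothetical conveniences for this sketch, not the Information Trails implementation used in the study.

```python
# Minimal sketch of in situ gameplay-data logging; all identifiers here are
# hypothetical, and a real game engine would emit its own telemetry events.
import csv
import time

class ActionLogger:
    def __init__(self, player_id):
        self.player_id = player_id
        self.events = []

    def log(self, action, detail=""):
        # Each traced action becomes one timestamped gameplay data point.
        self.events.append((self.player_id, time.time(), action, detail))

    def dump(self, path):
        # Persist the trail so it can be analyzed after the session.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["player_id", "timestamp", "action", "detail"])
            writer.writerows(self.events)

logger = ActionLogger("P10")
logger.log("enter_room", "armory")
logger.log("pick_up", "key")
logger.log("open_door", "north_gate")
logger.dump("p10_trail.csv")
```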

Section snippets

Expert-novice behavioral differences

The difference between experts’ and novices’ behaviors in problem solving and decision making is a very well-studied phenomenon in training and psychology literature (Dreyfus, 2004, Dreyfus and Dreyfus, 1980). Although the indicators of expert-novice behaviors vary widely and can range from time-to-task-completion rate, to mental representations of knowledge, to specific gaze patterns in scanning for information (Underwood, 2005), it is generally agreed that expert behaviors are both observable

Materials and methods

As mentioned in Section 1, the aim of this paper is not to showcase the serious game developed, but to create an environment where multiple possible expert solutions are available within one training session in order to highlight the need for an appropriate metric for performance measurement. We used an in-house produced title, called The Guardian, created using the game development kit provided with Neverwinter Nights 2 (Baudoin, Sawyer, & Avellone, 2009) for this study. In the game narrative,

Results and discussion

Three Experts (E) and 31 Players (P) from a mid-western public university participated in this study, and a total of 11,109 (gameplay) data points were collected by means of Information Trails. Out of the forty (40) sets of gameplay data, nine were Experts’ action-sequences (subjects 1–9), and 31 were Players’ action-sequences (subjects 10–40).
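Before any similarity can be computed, the raw gameplay data points must be reduced to one action-sequence per subject. The sketch below shows one plausible way to do this, reusing the hypothetical (subject, timestamp, action, detail) rows and one-token-per-action codebook from the earlier logging sketch; the actual encoding used in the study is not reproduced here.

```python
# Hypothetical reduction of raw gameplay data points into one action-sequence
# per subject; the token-per-action codebook is an assumption for illustration.
from collections import defaultdict

def to_action_sequences(events, codebook):
    """events: iterable of (subject_id, timestamp, action, detail) rows.
    codebook: maps action names to single-character tokens (hypothetical)."""
    by_subject = defaultdict(list)
    for subject_id, timestamp, action, _detail in events:
        by_subject[subject_id].append((timestamp, codebook[action]))
    # Order each subject's actions by time and join the tokens into a string.
    return {
        subject_id: "".join(token for _, token in sorted(rows))
        for subject_id, rows in by_subject.items()
    }

codebook = {"enter_room": "E", "pick_up": "P", "open_door": "O"}
events = [("P10", 3.0, "open_door", "north_gate"),
          ("P10", 1.0, "enter_room", "armory"),
          ("P10", 2.0, "pick_up", "key")]
print(to_action_sequences(events, codebook))  # {'P10': 'EPO'}
```

The resulting per-subject strings are the inputs compared against the expert sequences when computing the MSI.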

We identified 4 distinct Experts’ action-sequences (ES1–ES4) from the nine sets of expert-generated action-sequences. We then compare each player’s

Conclusions

Although we have entered the era of ‘big data’ (Nisen, 2013), unlike for massively multiplayer online (MMO) games, these ‘big data’ have yet to become a reality for serious games. Firstly, there have yet to be any MMO serious games available; secondly, gameplay data are proprietary information and, therefore, not readily accessible for research. This obstacle will likely persist and make it difficult for researchers to derive new algorithms to analyze the gameplay data in a timely fashion. (In

Acknowledgments

This research was made possible in part through funding from the Defense University Research Instrumentation Program (DURIP) from the U.S. Army Research Office. The authors wish to thank our students, Mr. I-Hung Li and Mr. Ting Zhou, for their assistance in data collection.

References (53)

  • A. Drachen et al.

    Game analytics – The basics

  • S.E. Dreyfus

    The five-stage model of adult skill acquisition

    Bulletin of Science, Technology and Society

    (2004)
  • Dreyfus, S. E., & Dreyfus, H. L. (1980). A five-stage model of the mental activities involved in directed skill...
  • K.A. Ericsson et al.

    Expert performance: Its structure and acquisition

    American Psychologist

    (1994)
  • K.A. Ericsson et al.

    The making of an expert

    Harvard Business Review

    (2007)
  • P. Jaccard

    The distribution of the flora in the Alpine Zone

    New Phytologist

    (1912)
  • J. Kirkley et al.

    Building bridges between serious game design and instructional design

  • J. Kjeldskov et al.

    Studying usability in sitro: Simulating real world phenomena in controlled environments

    International Journal of Human-Computer Interaction

    (2007)
  • Kozlov, S., & Reinhold, N. (2007). To play or not to play: Can companies learn to be n00bs, LFG, and lvl-up? In...
  • O.E. Krigolson et al.

    Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise

    Journal of Cognitive Neuroscience

    (2009)
  • P.A. Lachenbruch et al.

    Estimation of error rates in discriminant analysis

    Technometrics

    (1968)
  • C.S. Loh

Designing online games assessment as “information trails”

  • Loh, C. S. (2011). Using in situ data collection to improve the impact and return of investment of game-based learning....
  • C.S. Loh

    Improving the impact and return of investment of game-based learning

    International Journal of Virtual and Personal Learning Environments

    (2012)