Maximum Similarity Index (MSI): A metric to differentiate the performance of novices vs. multiple-experts in serious games
Introduction
A learning activity without assessment is informal at best, comparable to the endeavor of hobbyists. As serious games are “designed to support knowledge acquisition and/or skill development,” player-performance assessment is not only an important, if not the most important, aspect of serious game evaluation (Bellotti et al., 2013, Loh et al., 2007, Shute et al., 2010), but also a necessary component for these games to set themselves apart from entertainment games (Michael & Chen, 2006). As advances in technology have made it increasingly easy to trace players’ in-game actions and behaviors for performance assessment (Loh, 2012b, Thawonmas and Iizuka, 2008, Wallner, 2013), stakeholders in the training and learning industries have begun asking for “measurable evidence of training or learning” to justify their investment and to ensure a good rate of return (Loh, 2011).
However, findings from a recent survey (Alvarez, Djaouti, Rampnoux, & Alvarez, 2011) indicate that about 90% of serious games created thus far serve the purpose of propaganda, advertisement, or message broadcasting. This suggests that designers (and educators) believe in a learning model where a well-designed computer-based instruction or learning environment will facilitate instruction or learning automatically. Moreover, since the purpose of these ‘educative message broadcasting’ games is to disseminate an idea (educative or propaganda), performance assessment is not essential and would, in fact, inflate the cost of production if added. Although the survey revealed that only 10% of serious games actually aim at training and learning (which may require some sort of performance assessment component), this trend is changing.
As more and more serious games come to be regarded as the natural next step in electronic training, assessment remains the only factor that separates serious games from entertainment games (Michael & Chen, 2005). It is clear that serious games will require appropriate performance assessment to progress further (Ifenthaler et al., 2012, Kirkley et al., 2007, Van Eck, 2006). How, then, do we assess performance in a virtual ‘serious-game’ environment when we can neither (a) directly measure learning that occurs in the mind, nor (b) observe actions that occur in a non-physical environment? The only viable solution is to measure players’ performance according to the actions they perform within the training environment in situ, as evidence of their learning (Loh, 2007, Loh, 2012b, Schmidt and Lee, 2011).
When designing for learning in virtual environments (as in serious games), it is generally accepted that appropriate objectives are needed to create the conditions for learning, which should, in turn, produce the desired outcome (or performance). This is the position taken by many serious-game designers and publishers, who posit that a well-designed game should facilitate learning automatically. However, such claims do not always satisfy stakeholders, who need empirical evidence to quantify performance improvement and calculate Return on Investment (ROI) (Kozlov and Reinhold, 2007, Loh, 2012a). It follows that appropriate metrics are necessary to properly measure what actually constitutes performance in situ (within that environment). Such performance metrics should ideally yield an empirical, quantitative value that can then be incorporated into the calculation of ROI to meet stakeholders’ needs.
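For readers unfamiliar with the calculation, ROI is simply net benefit divided by cost; the sketch below uses entirely hypothetical figures to show where a quantified training benefit would enter the calculation:

```python
def roi(benefit: float, cost: float) -> float:
    """Return on Investment as a fraction: (benefit - cost) / cost."""
    return (benefit - cost) / cost

# Hypothetical figures: a $50k serious game replacing $80k of
# instructor-led training would yield a 60% ROI.
print(f"{roi(80_000, 50_000):.0%}")  # prints 60%
```

A performance metric such as the one developed in this paper would supply the evidence behind the "benefit" figure, rather than leaving it to the designer's claim.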
In a previous study (Loh & Sheng, 2013), we introduced the String Similarity Index (SSI) as a possible way to differentiate the performance of novices from that of experts based on how (dis-)similar their actions are within a ‘single-solution’ serious game environment. The purpose of the SSI is to convert the performance of novices and experts into a single score (see Sauro & Kindlund, 2005) for comparison and ranking. The present study extends that work by differentiating a group of novices from the experts based on how (dis-)similar their performances are within a ‘multiple-solution’ serious game environment. Since the SSI is not suitable for multiple comparisons, we created a new metric for this purpose, the Maximum Similarity Index (MSI), which takes multiple expert solutions into consideration.
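The paper’s formal definition is not reproduced in this excerpt; conceptually, though, the MSI is the player’s best similarity score against any one of the expert solutions. A minimal sketch, assuming action sequences are encoded as strings and using Python’s `difflib` ratio as a stand-in for the actual string-similarity measure:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Normalized similarity of two action-sequence strings (0..1).
    Stand-in for the paper's string-similarity measure."""
    return SequenceMatcher(None, a, b).ratio()

def msi(player_seq: str, expert_seqs: list) -> float:
    """Maximum Similarity Index: the player's best match
    against any of the available expert solutions."""
    return max(similarity(player_seq, e) for e in expert_seqs)

# Hypothetical action sequences: each letter encodes one in-game action.
experts = ["ABCDEF", "ABXYEF"]   # two distinct expert solutions
novice = "ABZZZF"
print(round(msi(novice, experts), 2))  # prints 0.5
```

Taking the maximum (rather than, say, the mean) reflects the premise of a multiple-solution environment: a player should be credited for matching any one expert path, not penalized for diverging from the others.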
The aim of this paper is not to showcase the serious game we created. The game was built as a procedure-based (or sequential-learning) serious game with multiple possible (expert) solutions in one training environment, to highlight the need for an appropriate metric for performance measurement. When used appropriately, the MSI should quantify the differences between novice and expert players’ performance and reduce them to a single value. Examples of such procedure-based serious training games include: flight simulators (to measure whether novices have learned the correct flight path under given weather conditions), ground battle (to measure whether appropriate strategies are taken given certain intelligence), chemistry/biology laboratory training (to measure whether students have followed laboratory procedures correctly), nursing simulation and medical education (to measure whether nursing/medical students have correctly executed hospital/diagnostic procedures), and route training for firefighters and police officers (to measure whether trainees have memorized the city map).
The interest in researching the performance differences between experts (or skilled performers) and novices is long standing and can be traced back as early as the 1940s (de Groot, 1978). An understanding of how and why experts perform differently from novices has great implications not only for skill acquisition and training in the workplace (e.g., flying a plane in aviation and surgery in medicine) but also for performance improvement in general. The work by Ericsson and colleagues (e.g., Ericsson and Charness, 1994, Ericsson et al., 2006, Ericsson et al., 2007) revealed that expert performance is not an innate ability but the result of extended, deliberate practice, which enables performers to overcome the limitations of working memory and sequential processing. This begs the question, “Can expertise be trained?”
In the case of perceptual expertise, research led by Gauthier and others (Gauthier et al., 1998, Krigolson et al., 2009, Tanaka et al., 2005) has shown that it is indeed possible to train novices in expert face recognition in a laboratory environment. If expertise can be trained, it should be possible to shorten the deliberate practice required and make expertise training accessible to more people. In industries that require highly complex skills (e.g., aviation and surgery), the ability to produce more experts in a shorter timeframe can save costs for organizations and trainees alike. Given the recent academic interest in using non-entertainment digital games for interactive training, it is no wonder that many researchers have turned to (what are more commonly known as) ‘serious games’ for expertise research (Rosenberg et al., 2005, Sabri et al., 2010), as well as for gameplay data collection (Loh, 2012a, Loh, 2012b, Moura et al., 2011).
To ascertain that players have learned, we must find a way to measure players’ performance according to the actions they perform within the training environment in situ, as evidence of their learning. As players interact with the game environment, data are constantly generated by the game engine and are digitally traceable as quantitative variables; hence, they are known as gameplay data. There was a time when only major game development companies could afford to collect players’ gameplay data for analysis, because the process is both costly to implement (inviting players to a usability lab and paying them for the tasks) and difficult to analyze (requiring strong analytical skills and experience) (Wallner & Kriglstein, 2012).
But as digital and online technology becomes cheaper and faster, “recording player gameplay data has become a prevalent feature in many game and platform systems” (Medler, 2011). It is also possible to capture gameplay data during an ongoing game with an in situ data collection method (Loh et al., 2007, Loh, 2012b) and to apply data visualization to reveal patterns of behavior (Moura et al., 2011) and actionable analytics/insights (Rao, 2003). Once obtained, the (serious) games analytics can be used to identify opportunities for monetization (Canossa, Seif El-Nasr, & Drachen, 2013), to improve the design of the training environment as in a usability study (Kjeldskov & Skov, 2007), and to review players’ decision-making processes for performance improvement (Loh, 2012a).
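To make the idea of in situ gameplay data concrete, here is a minimal, hypothetical event logger in the spirit of Information Trails (the paper’s actual instrumentation differs): each player action is timestamped as it happens, and the trail can later be flattened into an action sequence suitable for similarity analysis.

```python
import time

class InformationTrail:
    """Minimal in situ gameplay logger (illustrative sketch only).
    Each in-game action is stored as a timestamped event."""

    def __init__(self, player_id: str):
        self.player_id = player_id
        self.events = []

    def log(self, action: str, **context):
        # Record one action with any extra context (room, item, target, ...).
        self.events.append({"player": self.player_id,
                            "t": time.time(),
                            "action": action,
                            **context})

    def action_sequence(self) -> str:
        # Encode the trail as one uppercase letter per action type,
        # yielding a string that similarity measures can compare.
        return "".join(e["action"][0].upper() for e in self.events)

trail = InformationTrail("P10")
trail.log("move", room="armory")
trail.log("pickup", item="key")
trail.log("use", item="key", target="door")
print(trail.action_sequence())  # prints MPU
```

The resulting per-player strings are exactly the kind of quantitative, digitally traceable variables the passage above describes.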
Section snippets
Expert-novice behavioral differences
The difference between experts’ and novices’ behaviors in problem solving and decision making is a very well-studied phenomenon in training and psychology literature (Dreyfus, 2004, Dreyfus and Dreyfus, 1980). Although the indicators of expert-novice behaviors vary widely and can range from time-to-task-completion rate, to mental representations of knowledge, to specific gaze patterns in scanning for information (Underwood, 2005), it is generally agreed that expert behaviors are both observable
Materials and methods
As mentioned in Section 1, the aim of this paper is not to showcase the serious game developed, but to create an environment where multiple possible expert solutions are available within one training session in order to highlight the need for an appropriate metric for performance measurement. We used an in-house produced title, called The Guardian, created using the game development kit provided with Neverwinter Nights 2 (Baudoin, Sawyer, & Avellone, 2009) for this study. In the game narrative,
Results and discussion
Three Experts (E) and 31 Players (P) from a mid-western public university participated in this study, and a total of 11,109 (gameplay) data points were collected by means of Information Trails. Out of the forty (40) sets of gameplay data, nine were Experts’ action-sequences (subjects 1–9), and 31 were Players’ action-sequences (subjects 10–40).
We identified 4 distinct Experts’ action-sequences (ES1–ES4) from the nine sets of expert-generated action-sequences. We then compared each player’s
Conclusions
Although we have entered the era of ‘big data’ (Nisen, 2013), unlike for massively multiplayer online (MMO) games, ‘big data’ have yet to become a reality for serious games. First, no MMO serious games are yet available; second, gameplay data are proprietary information and, therefore, not readily accessible for research. This obstacle will likely persist and make it difficult for researchers to derive new algorithms to analyze gameplay data in a timely fashion. (In
Acknowledgments
This research was made possible in part through funding from the Defense University Research Instrumentation Program (DURIP) of the U.S. Army Research Office. The authors wish to thank our students, Mr. I-Hung Li and Mr. Ting Zhou, for their assistance in data collection.
References (53)
- et al. (1981). The effect of time pressure on risky choice behavior. Acta Psychologica.
- et al. (2008). The effects of video game playing on attention, memory, and executive control. Acta Psychologica.
- et al. (1998). Training “Greeble” experts: A framework for studying expert object recognition processes. Vision Research.
- et al. (1999). Visual attention during brand choice: The impact of time pressure and task motivation. International Journal of Research in Marketing.
- et al. (2010). Serious games for knee replacement surgery procedure education and training. Procedia – Social and Behavioral Sciences.
- Alvarez, J., Djaouti, D., Rampnoux, O., & Alvarez, V. (2011). Serious games market: Some key figures (from 1950’s to...
- et al. (2009). Neverwinter Nights 2.
- et al. (2013). User assessment in serious games and technology-enhanced learning. Advances in Human-Computer Interaction.
- et al. Benefits of game analytics: Stakeholders, contexts and domains.
- (1978). Thought and choice in chess.
- Game analytics – The basics.
- The five-stage model of adult skill acquisition. Bulletin of Science, Technology and Society.
- Expert performance: Its structure and acquisition. American Psychologist.
- The making of an expert. Harvard Business Review.
- The distribution of the flora in the Alpine Zone. New Phytologist.
- Building bridges between serious game design and instructional design.
- Studying usability in sitro: Simulating real world phenomena in controlled environments. International Journal of Human-Computer Interaction.
- Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience.
- Estimation of error rates in discriminant analysis. Technometrics.
- Designing online games assessment as “information trails”.
- Improving the impact and return of investment of game-based learning. International Journal of Virtual and Personal Learning Environments.