Computers in Human Behavior

Volume 39, October 2014, Pages 322-330

Maximum Similarity Index (MSI): A metric to differentiate the performance of novices vs. multiple-experts in serious games

https://doi.org/10.1016/j.chb.2014.07.022

Highlights

  • Serious games need appropriate metrics to measure performance of players.

  • Comparing novice performance against multiple expert-solutions is difficult.

  • We created Maximum Similarity Index to compare novices against multiple experts.

  • Findings show Maximum Similarity Index to be more robust than nine game metrics.

Abstract

In learning environments, appropriate objectives are needed to create the conditions for learning and, consequently, for performance to occur. It follows that appropriate metrics are also necessary to properly measure what actually constitutes performance in situ (within that environment) and to determine whether learning has indeed occurred. Serious game environments can be problematic for performance measurement because publishers often posit that the games, by their design, automatically facilitate learning. Stakeholders, on the other hand, require empirical proof to quantify performance improvement and calculate Return on Investment.

A serious game environment (an open-ended scenario) with more than one correct solution can be difficult to analyze. In a previous study, we demonstrated the use of the String Similarity Index to differentiate novices from experts based on how (dis)similar their performances are within a ‘single-solution’ serious game environment. This study extends the previous one by differentiating a group of novices from the experts based on how (dis)similar their performances are within a ‘multiple-solution’ serious game environment. To facilitate the calculation of performance, we create a new metric for this purpose, the Maximum Similarity Index, which takes into consideration the existence of multiple expert solutions. Our findings indicate that the Maximum Similarity Index can be a useful metric for serious games analytics when such scenarios present themselves, both for differentiating novices from experts and for ranking the player cohort. In a secondary analysis, we compared the Maximum Similarity Index to other commonly available game metrics (such as time of completion) and found it more appropriate than those metrics for the measurement of performance in serious games.

Introduction

A learning activity without assessment is informal at best and comparable to the endeavor of hobbyists. As serious games are “designed to support knowledge acquisition and/or skill development,” not only is player-performance assessment an important, if not the most important, aspect of serious game evaluation (Bellotti et al., 2013, Loh et al., 2007, Shute et al., 2010), it is also a necessary component for these games to set themselves apart from other entertainment games (Michael & Chen, 2006). Since the advances in technology have made it increasingly easy to trace players’ in-game actions and behaviors for performance assessment (Loh, 2012b, Thawonmas and Iizuka, 2008, Wallner, 2013), stakeholders in the training and learning industries have begun asking for “measurable evidence of training or learning” to justify their investment and to ensure a good rate of return (Loh, 2011).

However, based on findings from a recent survey (Alvarez, Djaouti, Rampnoux, & Alvarez, 2011), about 90% of serious games created thus far serve the purpose of propaganda, advertisement, or message broadcasting. This suggests that designers (and educators) believe in a learning model in which a well-designed computer-based instruction or learning environment would facilitate instruction or learning automatically. Moreover, since the purpose of these ‘educative message broadcasting’ games is to disseminate an idea (educative or propaganda), performance assessment is not essential and would, in fact, inflate the cost of production if added. Although the survey revealed that only 10% of serious games actually aim at training and learning (which may require some sort of performance assessment component), this trend is changing.

As more and more serious games come to be seen as the next natural progression of electronic training, the only factor that separates serious games from entertainment games is assessment (Michael & Chen, 2005). It is clear that serious games will require appropriate performance assessment to progress further (Ifenthaler et al., 2012, Kirkley et al., 2007, Van Eck, 2006). How, then, do we assess performance in a virtual ‘serious-game’ environment when we can neither directly (a) measure learning that occurs in the mind, nor (b) observe actions that occur in a non-physical environment? The only solution is to find a way to measure players’ performance according to the actions they perform within the training environment in situ, as evidence of their learning (Loh, 2007, Loh, 2012b, Schmidt and Lee, 2011).

When designing for learning with virtual environments (as in serious games), it is generally accepted that appropriate objectives are needed to create the conditions for learning, which should, in turn, produce the desired outcome (or performance). This is the position taken by many serious-game designers and publishers, who posit that games should be well designed enough to facilitate learning automatically. However, such claims do not always satisfy stakeholders, who need empirical evidence to quantify performance improvement and calculate Return on Investment (ROI) (Kozlov and Reinhold, 2007, Loh, 2012a). It follows that appropriate metrics are necessary to properly measure what actually constitutes performance in situ (within that environment). Such performance metrics should ideally yield an empirical, quantitative value that can then be incorporated into the calculation of ROI to meet stakeholders’ needs.

In a previous study (Loh & Sheng, 2013), we introduced the String Similarity Index (SSI) as a possible way to differentiate the performance of novices from that of experts based on how (dis)similar their actions are within a ‘single-solution’ serious game environment. The purpose of the SSI is to convert the performance of novices and experts into a single score (see Sauro & Kindlund, 2005) for comparison and ranking. This study extends the previous one by differentiating a group of novices from the experts based on how (dis)similar their performances are within a ‘multiple-solution’ serious game environment. Since the SSI is not suitable for such multiple comparisons, we have created a new metric, the Maximum Similarity Index (MSI), which takes multiple expert solutions into consideration.
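To make the idea concrete, the sketch below (in Python) illustrates the general mechanism rather than the exact coefficient reported in the paper: each action-sequence is treated as a string, a player is scored against one expert sequence using difflib's SequenceMatcher ratio as a stand-in similarity measure, and the MSI is taken as the maximum of those scores over all expert solutions. The sequences, player IDs, and one-token-per-action encoding are hypothetical.

```python
# Illustrative sketch only: SequenceMatcher's ratio stands in for whichever
# string-similarity coefficient an implementer adopts; it is not necessarily
# the SSI used in Loh & Sheng (2013).
from difflib import SequenceMatcher

def similarity(player_seq, expert_seq):
    """Similarity in [0, 1] between two action-sequences
    (hypothetical encoding: one character per in-game action)."""
    return SequenceMatcher(None, player_seq, expert_seq).ratio()

def maximum_similarity_index(player_seq, expert_seqs):
    """MSI: score the player against every expert solution and keep the
    highest similarity, crediting the expert path most closely followed."""
    return max(similarity(player_seq, e) for e in expert_seqs)

# Hypothetical data: four distinct expert solutions and two players.
expert_solutions = ["ABCDEFG", "ABCXYZG", "ABQRSFG", "AMNOEFG"]
players = {"P10": "ABCDEEG", "P11": "AZZZZZQ"}

# Rank the cohort by MSI, highest first.
ranking = sorted(
    ((pid, maximum_similarity_index(seq, expert_solutions))
     for pid, seq in players.items()),
    key=lambda item: item[1],
    reverse=True,
)
for pid, msi in ranking:
    print(f"{pid}: MSI = {msi:.2f}")
```

Taking the maximum ensures that a player is not penalized for choosing a legitimate expert solution that merely differs from the other correct solutions.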

The aim of this paper is not to showcase the serious game we have created. The game was created as a procedure-based (or sequential-learning) serious game with multiple possible (expert) solutions in one training environment, in order to highlight the need for an appropriate metric for performance measurement. When used appropriately, the MSI should quantify the differences between novice and expert players’ performances and reduce them to a single value. Examples of such procedure-based serious training games include: flight simulators (to measure whether novices have learned the correct flight path given certain weather conditions), ground-battle simulations (to measure whether appropriate strategies are taken given certain intelligence), chemistry/biology laboratory training (to measure whether students have followed laboratory procedures correctly), nursing simulation and medical education (to measure whether nursing/medical students have correctly executed hospital/diagnostic procedures), and route training for fire-fighters and police officers (to measure whether trainees have memorized the city map).

The interest in researching the performance differences between experts (or skilled performers) and novices is long-standing and can be traced back as early as the 1940s (de Groot, 1978). An understanding of how and why experts perform differently from novices has great implications not only for skill acquisition and training in the workplace (e.g., flying a plane in aviation and performing surgery in medicine) but also for performance improvement in general. The work by Ericsson and colleagues (e.g., Ericsson and Charness, 1994, Ericsson et al., 2006, Ericsson et al., 2007) revealed that expert performance is not an innate ability but the result of extended, deliberate practice, which enables performers to overcome the limitations of working memory and sequential processing. This begs the question, “Can expertise be trained?”

In the case of perceptual expertise, research led by Gauthier and others (Gauthier et al., 1998, Krigolson et al., 2009, Tanaka et al., 2005) has shown that it is indeed possible for novices to be trained in expert face recognition in a laboratory environment. If ‘expertise can be trained,’ it may be possible to shorten the extent of deliberate practice required and make expertise training accessible to more people. In industries that require highly complex skills (e.g., aviation and surgery), the ability to produce more experts in a shorter timeframe can be cost saving for organizations and trainees alike. Given the recent academic interest in using non-entertainment digital games for interactive training, it is no wonder that many researchers have turned towards (what are more commonly known as) ‘serious games’ for expertise research (Rosenberg et al., 2005, Sabri et al., 2010), as well as for gameplay data collection (Loh, 2012a, Loh, 2012b, Moura et al., 2011).

To ascertain that players have learned, we must find a way to measure players’ performance according to the actions they perform within the training environment in situ, as evidence of their learning. As players interact with the game environment, data are constantly being generated by the game engine and are traceable digitally as quantitative variables; hence, they are known as gameplay data. There was a time when only major game development companies could afford to collect players’ gameplay data for analysis, because the process is both costly to implement (inviting players to a usability lab and paying them for the tasks) and difficult to analyze (requiring strong analytical skills and experience) (Wallner & Kriglstein, 2012).

But as digital and online technology becomes cheaper and faster, “recording player gameplay data has become a prevalent feature in many game and platform systems” (Medler, 2011). It is also possible to capture gameplay data during an ongoing game with an in situ data collection method (Loh et al., 2007, Loh, 2012b) and to perform data visualization to reveal patterns of behavior (Moura et al., 2011) and actionable analytics/insights (Rao, 2003). Once obtained, the (serious) games analytics can be used to identify opportunities for monetization (Canossa, Seif El-Nasr, & Drachen, 2013), to improve the design of a training environment as in a usability study (Kjeldskov & Skov, 2007), and to review players’ decision-making processes for performance improvement (Loh, 2012a).
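As a rough illustration of in situ gameplay-data capture, the sketch below records each traced player action as a timestamped row and writes it out for later analysis. The class, event names, and CSV layout are hypothetical conveniences for this sketch, not the Information Trails implementation used in the study.

```python
# Minimal sketch of in situ gameplay-data logging; all identifiers here are
# hypothetical, and a real game engine would emit its own telemetry events.
import csv
import time

class ActionLogger:
    def __init__(self, player_id):
        self.player_id = player_id
        self.events = []

    def log(self, action, detail=""):
        # Each traced action becomes one timestamped gameplay data point.
        self.events.append((self.player_id, time.time(), action, detail))

    def dump(self, path):
        # Persist the trail so it can be analyzed after the session.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["player_id", "timestamp", "action", "detail"])
            writer.writerows(self.events)

logger = ActionLogger("P10")
logger.log("enter_room", "armory")
logger.log("pick_up", "key")
logger.log("open_door", "north_gate")
logger.dump("p10_trail.csv")
```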

Section snippets

Expert-novice behavioral differences

The difference between experts’ and novices’ behaviors in problem solving and decision making is a very well-studied phenomenon in training and psychology literature (Dreyfus, 2004, Dreyfus and Dreyfus, 1980). Although the indicators of expert-novice behaviors vary widely and can range from time-to-task-completion rate, to mental representations of knowledge, to specific gaze patterns in scanning for information (Underwood, 2005), it is generally agreed that expert behaviors are both observable

Materials and methods

As mentioned in Section 1, the aim of this paper is not to showcase the serious game developed, but to create an environment where multiple possible expert solutions are available within one training session in order to highlight the need for an appropriate metric for performance measurement. We used an in-house produced title, called The Guardian, created using the game development kit provided with Neverwinter Nights 2 (Baudoin, Sawyer, & Avellone, 2009) for this study. In the game narrative,

Results and discussion

Three Experts (E) and 31 Players (P) from a mid-western public university participated in this study, and a total of 11,109 (gameplay) data points were collected by means of Information Trails. Out of the forty (40) sets of gameplay data, nine were Experts’ action-sequences (subjects 1–9), and 31 were Players’ action-sequences (subjects 10–40).
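Before any similarity can be computed, the raw gameplay data points must be reduced to one action-sequence per subject. The sketch below shows one plausible way to do this, reusing the hypothetical (subject, timestamp, action, detail) rows and one-token-per-action codebook from the earlier logging sketch; the actual encoding used in the study is not reproduced here.

```python
# Hypothetical reduction of raw gameplay data points into one action-sequence
# per subject; the token-per-action codebook is an assumption for illustration.
from collections import defaultdict

def to_action_sequences(events, codebook):
    """events: iterable of (subject_id, timestamp, action, detail) rows.
    codebook: maps action names to single-character tokens (hypothetical)."""
    by_subject = defaultdict(list)
    for subject_id, timestamp, action, _detail in events:
        by_subject[subject_id].append((timestamp, codebook[action]))
    # Order each subject's actions by time and join the tokens into a string.
    return {
        subject_id: "".join(token for _, token in sorted(rows))
        for subject_id, rows in by_subject.items()
    }

codebook = {"enter_room": "E", "pick_up": "P", "open_door": "O"}
events = [("P10", 3.0, "open_door", "north_gate"),
          ("P10", 1.0, "enter_room", "armory"),
          ("P10", 2.0, "pick_up", "key")]
print(to_action_sequences(events, codebook))  # {'P10': 'EPO'}
```

The resulting per-subject strings are the inputs compared against the expert sequences when computing the MSI.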

We identified 4 distinct Experts’ action-sequences (ES1–ES4) from the nine sets of expert-generated action-sequences. We then compare each player’s

Conclusions

Although we have entered the era of ‘big data’ (Nisen, 2013), unlike for massively multiplayer online (MMO) games, these ‘big data’ have yet to become a reality for serious games. Firstly, there have yet to be any MMO serious games available; secondly, gameplay data are proprietary information and, therefore, not readily accessible for research. This obstacle will likely persist and make it difficult for researchers to derive new algorithms to analyze the gameplay data in a timely fashion. (In

Acknowledgments

This research was made possible in part through funding from the Defense University Research Instrumentation Program (DURIP) from the U.S. Army Research Office. The authors wish to thank our students, Mr. I-Hung Li and Mr. Ting Zhou, for their assistance in data collection.

References (53)

  • A. Drachen et al.

    Game analytics – The basics

  • S.E. Dreyfus

    The five-stage model of adult skill acquisition

    Bulletin of Science, Technology and Society

    (2004)
  • Dreyfus, S. E., & Dreyfus, H. L. (1980). A five-stage model of the mental activities involved in directed skill...
  • K.A. Ericsson et al.

    Expert performance: Its structure and acquisition

    American Psychologist

    (1994)
  • K.A. Ericsson et al.

    The making of an expert

    Harvard Business Review

    (2007)
  • P. Jaccard

    The distribution of the flora in the Alpine Zone

    New Phytologist

    (1912)
  • J. Kirkley et al.

    Building bridges between serious game design and instructional design

  • J. Kjeldskov et al.

    Studying usability in sitro: Simulating real world phenomena in controlled environments

    International Journal of Human-Computer Interaction

    (2007)
  • Kozlov, S., & Reinhold, N. (2007). To play or not to play: Can companies learn to be n00bs, LFG, and lvl-up? In...
  • O.E. Krigolson et al.

    Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise

    Journal of Cognitive Neuroscience

    (2009)
  • P.A. Lachenbruch et al.

    Estimation of error rates in discriminant analysis

    Technometrics

    (1968)
  • C.S. Loh

Designing online games assessment as “information trails”

  • Loh, C. S. (2011). Using in situ data collection to improve the impact and return of investment of game-based learning....
  • C.S. Loh

    Improving the impact and return of investment of game-based learning

    International Journal of Virtual and Personal Learning Environments

    (2012)