
1 Introduction

Video games can be used to support player learning and decision-making strategies, and the training power in games can be increased by incorporating intelligent tutoring approaches. Few intelligent tutoring systems have been implemented in a serious, 3D, replayable video game [5, 6, 11, 19]. This combination introduces some difficult challenges for ITS designers. VanLehn (2011) evaluated the state of human tutoring and intelligent tutoring and identified several novel approaches to intelligent tutoring that needed further research (e.g., the open student model) [15].

As part of a large research project a number of intelligent tutoring design alternatives were integrated into a serious, 3D-immersive game called Heuristica. The purpose of Heuristica is to teach students to recognize and mitigate cognitive biases [12]. Cognitive biases are tendencies for humans to think in certain ways that provide shortcuts in everyday decision-making, but which can lead to errors in complex decision-making. Heuristica provides a set of scenarios within a space station narrative where the player can interact with game characters to perform tasks such as diagnosing and repairing problems, and observing and evaluating game characters performing tasks. The student is provided with definitions of the cognitive biases, methods to mitigate the tendency toward those biases and opportunities to apply those methods in both decision-making opportunities and in recognizing those biases in others. Figure 1 shows an example of the in-game experience.

Fig. 1. Example Learning Opportunity (LO) in the Heuristica game, asking players to diagnose a patient and recognize confirmation bias, a common cognitive bias.

As noted above, video games can support player learning and decision-making strategies [9, 10, 14, 16, 17], and their training power can be increased by incorporating intelligent tutoring approaches [13, 19]. For the Heuristica project we developed and implemented a set of intelligent tutoring approaches and experimented with alternatives centered on the Student Model working with the Content Selector, which makes decisions based on the state of the Student Model. The learning experience was tailored based on the student’s prior knowledge, performance in gameplay, and personal preferences. Game activities were modularized, enabling game content to be selected and ordered to support the student’s need for extra practice in certain areas and less in others. These game activities are called Learning Opportunities (LOs).

We describe the design and functionality of the Student Modeler and interface features that support student feedback and interaction, along with the representation of the Student Model, its mapping to concepts to be learned, and the indexing of in-game learning opportunities to support the tailoring of gameplay consistent with the Content Selector’s reasoning. We describe our experiments with open student model summary screen representations, mixed-initiative opportunities, massed vs. spaced practice, and algorithms for selection of learning opportunities.

1.1 Intelligent Tutoring in the Heuristica Game

The Heuristica game hosts a set of modular components for facilitating a student’s training and the analysis of that student’s learning through game participation. These training components, shown in Fig. 2, include the Student Modeler and the Content Selector. The Student Modeler and the Content Selector use reasoning techniques that are guided by the Learning Theories and Teaching Theories extended from existing learning and teaching theories by general psychology theory and studies performed during the Heuristica project. Our Student Model, overlaid on the Curriculum Model, is used to identify the areas of misunderstanding or lack of knowledge and reason about the student’s knowledge based on activities in the game, including the identification of cognitive biases exhibited in the student’s performance. Figure 3 on the next page shows, in detail, the step-by-step interactions among the components of the Heuristica intelligent tutoring system.

Fig. 2. Heuristica intelligent tutoring components

Fig. 3. Timeline of the interactions between the Heuristica Game and the student modeling components.

Student Model and Curriculum Representation.

Traditionally, student models are tied to a domain model that defines the study areas of the learning system [5, 8, 10]. A domain model provides a way to represent, index or describe the concepts, procedures or skills that the student should be learning [6]. There are many approaches to modeling the domain. We have chosen a practical approach that allows us to index the game LOs by which concepts they teach and/or evaluate. This is tied to the Student Model through a common representation called an overlay model.

An overlay representation [1] of a student model is an approach in which the student model is built on top of a curriculum model; that is, they use a common representation. The curriculum is a form of the domain model and is a representation, enumeration, or indexing of the concepts being taught by the learning system. The overlay student model representation is consistent with the curriculum model and associates a mastery level with each concept in the curriculum. The Student Model uses student performance and behaviors in the learning system to provide evidence of mastery levels associated with elements in the curriculum. These mastery levels in the Student Model are updated as the student moves through the tutoring system. The mastery levels increase or decrease depending on the student’s performance.
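As an illustration, an overlay model can be as simple as a table of mastery values keyed by curriculum concept, with a readout of the concepts still below a mastery threshold. The sketch below is a minimal reconstruction in this spirit; the class name, concept identifiers, and 0.8 threshold are illustrative assumptions, not the project's actual implementation.

```python
# Minimal sketch of an overlay student model: one mastery value per
# curriculum concept. Names and the threshold are illustrative only.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OverlayStudentModel:
    # Mastery level (0.0-1.0) for each concept in the curriculum model.
    mastery: Dict[str, float] = field(default_factory=dict)

    def readout(self, threshold: float = 0.8) -> List[str]:
        """Concepts whose estimated mastery is still below the threshold."""
        return [c for c, m in self.mastery.items() if m < threshold]


# Example: seed every curriculum concept at 0.0 (no prior evidence).
curriculum_concepts = ["confirmation_bias.definition",
                       "confirmation_bias.recognition",
                       "confirmation_bias.mitigation"]
model = OverlayStudentModel({c: 0.0 for c in curriculum_concepts})
print(model.readout())  # all three concepts still need work
```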

A readout of an individual student model at any time provides an estimate of the student’s accomplishments and makes explicit the concepts or skills that need further work. This information is used by the Content Selector to select and order scenarios and activities in the simulations.

The curriculum model in Heuristica consists of an explicit representation of the concepts and skills related to recognizing and mitigating cognitive biases as well as the relationships and interconnections among them. This curriculum model includes the concepts that are to be taught or experienced through the student’s interaction with Heuristica. It represents concepts that are declarative knowledge, procedural knowledge and metacognitive knowledge. We also represent relationships among concepts, including prerequisite and sub-concept relationships that are used to reason about the sequence of activities and game scenarios that should be made available to the student. In our evaluations, LOs were developed that teach the definitions, recognition and mitigation techniques for the following cognitive biases:

  • Confirmation Bias refers to the tendency to favor confirming information over disconfirming information when searching for information or when testing hypotheses.

  • Fundamental Attribution Error occurs when individuals weigh personal or dispositional explanations for others’ actions, while neglecting the role that situational forces have in determining behavior.

  • Bias Blind Spot is a meta-cognitive bias in which a person reports that he/she is less susceptible to a bias than others.

  • Anchoring Bias occurs when a person’s estimates are unduly affected upward or downward based on a single piece of information given immediately before the estimate.

  • Representativeness Bias arises when a person focuses too much on salient characteristics that are similar to certain populations.

  • Projection Bias is a person’s belief that other people have characteristics similar to his or her own.

Two different Heuristica games were developed, each teaching three of the six biases. For each of the six biases, a single concept group encompasses the set of specific concepts that collectively cover the knowledge that students should learn with respect to that bias.
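A minimal sketch of how such a curriculum fragment might be encoded, with one concept group per bias and prerequisite relations used to order activities, is shown below; the identifiers and relation names are assumed for illustration.

```python
# Illustrative curriculum fragment: a concept group per bias plus
# prerequisite relations used to order activities. Names are assumed.
CURRICULUM = {
    "confirmation_bias": {
        "concepts": ["cb.definition", "cb.recognition", "cb.mitigation"],
        # A concept's prerequisites should be practiced before it is taught.
        "prerequisites": {
            "cb.recognition": ["cb.definition"],
            "cb.mitigation": ["cb.definition", "cb.recognition"],
        },
    },
    # ... one concept group per bias taught in this version of the game
}


def teachable(concept_group: str, concept: str, mastered: set) -> bool:
    """A concept is teachable once all of its prerequisites are mastered."""
    prereqs = CURRICULUM[concept_group]["prerequisites"].get(concept, [])
    return all(p in mastered for p in prereqs)
```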

Student Modeler Maintains the Student Model.

The Student Modeler uses information about the student, such as his or her background, preferences, knowledge state and goals to create and maintain the Student Model. The Student Model is built and stored in a database, then updated over time as the student interacts with the system. It is this part of an intelligent tutoring system that enables it to answer questions about the student. It can be used to provide a snapshot of the student’s mastery levels at any time. The information in the Student Model is used to tailor the instruction to the needs of the student. The Student Modeler monitors the activity of the student in the serious game, infers and models his or her strengths and weaknesses (by analysis of the activities log produced by the gaming system and the use of inferencing techniques guided by the Learning Theory) and updates a set of values in the Student Model to represent the current state or mastery level of each concept that is a component of the student’s knowledge.

The Student Modeler works in conjunction with the part of the Heuristica framework that scores performance on each LO. It accesses database tables that contain LO decisions and appropriate answers that are used in evaluating the student’s performance and identifying the level of mastery exhibited for skills and concepts used in the activities. This information is made available to the Student Modeler which then uses it to update the Student Model.
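A rough sketch of this flow, assuming the scoring component writes per-concept scores to the game log after each LO, might look like the following; the mastery calculation itself is left pluggable because Sect. 3.6 discusses several alternatives.

```python
# Sketch: the Student Modeler consumes per-concept scores from the game
# log after each LO and appends them to the student's score history.
# The mastery calculation is pluggable (see the alternatives in Sect. 3.6).
from collections import defaultdict
from typing import Callable, Dict, List


class StudentModeler:
    def __init__(self, mastery_fn: Callable[[List[float]], float]):
        self.history: Dict[str, List[float]] = defaultdict(list)  # concept -> scores
        self.mastery_fn = mastery_fn

    def record_lo_result(self, lo_scores: Dict[str, float]) -> Dict[str, float]:
        """Fold one LO's per-concept scores into the model; return mastery levels."""
        for concept, score in lo_scores.items():
            self.history[concept].append(score)
        return {c: self.mastery_fn(s) for c, s in self.history.items()}
```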

The Content Selector Organizes Gameplay.

The Content Selector reasons about and selects scenarios and activities stored in a database structure where each learning activity has associated with it a set of concepts or skills (used to index the content in a content library) that are expected to be used by the student in performing that activity. The Content Selector chooses (guided by the Teaching Theory and the current state of the Student Model) a scenario or activity (an LO) that the student needs to complete in order to master the cognitive biases curriculum. It explicitly keeps track of the activities in which the student has already participated, and the concepts for which the student has already shown mastery, when selecting what to teach next.

The Content Selector chooses the learning opportunities to be presented to the student in a given interaction with the game, and selects the sequence for presentation through a computational implementation guided by the learning and teaching theories. It may choose to replay an LO that the student has experienced before (within the replay limit parameter associated with that LO in the database), and it may decide to skip learning activities that address concepts the student already understands, based on the contents of the Student Model. These learning activities are sequenced in a database table for use in driving the gaming scenarios.
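A simplified version of this selection logic, assuming a flat LO table indexed by concepts and replay limits, could look like the sketch below; it omits narrative prerequisites and the novelty weighting discussed in Sect. 3.7.

```python
# Simplified LO selection: skip LOs whose concepts are already mastered,
# respect per-LO replay limits, and prefer the LO covering the most
# unmastered concepts. Field names are illustrative assumptions.
from typing import Dict, List, Optional


def select_next_lo(los: List[dict],
                   mastery: Dict[str, float],
                   play_counts: Dict[str, int],
                   threshold: float = 0.8) -> Optional[dict]:
    unmastered = {c for c, m in mastery.items() if m < threshold}
    candidates = []
    for lo in los:
        if play_counts.get(lo["id"], 0) >= lo["replay_limit"]:
            continue  # replay limit reached for this LO
        relevant = unmastered & set(lo["concepts"])
        if relevant:
            candidates.append((len(relevant), lo))
    if not candidates:
        return None  # content exhausted for the remaining concepts
    return max(candidates, key=lambda pair: pair[0])[1]
```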

Scoring and Mastery Levels.

In Heuristica, the scoring of the student’s performance within a given LO is performed by the game component being played. The LO records the performance scores for each concept into the game log. The Student Modeler calculates a concept understanding score for the student, called the mastery level, which is stored in the Student Model as a value from 0.0 to 1.0 for each concept. A mastery level threshold, which is adjustable by an initialization parameter, is used to determine when the student’s performance is sufficient to decide that the student no longer needs additional practice on a given concept. The choice of this threshold must be made in consideration of the length of gameplay allowed and the level of student performance which indicates sufficient mastery. This choice of threshold is also affected by the strictness of the scoring component in the game.

Initializing the Student Model.

A student’s interaction with the serious game system begins outside the game itself, in the form of a pretest conducted online. This pretest serves two important roles: it provides a baseline picture of the student’s initial understanding of the relevant concepts, and its scores may be used to seed the Student Model with initial mastery values. When the Student Model is used to make gameplay decisions, students who demonstrate a priori knowledge can move more quickly than students with less initial understanding through concepts that need not be covered in as much detail.

2 Practical Game Play Considerations

In the experiments with Heuristica reported elsewhere [16,17,18,19], students were given a pretest on the concepts taught in the game and a posttest after playing the game, to provide a measure of the learning that occurred in the game and to compare learning in the tailored game approach to that in a fixed-order game.

One of the constraints of the Heuristica intelligent learning serious game was to limit gameplay to approximately one hour, with learning to take place over a single playthrough of the game. This is a set of conditions that is different from the conditions in which student models are traditionally applied, and some of the advantages of a longitudinal student model are not available.

Several goals are desired for the proper learning and gameplay experience, and the Content Selector uses these conditions to reason about the LO selections. Among these goals are the following:

  1. Students should reach mastery in all concepts.

  2. Students should experience a variety of gameplay options, such as game LOs mixed with worked examples.

  3. Students should feel like they are making progress as the game is played.

  4. Students’ time should be used in an efficient but effective manner: gameplay should not be too short or too long.

  5. Mastery of the concepts should improve posttest cognitive bias test scores.

The Student Model is designed such that students are expected to play until they reach mastery in all concepts. There are three conditions that will end the game (a minimal check is sketched in code after this list):

  • Game complete: All concepts have been mastered.

  • Content exhausted: There are no more playable LOs available to teach the concepts that have not yet been mastered. This is usually because the relevant LOs have already reached their replay limits. The replay limits for individual LOs are parameters that can be tuned to provide a balance between the improved learning from additional practice and the potential for the student to lose interest after a certain number of LO replays.

  • Time limit reached: The student has exceeded the maximum time allowed for gameplay.
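A minimal sketch of these three end-of-game checks is shown below; the function and parameter names are assumptions, with the 60-minute limit taken from the approximately one-hour gameplay constraint noted above.

```python
# Sketch of the three end-of-game checks. Names are illustrative only.
def game_end_state(mastery, threshold, next_lo, elapsed_minutes, time_limit=60):
    """Return the end condition reached, or None if gameplay should continue."""
    if all(m >= threshold for m in mastery.values()):
        return "complete"            # all concepts mastered
    if next_lo is None:
        return "content_exhausted"   # no playable LOs left for unmastered concepts
    if elapsed_minutes >= time_limit:
        return "time_limit"
    return None
```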

The first end state is preferred and we call this a “complete” game. The other two end states indicate that something prevented the student from learning the relevant content before the game ended. These are “incomplete” games. There are several underlying causes that can result in an incomplete game. For example:

  1. Gameplay difficulty is greater than learning difficulty, which means a competent student is unable to successfully complete in-game tasks irrespective of learning the underlying material.

  2. Mastery scoring in gameplay is too harsh with respect to game objectives, which means a student who is performing qualitatively well is earning poor quantitative scores.

  3. Gameplay content is failing to teach the learning material, which means that a competent student is failing to advance in learning.

  4. A student is unable to learn the material quickly enough due to personal limitations.

This analysis was important to the project as the game was being developed and we were experimenting with scoring techniques, gameplay LO development and teaching quality, gameplay mechanics and ease of use. Analysis of the student model representation and mastery levels was useful in the design iterations of these game components.

3 Intelligent Tutoring Design Alternatives

The modular design of Heuristica and its student modeling components enabled the testing and evaluation of numerous teaching approaches and system design alternatives. As we experimented with some of our earliest designs, some of the shortcomings were revealed and we replaced some of the reasoning and algorithms with improved approaches. There are lessons to be learned from a review of the alternatives and experiences in this project.

In the context of Heuristica under the constraints of gameplay duration and single-playthrough learning, a few of the intelligent tutoring design alternatives provided learning improvements over a non-tailored game experience, but many did not. Still, there are lessons to be learned from the software and interaction designs and approaches. Some of these interaction approaches may be useful with extensions or under other conditions, and we present the most interesting of them here.

3.1 Open Student Model

An open student model [4, 7, 20] allows the student to inspect his or her progress as tracked by the student model, including the opportunity for the student to learn from the weaknesses recognized by reflecting on the mastery levels. An open student model also provides the opportunity for mixed-initiative interaction with the game; for example, if the student needs more practice on a particular concept, he or she could request that practice based on what was learned by inspecting the student model. Figure 4 shows one iteration of this interface within the game.

Fig. 4. Open Student Model interface within the game, including the student’s progress in six concept subgroups and a reflection opportunity.

In order to support the open student model, we provided a summary screen to the student as a snapshot of his or her progress, for insight into learning and for motivation. Throughout the Heuristica project we experimented with several representations of the summary screen. We were given feedback that our initial approach was too “busy” and contained too much information; later versions included a simpler graph that combined several related concepts and tracked progress in those primary areas. We experimented with a screen that better fit with the game’s narrative and used the language of badges and promotion on the Heuristica space station. A straightforward bar graph that showed progress made in each of the concept subgroups and how far the student was from mastery in each of those concept subgroups was well accepted.

3.2 Massed vs. Spaced Practice

Repetitively exposing students to material in small lessons (i.e., spaced presentation), versus aggregated presentation of the same material (i.e., massed presentation), generally leads to improved learning outcomes. The benefits of distributed spacing have been recognized for over a century [21]. In Heuristica, in which gameplay consists of a set of short LOs and in which total gameplay time is limited, the Content Selector must decide how much gameplay should be presented on a particular topic before switching to LOs focused on a different bias concept group. Working with cognitive psychologists, we identified three approaches (a selection sketch follows the list below):

  1. Go deep (massed practice): provide the student with experience in LOs associated with one cognitive bias at a time.

  2. Go broad (spaced practice): rotate through LOs related to multiple cognitive biases and then return to iterate on all of them.

  3. Mixed approach: allow the student to achieve a moderate level of mastery in a given cognitive bias before moving to the next one. Once the student has covered each bias to a moderate level, return to reach full mastery in each of the cognitive biases.
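Assuming a per-bias mastery value, the choice of which concept group to draw the next LO from might be sketched as follows; the “moderate” cutoff of 0.5 and the full-mastery value of 0.8 are illustrative parameters, not the project’s actual values.

```python
# Sketch of choosing the next bias concept group under the three practice
# modes. Cutoff values are assumed tuning parameters.
def next_concept_group(mode, groups, group_mastery, current,
                       moderate=0.5, full=0.8):
    if mode == "massed":   # go deep: stay on the current bias until mastered
        if group_mastery[current] < full:
            return current
        return next((g for g in groups if group_mastery[g] < full), None)
    if mode == "spaced":   # go broad: rotate to the next unmastered bias
        i = groups.index(current)
        order = groups[i + 1:] + groups[:i + 1]
        return next((g for g in order if group_mastery[g] < full), None)
    # mixed: stay until moderate mastery, cover every bias to a moderate
    # level, then return to bring each bias to full mastery
    if group_mastery[current] < moderate:
        return current
    return (next((g for g in groups if group_mastery[g] < moderate), None)
            or next((g for g in groups if group_mastery[g] < full), None))
```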

Recognizing that there will be instances when any one of these is most desirable, we implemented a parameter that would allow the game to run in any one of these modes. The mixed approach solved several problems for us. The cognitive bias content that the Heuristica game teaches requires several LOs before the student has enough experience on a cognitive bias to begin to grasp the concepts, which argues against a pure go-broad approach. In addition, one of our designs includes a mixed-initiative functionality (described later in this paper) which offers the student a choice of two LOs for his or her next learning activity. If we are giving the student a meaningful choice, these two LOs should contain notably different content. The mixed approach enabled us to address both of these considerations. We recognize that there will likely be different tradeoffs in other serious games.

3.3 Mixed Initiative

In the context of an intelligent tutoring system, “mixed-initiative” interaction refers to an approach in which, in some cases, the system decides which learning opportunity the student will experience next, and in other cases the student may decide. As one of the experimental designs in Heuristica, we developed an interaction that would provide the student with two choices as to which LO to play next. The choices offered by the Content Selector were both relevant to the student’s learning process, as represented in the Student Model. This choice encouraged the student to reason about his or her own learning needs and to allow the student the opportunity to select something that may be of interest [4, 20]. In the Heuristica mixed-initiative approach, the set of choices was constrained by the narrative flow (in the form of LO prerequisites) and concept presentation order (in the form of concept prerequisites). Figure 5 shows how this choice was presented to players.

Fig. 5. In-game user interface showing mixed-initiative choice. The highlighted bars indicate to the student which bias subgroups are covered by the choice under the cursor.

The first mixed-initiative algorithm design was implemented with the go-deep approach, which resulted in the student often being offered the choice between two LOs that were very similar in content. We followed this with a combined approach, in which the student received either (a) a choice between two LOs of the same bias whenever the student was in the massed portion of the learning process, or (b) a choice between two LOs in different biases whenever the student had reached moderate mastery of the current bias. The mixed approach to the massed-vs-spaced tradeoff worked well in support of the mixed-initiative goal of allowing the student some choice within the constraints of Heuristica.
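A rough sketch of this combined offer logic is shown below; the field names and the moderate-mastery cutoff are assumptions for illustration.

```python
# Sketch of the combined mixed-initiative offer: two LOs from the same
# bias during massed practice, otherwise LOs from different biases.
# Field names and the 0.5 cutoff are assumed.
def offer_choices(candidate_los, current_group, group_mastery, moderate=0.5):
    """Return up to two playable LOs for the student to choose between."""
    same = [lo for lo in candidate_los if lo["group"] == current_group]
    other = [lo for lo in candidate_los if lo["group"] != current_group]
    if group_mastery[current_group] < moderate and len(same) >= 2:
        return same[:2]              # massed portion: two LOs, same bias
    if same and other:
        return [same[0], other[0]]   # offer notably different content
    return (same + other)[:2]        # fall back to whatever is playable
```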

3.4 Reflection

In an effort to increase interaction and to encourage students to analyze their own learning [3, 20, 22], we implemented a reflection screen after some of the LOs. This reflection screen presented the student with a question related to what had been learned about the most-recently presented topic, such as “What is an important consideration in bias mitigation?” Another version of reflection used an in-game narrative prompt that encouraged the student to leave hints for the next person who played that LO. Figure 4 on the previous page includes one iteration of the reflection interaction within the game.

The reflection approach provided us with some interesting insight into what was being learned in some of the LOs. It allowed us to see if the messages that were being presented in the LOs were coming through clearly. It also presented a trade-off decision regarding how often we could ask the student to reflect without becoming irritating [18].

3.5 Student Model Seeding Alternatives

We experimented with seeding the Student Model using different multipliers against the pretest values, in addition to conditions in which students began with no seeding. The implementation of the Student Modeler included parameters that controlled seeding behavior. A maximum seed multiplier, 1.0, would allow students to receive full credit for per-concept scores earned on the pretest, enabling a student to reach measured mastery quickly, potentially shortening gameplay time but leaving gaps in the student’s real-world learning. Lower multiplier values (and completely disabled seeding) would require the student to spend more time in the game to prove their mastery of concepts. The seeding multiplier must also take into consideration the accuracy of the mapping between pretest questions and in-game concept identifiers. A seeding multiplier of 0.65 represented a compromise between these two extremes.
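A minimal sketch of this seeding step, assuming pretest items have already been mapped to per-concept scores in [0, 1], might look like the following; the 0.65 multiplier is the compromise value reported above.

```python
# Sketch of seeding the Student Model from pretest scores with a
# configurable multiplier. The pretest-to-concept mapping is assumed.
def seed_student_model(pretest_scores, multiplier=0.65):
    """pretest_scores: per-concept scores in [0, 1] from the online pretest."""
    if multiplier <= 0.0:
        return {c: 0.0 for c in pretest_scores}  # seeding disabled
    return {c: min(1.0, s * multiplier) for c, s in pretest_scores.items()}
```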

3.6 Mastery Level Calculation Alternatives

Multiple algorithms for calculating the mastery level of a given concept were evaluated as part of this project. Our experiments identified several tradeoff dimensions with the mastery level calculations: the desire to manage the length of the gameplay, the recognition that scores in the distant past are not as relevant to estimating the student’s current knowledge state, and the desire to keep students motivated by not decreasing mastery levels following an unsuccessful LO performance.

In one design alternative, the mastery level is based on a rolling average of the last n scores posted for a particular concept for this student (where n = 4, typically, for a short game like Heuristica). Basing the measure of the student’s knowledge about a given concept on a rolling average allows improvement over time to be measured in a way that excludes scores from the “distant” past (e.g., the complete lack of knowledge at the beginning) but includes the recent performance that reflects his or her current state of knowledge across LOs.

In an attempt to shorten the time of gameplay, we tested the use of a “best-case” mastery level, in which each concept’s overall mastery level was calculated using the student’s best n scores. This approach also addressed a concern expressed in playtesting comments and reviews by designers that students were discouraged by instances when more play and experience with a particular concept could result in a drop in their mastery levels. This can happen if a student performs poorly on a later LO and the low score gets averaged in for mastery level calculation. However, the best-case scoring algorithm (as opposed to the most-recent scoring algorithm) resulted in too little practice for students, as it allowed mastery levels to rise more quickly than actual concept learning, as indicated by lower posttest scores.
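The two calculation alternatives can be expressed as simple functions over a concept’s score history (most recent score last); these are illustrative reconstructions, with n = 4 taken from the text.

```python
# The two mastery-level calculations discussed above, as functions over a
# concept's score history. Illustrative reconstructions only.
def rolling_average_mastery(scores, n=4):
    """Mean of the last n scores, so early failures age out of the estimate."""
    recent = scores[-n:]
    return sum(recent) / len(recent) if recent else 0.0


def best_case_mastery(scores, n=4):
    """Mean of the best n scores; rises faster but can overstate learning."""
    best = sorted(scores, reverse=True)[:n]
    return sum(best) / len(best) if best else 0.0
```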

3.7 Novelty in the Content Selector

In Heuristica, each LO has a parameter that controls the number of times that a student could be given that LO to play. Some of the LOs teach several important concepts, and the Content Selector algorithms tend to choose them for replay as many times as the replay parameter will allow; the result is that some LOs never get played. In order to give students adequate practice and repetition, but still provide variety for an interesting game, a “novelty” metric was added to the Content Selector. This decreased the selection likelihood of LOs that had already been played in favor of unplayed LOs that covered the same target concepts. The purpose of this metric is to prevent the game from being too repetitive and to provide more interest and variety to the players.
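One plausible way to express such a novelty adjustment is to down-weight an LO’s concept-coverage score by the number of times it has already been played; the penalty value below is an assumed tuning parameter, not the project’s actual formula.

```python
# Sketch of a novelty adjustment: among LOs covering the same unmastered
# concepts, down-weight those that have already been played.
def selection_score(lo, unmastered, play_counts, novelty_penalty=0.5):
    coverage = len(unmastered & set(lo["concepts"]))  # unmastered concepts covered
    plays = play_counts.get(lo["id"], 0)              # times this LO was played
    return coverage / (1.0 + novelty_penalty * plays)
```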

3.8 Simulations to Support Rapid Prototyping

The Student Model works in tandem with the game to evaluate student performance and schedule gameplay elements that are best suited to the student’s current progress. To test Student Modeler improvements without playing the entire game, we developed a Game Surrogate that stands in for the game itself and allows the tester to enter concept scores either manually or from historical cases gathered from real students.

Figure 6 shows the Student Model above the Game Surrogate, its History window, and the Game Surrogate Automator. The Game Surrogate prompts the tester to enter scores for each concept in the current LO. The History window enables the user to select from a set of real concept scores gathered from the logs of previous student tests.

Fig. 6. The Student Model diagnostic view, showing the current player’s scores for all concepts after each LO played so far (top); the Game Surrogate testing harness (bottom left) and its historical data selection dialog (bottom middle); and the Game Surrogate Automator.

In addition to direct testing of the Student Model for a single configuration, statistical analysis of many concepts across many student scores was necessary. The Game Surrogate Automator allows rapid simulation of multiple complete gameplays in order to determine the effect of parameter and behavior adjustments to the Student Modeler and Content Selector components. The Automator simulates responses from a student playing the game, randomly selecting a set of concept scores for each LO based on the scores from real student gameplay history. It creates multiple games and virtually plays each game to completion with the Student Modeler and Content Selector performing their normal roles in the evaluation and LO-selection process.

For example, to test the effect of adjusting the Mastery Score Threshold from 0.8 to 0.82, the Automator was configured to play 50 games with each setting. The logs of those games were then analyzed, revealing that the average number of LOs played increased by 2, based on in-game scores earned by students in previous testing cycles.
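The kind of comparison described here might be driven by a small harness like the following, where simulate_game is an assumed stand-in for one full surrogate playthrough (sampling per-concept scores from logged student history) that returns the number of LOs played.

```python
# Sketch of comparing two mastery-threshold settings across many
# simulated games. simulate_game is an assumed stand-in for the surrogate.
import statistics


def compare_thresholds(simulate_game, thresholds=(0.80, 0.82), games=50):
    """Average number of LOs played per game, for each threshold setting."""
    results = {}
    for t in thresholds:
        lo_counts = [simulate_game(mastery_threshold=t) for _ in range(games)]
        results[t] = statistics.mean(lo_counts)
    return results
```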

Evaluation of the game with real students is required for the testing of some design decisions. For many changes, however, these testing components provided a harness within which certain parameters and behaviors of student modeling components could be compared much more rapidly but with a reasonable level of fidelity.

4 Conclusion

This paper has described our project’s investigation of design alternatives to support learning in the serious game Heuristica and the lessons learned through the exploration of their usage and behaviors in the integrated system. The exploration centered on the design decisions and resulting tradeoffs for a student model and its associated content selector, which makes decisions based on the state of the student model. These design alternatives were explored within the constraints of a short (approximately one hour) gameplay session with a requirement to complete learning from one playthrough of the game. We believe that the described designs and components can be extended and adapted to other games with a longer allowed gameplay and multiple sessions, and that the lessons learned can be applied under other conditions.

This paper discusses the design tradeoffs for a number of characteristics of the intelligent tutoring portion of this project, including:

  1. An open student model that shows students how the system perceives their concept mastery;

  2. Massed vs. spaced practice approaches that guide content selection with respect to bias concept groups;

  3. Mixed-initiative choices offered to students, and their dependence on the massed vs. spaced approaches;

  4. A reflection question to encourage the student to reason about his or her own learning and the value of the responses in providing insight into what was actually being learned by the LOs;

  5. Pretest seeding as a form of initialization of the student model mastery levels so that students who demonstrate a priori knowledge can move more quickly through concepts that need not be covered in as much depth;

  6. Different mastery level calculation approaches to best represent the learning of a student for content selection; and

  7. A novelty metric to guide gameplay so that it is diverse and engaging to the student.

We also described the design and implementation of an automated testing framework that allowed the behaviors of the student modeling components to be evaluated with respect to their software design, algorithm implementations, and parameter adjustments, prior to testing with students.

Together this work describes several different design choices that leverage aspects that have been effective in human tutors [2] and intelligent tutoring systems [4, 15] and contributes to a growing body of research that has described integrating intelligent tutoring into video games [5, 6, 10, 11, 19].